On Nov 9, 6:10 pm, "Rajat Gogri" <[EMAIL PROTECTED]> wrote: > Hello, > > I was asked this question for MSFT interview. > > You are given Symbols for stock ticker application, > > All are 4 character , Capital letters only so min is AAAA and max is ZZZZ. > for example MSFT, GOOG etcetc > How will you hash it??? .... > But I think space is huge here...
There are 26 choices for the first character, 26 choices for the second character etc... 26^4 = 456976 permutations. You knew this? Each entry in the table holds a pointer to a record or a null. To implement a table in memory on an 8/32 bit machine (8 bit byte, 32 bit addressing) you need: 456976 * 4 = 1827904 bytes = say 1.3/4 MB. The table is large but not unreasonable. For an entry say a, b, c, d the relationship between the key AAAA..ZZZZ and table location is: (Ascii(a)-Ascii('A'))* 26^3 + (Ascii(b)-Ascii('A'))* 26^2 + (Ascii(c)- Ascii('A'))* 26 + (Ascii(d)-Ascii('A')); i.e. 0..(26^4 - 1) This is working radix 26 and shifting character values left. Use the inverse to get the key for any table location. > Any better answers???????????? Your answer looks reasonable but maybe lacks acuracy, scope, don't you think? A record will be a lot bigger than 4 bytes, so perhaps we want to keep records on disk, only load them when required and maybe unload them if memory gets short - table also stores file offsets. Using key and database: might as well get local bit of DBMS to manage table and forget all this stuff. Saving table to disk: only store non-nulls complete with table index. Personally I would call above a vector table not a hash table. To me hashing involves a table that is smaller than the range of key values, hash function, collision handling. Google for Hash Function, Table etc. is quite good. Hashing downside is that an in-order traversal requires iterating through key values to see what comes out or additional provision that is going to slow down insertion, deletion. Alternatives: 1) Use sparse array techniques to reduce vector table size. 2) A multi-branch tree, also known as a trie. Each node has 26 way vector table. Full population gets same total table but only create nodes and/or tables when required. 3) B-Tree - google it. Hope these thoughts help. - Martin --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~----------~----~----~----~------~----~------~--~---