On Nov 9, 6:10 pm, "Rajat Gogri" <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I was asked this question for MSFT interview.
>
> You are given Symbols for stock ticker application,
>
> All are 4 character , Capital letters only so min is AAAA and max is ZZZZ.
> for example MSFT, GOOG etcetc
> How will you hash it???
  ....
> But I think space is huge here...

There are 26 choices for the first character, 26 choices for the
second character etc... 26^4 = 456976 permutations. You knew this?
Each entry in the table holds a pointer to a record or a null.

To implement a table in memory on an 8/32 bit machine (8 bit byte, 32
bit addressing) you need:
  456976 * 4 = 1827904 bytes
             = say 1.3/4 MB.

The table is large but not unreasonable.

For an entry say a, b, c, d the relationship between the key
AAAA..ZZZZ and table location is:

(Ascii(a)-Ascii('A'))* 26^3 + (Ascii(b)-Ascii('A'))* 26^2 + (Ascii(c)-
Ascii('A'))* 26 + (Ascii(d)-Ascii('A')); i.e. 0..(26^4 - 1)

This is working radix 26 and shifting character values left. Use the
inverse to get the key for any table location.

> Any better answers????????????

Your answer looks reasonable but maybe lacks acuracy, scope, don't you
think?

A record will be a lot bigger than 4 bytes, so perhaps we want to keep
records on disk, only load them when required and maybe unload them if
memory gets short - table also stores file offsets. Using key and
database: might as well get local bit of DBMS to manage table and
forget all this stuff. Saving table to disk: only store non-nulls
complete with table index.

Personally I would call above a vector table not a hash table. To me
hashing involves a table that is smaller than the range of key values,
hash function, collision handling. Google for Hash Function, Table
etc. is quite good.

Hashing downside is that an in-order traversal requires iterating
through key values to see what comes out or additional provision that
is going to slow down insertion, deletion.

Alternatives: 1) Use sparse array techniques to reduce vector table
size. 2) A multi-branch tree, also known as a trie. Each node has 26
way vector table. Full population gets same total table but only
create nodes and/or tables when required. 3) B-Tree - google it.

Hope these thoughts help.


- Martin


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to