Hi,

> 1. I have an implementation with some of the optimizations that you
> mentioned. Even when keying on the first two words of an n-gram, we
> would still have skewed sharding for unigrams, wouldn't we?

You would, but the skew will be a lot less.
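
To make that concrete, here is a rough sketch in C++ of keying on the
first two words (shard_for, num_shards and so on are made-up names for
illustration, not from any real code). Bigrams and longer n-grams get
spread out by their two-word prefix, while a unigram has only its one
word to hash, so all of its traffic still lands on a single shard; that
is where the residual skew comes from.

#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Split an n-gram string such as "the quick brown" into its words.
std::vector<std::string> words(const std::string& ngram) {
  std::istringstream in(ngram);
  std::vector<std::string> out;
  for (std::string w; in >> w;) out.push_back(w);
  return out;
}

// Key on the first two words when available; unigrams fall back to
// hashing their single word, so each unigram maps to exactly one shard.
std::size_t shard_for(const std::string& ngram, std::size_t num_shards) {
  const std::vector<std::string> w = words(ngram);
  const std::string key = w.size() >= 2 ? w[0] + " " + w[1] : w.at(0);
  return std::hash<std::string>{}(key) % num_shards;
}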

> 2. One of the nice things I would like to facilitate is daily
> *incremental* updates to the LM. I have previously read your work on
> randomized storage of LMs and found it very interesting. I will look
> through it again to jog my memory and send the questions I have your way.

We have this too, also in a randomised setting. Have a look at our
"streaming" language model work, which allows for incremental updates
to a precomputed LM.
Although I say so myself, I like this a lot, since it effectively
allows for LMs to be trained on unbounded amounts of monolingual data:

www.aclweb.org/anthology/D/D09/D09-1079.pdf
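
To give a feel for how randomised storage and incremental updates fit
together, here is a toy sketch (a count-min sketch, which is *not* the
data structure in the paper above; all the names here are made up).
The point is just that hashed counters let a fixed amount of memory
absorb a daily stream of new n-gram counts without any retraining.

#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Toy count-min sketch: rows of hashed counters, so memory stays fixed
// no matter how much new text streams in.
class CountMinSketch {
 public:
  CountMinSketch(std::size_t depth, std::size_t width)
      : width_(width), rows_(depth, std::vector<std::uint32_t>(width, 0)) {}

  // Incremental update: fold today's n-gram counts into the sketch
  // without touching anything stored earlier.
  void add(const std::string& ngram, std::uint32_t count) {
    for (std::size_t row = 0; row < rows_.size(); ++row)
      rows_[row][index(ngram, row)] += count;
  }

  // Query: take the minimum over rows; collisions mean it can
  // overestimate a count, but it never underestimates one.
  std::uint32_t count(const std::string& ngram) const {
    std::uint32_t best = std::numeric_limits<std::uint32_t>::max();
    for (std::size_t row = 0; row < rows_.size(); ++row)
      best = std::min(best, rows_[row][index(ngram, row)]);
    return best;
  }

 private:
  // Salt the hash with the row number to approximate independent hashes.
  std::size_t index(const std::string& ngram, std::size_t row) const {
    return std::hash<std::string>{}(std::to_string(row) + '|' + ngram) % width_;
  }

  std::size_t width_;
  std::vector<std::vector<std::uint32_t>> rows_;
};

A daily job would just call add() with the new counts; queries keep
reading count() as before.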


> 3. It would be great if you could elaborate on why HBase did not meet
> your needs. Was this application specific?

This may have been due to us using an early version of it, but it was
just too slow and unreliable at the time. Also, we have a strong
preference for code in C++, and having to deal with Java is just a pain.

thanks,
Mandar

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
