Thanks Miles,
1. I agree that I might not need any fancy smoothing, but even at
Google scale simple smoothing seems to help performance (at least for
machine translation):
http://acl.ldc.upenn.edu/D/D07/D07-1090.pdf
Has that been your experience as well? (See the sketch below my
questions for the kind of smoothing I have in mind.)

2. Is your code open source?

3. I was also looking to understand whether there have been any
efforts to store these large n-gram sets efficiently for real-time
access. Could you point me to the HyperTable hosting effort you
mention?

4. On a related note, does anyone here have experience with HyperTable
or similar open-source distributed storage systems in production?
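
For reference, here is roughly what I mean by simple smoothing in
point 1: the stupid backoff scheme from the Brants et al. paper linked
above. This is only an illustrative sketch with made-up names, not
anyone's production code.

ALPHA = 0.4  # fixed backoff factor used in the paper

def stupid_backoff(counts, total_tokens, ngram):
    """Unnormalised score S(w_n | w_1 .. w_{n-1}) for an ngram tuple."""
    if len(ngram) == 1:
        return counts.get(ngram, 0) / float(total_tokens)
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / float(counts[ngram[:-1]])
    # back off to the shorter context, discounted by ALPHA
    return ALPHA * stupid_backoff(counts, total_tokens, ngram[1:])

# e.g. with counts = {("the",): 120, ("the", "cat"): 3, ("cat",): 7}:
# stupid_backoff(counts, 1000, ("the", "cat"))  -> 3 / 120
# stupid_backoff(counts, 1000, ("big", "cat"))  -> 0.4 * 7 / 1000

If I read the paper right, with web-scale counts this kind of score
gets close to Kneser-Ney quality for MT, which is why I suspect raw
Hadoop counts plus stupid backoff may be enough for me too.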

Mandar

On Thu, Feb 4, 2010 at 3:36 AM, Miles Osborne <[email protected]> wrote:
> My trusty Google alert spotted this!
>
> But yes, I have code which builds large LMs using Hadoop.  That is,
> taking raw text and building ngrams and counts for later hosting.  In
> parallel with this there is a current effort to host this using
> HyperTable.
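
Just to check my understanding of the ngram-counting step described
above: I picture it as a Hadoop Streaming job of roughly this shape.
This is only my own sketch of the general pattern (illustrative file
names and an assumed maximum order N), not your actual code.

# mapper.py -- emit "ngram<TAB>1" for every 1..N-gram on each line
import sys

N = 3  # maximum ngram order; an assumption for this sketch

for line in sys.stdin:
    tokens = line.split()
    for n in range(1, N + 1):
        for i in range(len(tokens) - n + 1):
            print("%s\t1" % " ".join(tokens[i:i + n]))

# reducer.py -- sum the counts per ngram (input arrives sorted by key)
import sys

current, total = None, 0
for line in sys.stdin:
    key, count = line.rstrip("\n").rsplit("\t", 1)
    if key != current and current is not None:
        print("%s\t%d" % (current, total))
        total = 0
    current = key
    total += int(count)
if current is not None:
    print("%s\t%d" % (current, total))

The two scripts would then be wired together with the usual
hadoop-streaming jar invocation (-mapper, -reducer, -input, -output).
Is that about right, or does your pipeline differ?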
>
> What I don't have is Hadoop code to smooth the ngrams.  But, if you
> need to use Hadoop to build your LMs then the chances are you don't
> need to do any fancy smoothing either.
>
> Miles
>
> Miles Osborne and Chris Dyer have worked on this separately.
>
> Hopefully Miles is listening.
>
> On Wed, Feb 3, 2010 at 10:07 AM, Mandar Rahurkar <rahur...@...> wrote:
>
>> Hi, All,
>> I was wondering if there has been an initiative to implement
>> large-scale language models using Hadoop. If not, and if there is
>> sufficient interest, I would be interested in adding that
>> functionality.
>>
>> regards,
>> Mandar
>>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
