On Wed, May 19, 2010 at 3:49 PM, Jeff Eastman <[email protected]> wrote:


> I cannot imagine how one could ever get LDA to scale if it is always
> limited to a single input vector file. Is there a way to get multiple output
> vector files from seqtosparse?
>

I don't know offhand, but is the default input split size
(mapred.min.split.size) too large for this particular use case? If it is 0
or unspecified, the split size defaults to the block size, which is 64MB.
I wonder if setting it smaller would allow more mappers to spawn.
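
To make the arithmetic concrete, here is a rough sketch of how I understand
the old-API FileInputFormat to pick a split size: max(minSize, min(goalSize,
blockSize)), where goalSize is the total input size divided by the requested
number of map tasks. The function names below are made up for illustration,
and the estimate ignores details such as the slop factor Hadoop applies to
the last split, so treat it as an approximation rather than the actual
implementation.

```python
import math

MB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    # Assumed formula: the split is capped at the block size unless the
    # configured minimum split size is larger than that cap.
    return max(min_size, min(goal_size, block_size))


def estimate_num_splits(file_size, requested_maps, min_size, block_size):
    # goal_size is the input size divided by the requested map count
    # (hypothetical helper; approximates one splittable input file).
    goal_size = file_size // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, max(min_size, 1), block_size)
    return math.ceil(file_size / split_size)


if __name__ == "__main__":
    # A single 64MB vector file with 64MB blocks.
    print(estimate_num_splits(64 * MB, 1, 0, 64 * MB))
    print(estimate_num_splits(64 * MB, 8, 0, 64 * MB))
    print(estimate_num_splits(64 * MB, 8, 16 * MB, 64 * MB))
```

If this sketch is right, the minimum split size only acts as a floor, so it
is the requested number of map tasks (via the goal size) that pushes the
split below the block size, and a large minimum would work against that.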
