On Wed, May 19, 2010 at 3:49 PM, Jeff Eastman <[email protected]> wrote:
> I cannot imagine how one could ever get LDA to scale if it is always
> limited to a single input vector file. Is there a way to get multiple output
> vector files from seqtosparse?
>

I don't know offhand, but is the default input split size too large for this
particular use case? If mapred.min.split.size is 0 or unspecified, the split
size defaults to the block size, which is 64 MB. I wonder if making the splits
smaller would allow more mappers to spawn.
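
For reference, here is a rough sketch of how I understand the split size to be chosen by the newer-API FileInputFormat: max(minSize, min(maxSize, blockSize)). The function names and the 40 MB file are made up for illustration, and the split count ignores the small slop Hadoop allows on the last split. If this reading is right, the minimum is only a lower bound, so it is a cap below the block size (mapred.max.split.size) that would actually shrink the splits.

```python
import math

MB = 1024 * 1024

def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Illustrative helper: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, size):
    """Approximate split count; ignores Hadoop's slop on the final split."""
    return math.ceil(file_size / size)

block = 64 * MB
vectors = 40 * MB  # hypothetical size of the single input vector file

# Defaults: one split, so one mapper.
default_splits = num_splits(vectors, split_size(block))

# Lowering only the minimum leaves the split size at the block size.
min_only_splits = num_splits(vectors, split_size(block, min_size=1 * MB))

# Capping the maximum below the block size yields more splits.
capped_splits = num_splits(vectors, split_size(block, max_size=8 * MB))

print(default_splits, min_only_splits, capped_splits)
```

With the older mapred API the cap comes from the requested number of map tasks instead (total size divided by mapred.map.tasks), so raising that hint should have a similar effect.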
