Not sure if it's the same issue, but I have also seen the Map input records counter exceed the actual number of input records in some cases.
Jie

On Thu, Jul 26, 2012 at 6:04 PM, Prasanth J <[email protected]> wrote:
> Hello everyone
>
> I am using RandomSampleLoader to load 1000 tuples per mapper. I have 11 map
> jobs in a small dataset and 109 map jobs in a large dataset.
>
> I am expecting 11000 tuples from the small dataset and 109000 tuples from the
> large dataset. But the actual number of tuples that I get is always more than
> what I expected. In the small dataset case I am getting 15000 tuples, whereas
> in the large dataset case I am getting 145000 (sometimes 150000) tuples.
>
> Is this a bug, or is it expected behavior? If reservoir sampling is used
> by all mappers, then why is the total number of samples higher?
>
> Thanks
> -- Prasanth
>
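
For reference, here is a minimal sketch of the arithmetic behind the expectation in the quoted message. It is plain Python and not Pig's RandomSampleLoader code; the names reservoir_sample and total_sampled are hypothetical and exist only for this illustration. It assumes each mapper runs exactly one reservoir sampler of size k over its own split, in which case the total can never exceed (number of mappers) x k.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Return at most k records drawn uniformly at random (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


def total_sampled(split_sizes, k):
    """Total tuples emitted if each split gets exactly one sampler (assumption)."""
    return sum(len(reservoir_sample(range(n), k)) for n in split_sizes)


if __name__ == "__main__":
    # 11 hypothetical splits of 5000 records each, 1000 samples per split.
    print(total_sampled([5000] * 11, 1000))
```

Under that one-sampler-per-mapper assumption, 11 mappers give at most 11000 tuples. A total of 15000 would then suggest that more samplers ran than the 11 counted (15000 / 1000 = 15), though that is only what the arithmetic implies and not a confirmed cause.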
