It would be helpful to see some statistics from both jobs, such as bytes
read, bytes written, number of errors, etc.
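
One thing those statistics might help check: input splits are computed per
file, so the number and size of the input files matters, not only the total
byte count. Below is a rough Python sketch of that arithmetic. It is an
estimate under stated assumptions, not Hadoop's actual code: the function
names are made up for illustration, the 1.1 slack factor is meant to mirror
the slop FileInputFormat applies to the last split, and the "100 files of
about 1 GB + 20 MB" layout is purely hypothetical, not something reported in
this thread.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumption: mirrors the 10% slack FileInputFormat allows on the last split


def estimate_splits(file_size, split_size=128 * MB):
    """Estimate the number of input splits for one file (hypothetical helper).

    Simplification: empty files are counted as zero splits here.
    """
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder is clearly more than one split.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 x split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def estimate_maps(file_sizes, split_size=128 * MB):
    """Sum the per-file split estimates; one map task runs per split."""
    return sum(estimate_splits(size, split_size) for size in file_sizes)


# A single 100 GB file, as in the hand-loaded text file case.
single_file = [100 * 1024 * MB]

# Hypothetical layout: 100 files, each slightly over 1 GB.
many_files = [1044 * MB] * 100

print(estimate_maps(single_file))
print(estimate_maps(many_files))
```

If the per-file sizes from the real jobs are plugged in instead of the
made-up ones, the estimate can be compared against the map counts the
JobTracker reports.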

On Thu, Aug 16, 2012 at 8:02 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:

> You probably have speculative execution on. Extra map and reduce tasks
> are run in case some of them fail.
>
> Raj
>
>
> On Aug 16, 2012, at 11:36 AM, "in.abdul" <in.ab...@gmail.com> wrote:
>
> > Hi Gaurav,
> >   The number of maps does not depend on the number of blocks. It really
> > depends on the number of input splits. If you had 100 GB of data and 10
> > splits, then you would see only 10 maps.
> >
> > Please correct me if I am wrong.
> >
> > Thanks and regards,
> > Syed abdul kather
> > On Aug 16, 2012 7:44 PM, "Gaurav Dasgupta [via Lucene]" <
> > ml-node+s472066n4001631...@n3.nabble.com> wrote:
> >
> >> Hi users,
> >>
> >> I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all
> >> the 12 nodes and 1 node running the Job Tracker).
> >> In order to perform a WordCount benchmark test, I did the following:
> >>
> >>   - Executed "RandomTextWriter" first to create 100 GB of data (note that I
> >>   have changed only the "test.randomtextwrite.total_bytes" parameter; the
> >>   rest are all kept at their defaults).
> >>   - Next, executed the "WordCount" program for that 100 GB dataset.
> >>
> >> The "Block Size" in "hdfs-site.xml" is set to 128 MB. Now, according to my
> >> calculation, the total number of Maps to be executed by the WordCount job
> >> should be 100 GB / 128 MB, or 102400 MB / 128 MB = 800.
> >> But when I execute the job, it runs a total of 900 Maps,
> >> i.e., 100 extra.
> >> So, why are there these extra Maps? My job does complete
> >> successfully without any error, though.
> >>
> >> Again, if I don't execute the "RandomTextWriter" job to create data for
> >> my WordCount, but instead put my own 100 GB text file in HDFS and run
> >> "WordCount", I can then see that the number of Maps is equal to my
> >> calculation, i.e., 800.
> >>
> >> Can anyone tell me why Hadoop shows this odd behaviour regarding the number
> >> of Maps for WordCount only when the dataset is generated by
> >> RandomTextWriter? And what is the purpose of these extra Maps?
> >>
> >> Regards,
> >> Gaurav Dasgupta
>
