Thanks Harsh, I did set mapred.map.tasks = 1,
but still I can consistently see 3 mappers being invoked, and the order
is always like this:

****_00002_0
***_00000_0
***_00001_0

The 00002_0 and 00001_0 tasks are the ones that consume 0 data. This
does look like a bug ---- you could try with a simple Pig test.

Yang

On Wed, Jul 11, 2012 at 10:15 PM, Harsh J <ha...@cloudera.com> wrote:
> Er, sorry, I meant mapred.map.tasks = 1.
>
> On Thu, Jul 12, 2012 at 10:44 AM, Harsh J <ha...@cloudera.com> wrote:
> > Try passing mapred.map.tasks = 0, or set a higher min-split size?
> >
> > On Thu, Jul 12, 2012 at 10:36 AM, Yang <teddyyyy...@gmail.com> wrote:
> >> Thanks Harsh
> >>
> >> I see.
> >>
> >> Then there seems to be some small problem with the Splitter /
> >> InputFormat.
> >>
> >> I'm just reading a 1-line text file through Pig:
> >>
> >> A = LOAD 'myinput.txt';
> >>
> >> Supposedly it should generate at most 1 mapper,
> >> but in reality it seems that Pig generated 3 mappers, and basically
> >> fed empty input to 2 of the mappers.
> >>
> >> Thanks
> >> Yang
> >>
> >> On Wed, Jul 11, 2012 at 10:00 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >>> Yang,
> >>>
> >>> No, those three are individual task attempts.
> >>>
> >>> This is how you may generally dissect an attempt ID when reading it:
> >>>
> >>> attempt_201207111710_0024_m_000000_0
> >>>
> >>> 1. "attempt" - indicates it's an attempt ID you'll be reading
> >>> 2. "201207111710" - the JobTracker timestamp ID, indicating which
> >>> instance of the JT ran this job
> >>> 3. "0024" - the job ID for which this was a task attempt
> >>> 4. "m" - indicating this is a mapper (reducers are "r")
> >>> 5. "000000" - the task ID of the mapper (000000 is the first mapper,
> >>> 000001 is the second, etc.)
> >>> 6. "0" - the attempt # for the task ID. 0 means it is the first
> >>> attempt, 1 indicates the second attempt, etc.
> >>>
> >>> On Thu, Jul 12, 2012 at 9:16 AM, Yang <teddyyyy...@gmail.com> wrote:
> >>> > I set the following params to false in my Pig script (0.10.0):
> >>> >
> >>> > SET mapred.map.tasks.speculative.execution false;
> >>> > SET mapred.reduce.tasks.speculative.execution false;
> >>> >
> >>> > I also verified in the job.xml in the JobTracker UI that they are
> >>> > indeed set correctly.
> >>> >
> >>> > When the job finished, the JobTracker UI showed that there was only
> >>> > one attempt for each task (in fact I have only 1 task too).
> >>> >
> >>> > But when I went to the TaskTracker node and looked under the
> >>> > /var/log/hadoop/userlogs/job_id_here/ dir, there are 3 attempt dirs:
> >>> >
> >>> > job_201207111710_0024 # ls
> >>> > attempt_201207111710_0024_m_000000_0  attempt_201207111710_0024_m_000001_0
> >>> > attempt_201207111710_0024_m_000002_0  job-acls.xml
> >>> >
> >>> > So 3 attempts were indeed fired?
> >>> >
> >>> > I have to get this controlled correctly because I'm trying to debug
> >>> > the mappers through Eclipse, but if more than 1 mapper process is
> >>> > fired, they all try to connect to the same debugger port, and the
> >>> > end result is that nobody is able to hook to the debugger.
> >>> >
> >>> > Thanks
> >>> > Yang
> >>>
> >>> --
> >>> Harsh J
> >
> > --
> > Harsh J
>
> --
> Harsh J
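
A minimal Pig test along the lines discussed above might look like the
following. This is a sketch, not a verified fix: the 1 GB min-split
value is an arbitrary assumption chosen to be larger than any small
test file, and myinput.txt stands in for the one-line input from the
thread.

-- ask for a single map task, and make the minimum split size large
-- enough that a tiny file cannot be broken into several splits
SET mapred.map.tasks 1;
SET mapred.min.split.size 1073741824;
-- keep speculative execution off so no extra attempts are launched
SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;

A = LOAD 'myinput.txt';
DUMP A;

If this still launches 3 mappers with 2 of them reading 0 bytes, that
would support the bug theory above.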
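For the Eclipse side, once the job is down to a single map attempt, one
common approach is to start the child JVM with a JDWP agent and attach
Eclipse as a remote debugger. A sketch, assuming Hadoop 1.x-era
property names and that port 8000 is free and reachable on the
TaskTracker node:

-- make the mapper JVM listen on port 8000 and wait (suspend=y) until
-- a remote debugger attaches
SET mapred.child.java.opts '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000';

This only works cleanly when exactly one mapper JVM runs at a time,
which is exactly why the extra empty-input attempts above get in the
way: each additional JVM conflicts over the same debugger port.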