[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036403#comment-13036403 ]

Richard Ding commented on PIG-2029:
---

Patch committed to trunk and 0.9 branch.

> Inconsistency in Pig Stats reports
> ---
>
>                 Key: PIG-2029
>                 URL: https://issues.apache.org/jira/browse/PIG-2029
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.9.0
>
>         Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying Stats for the same M/R job (same
> inputs). Sometimes PigStats reports all the stats (such as
> Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime
> and AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1,
> job_201103091134_556600 has 0 against all the columns, whereas in
> Run 2, Hadoop job job_201104272229_75693 has some valid values.
> The actual Job Tracker link shows that they are non-empty. This points to a
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
> job_201103091134_556458 160 100 552 191 368 1257 371 392 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
> job_201103091134_556600 0 0 0 0 0 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,,
> job_201103091134_556601 7 100 17 8 14 200 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
> job_201103091134_556602 0 0 0 0 0 0 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
> job_201103091134_556603 0 0 0 0 0 0 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
> job_201103091134_556604 2 100 13 7 10 34 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
> job_201103091134_556644 0 0 0 0 0 0 0 0 ONJOIN15 SAMPLER
> job_201103091134_556645 0 0 0 0 0 0 0 0 ONJOIN25 SAMPLER
> job_201103091134_556646 0 0 0 0 0 0 0 0 ONJOIN3 SAMPLER
> job_201103091134_556654 0 0 0 0 0 0 0 0 ONJOIN19 SAMPLER
> job_201103091134_556662 0 0 0 0 0 0 0 0 ONJOIN19 ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
> job_201104272229_75503 159 100 484 192 353 396 308 321 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
> job_201104272229_75693 18 0 31 14 24 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
> job_201104272229_75694 7 100 34 13 22 46 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
> job_201104272229_75695 125 100 19 11 15 32 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
> job_201104272229_75698 1 100 12 12 12 13 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
> job_201104272229_75702 2 100 21 5 13 35 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
> job_201104272229_75724 1 1 4 4 4 11 11 11 ONJOIN15 SAMPLER
> job_201104272229_75725 0 0 0 0 0 0 0 ONJOIN25 SAMPLER
> job_201104272229_75726 6 1 8 6 8 24 24 24 ONJOIN3 SAMPLER
> job_201104272229_75729 0 0 0 0 0
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036315#comment-13036315 ]

Thejas M Nair commented on PIG-2029:

+1

[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010 ]

Richard Ding commented on PIG-2029:
---

Currently Pig prints zero (0) when the max/min/avg map/reduce time isn't available from querying Hadoop through the Hadoop client API. This is misleading. I propose that we change those values to 'n/a', as follows:

{code}
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201104272229_434232 2 10 354 220 287 168 149 163 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
job_201104272229_434319 2 0 9 3 6 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/rding/verifypigstats2-UNION5,
job_201104272229_434320 2 10 n/a n/a n/a n/a n/a n/a CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
job_201104272229_434321 1 10 5 5 5 23 9 17 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
job_201104272229_434322 2 10 n/a n/a n/a n/a n/a n/a CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
job_201104272229_434323 2 10 n/a n/a n/a n/a n/a n/a CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
job_201104272229_434331 2 1 n/a n/a n/a n/a n/a n/a ONJOIN15 SAMPLER
job_201104272229_434332 2 1 n/a n/a n/a n/a n/a n/a ONJOIN3 SAMPLER
job_201104272229_434333 1 1 2 2 2 13 13 13 ONJOIN25 SAMPLER
job_201104272229_434334 1 1 1 1 1 12 12 12 ONJOIN19 SAMPLER
job_201104272229_434342 1 10 2 2 2 16 8 11 ONJOIN25 ORDER_BY,COMBINER
{code}

> Inconsistency in Pig Stats reports
> ---
>
>                 Key: PIG-2029
>                 URL: https://issues.apache.org/jira/browse/PIG-2029
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.10
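
The 'n/a' proposal in Richard Ding's comment can be illustrated with a minimal sketch. It is not taken from PIG-2029.patch or from Pig's source: the class name StatTimeFormatter, the methods formatTime and formatRow, and the -1 sentinel for "value not returned by Hadoop" are all hypothetical names chosen for illustration. The idea shown is only that an unavailable time is kept distinct from a genuine 0 and rendered as 'n/a' when the report row is built.

```java
class StatTimeFormatter {
    // Hypothetical sentinel: the time could not be obtained from Hadoop.
    static final long UNAVAILABLE = -1L;

    // Render a time in seconds, or "n/a" when the value is unavailable.
    public static String formatTime(long seconds) {
        return seconds == UNAVAILABLE ? "n/a" : Long.toString(seconds);
    }

    // Build one tab-separated report row: job id, map and reduce counts,
    // then the max/min/avg map and reduce times in the order given.
    public static String formatRow(String jobId, int maps, int reduces, long... times) {
        StringBuilder sb = new StringBuilder(jobId);
        sb.append('\t').append(maps).append('\t').append(reduces);
        for (long t : times) {
            sb.append('\t').append(formatTime(t));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long na = UNAVAILABLE;
        // A job whose task times were not available, and one where they were.
        System.out.println(formatRow("job_201104272229_434320", 2, 10, na, na, na, na, na, na));
        System.out.println(formatRow("job_201104272229_434321", 1, 10, 5, 5, 5, 23, 9, 17));
    }
}
```

With a sentinel like this, a map-only job that truly spent 0 seconds in reduce still prints 0, while a job whose task reports could not be fetched prints 'n/a' in every time column.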