[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-18 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2029:


Fix Version/s: (was: 0.10)
   0.9.0

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying stats for the same M/R job (same 
> inputs). Sometimes PigStats reports all the stats (such as Maps, Reduces, 
> MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime and 
> AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
> job_201103091134_556600 has 0 in all the columns, whereas in Run 2, Hadoop 
> job job_201104272229_75693 has some valid values. 
> The actual Job Tracker link shows that they are non-empty. This points to a 
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
> job_201103091134_556458  160  100  552  191  368  1257  371  392  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
> job_201103091134_556600  0  0  0  0  0  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,,
> job_201103091134_556601  7  100  17  8  14  200  15  27  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
> job_201103091134_556602  0  0  0  0  0  0  0  0  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
> job_201103091134_556603  0  0  0  0  0  0  0  0  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
> job_201103091134_556604  2  100  13  7  10  34  13  31  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
> job_201103091134_556644  0  0  0  0  0  0  0  0  ONJOIN15  SAMPLER
> job_201103091134_556645  0  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
> job_201103091134_556646  0  0  0  0  0  0  0  0  ONJOIN3  SAMPLER
> job_201103091134_556654  0  0  0  0  0  0  0  0  ONJOIN19  SAMPLER
> job_201103091134_556662  0  0  0  0  0  0  0  0  ONJOIN19  ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
> job_201104272229_75503  159  100  484  192  353  396  308  321  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
> job_201104272229_75693  18  0  31  14  24  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,
> job_201104272229_75694  7  100  34  13  22  46  20  25  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
> job_201104272229_75695  125  100  19  11  15  32  18  26  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
> job_201104272229_75698  1  100  12  12  12  13  9  11  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
> job_201104272229_75702  2  100  21  5  13  35  22  26  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
> job_201104272229_75724  1  1  4  4  4  11  11  11  ONJOIN15  SAMPLER
> job_201104272229_75725  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
> job_201104272229_75726  6  1  8  6  8  24  24  24  ONJOIN3  SAMPLER
> job_201104272229_75729  0  0  0  0  0  0  0  ONJOIN19  SAMPLER

[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2029:
--

Attachment: PIG-2029.patch

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.10
>
> Attachments: PIG-2029.patch
>
>

[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2029:


Fix Version/s: (was: 0.9.0)
   0.10

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.10
>
>

[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-03 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2029:


Fix Version/s: (was: 0.8.1)
 Assignee: Richard Ding

I do not believe this is a P1 issue, so I don't think it belongs on the 0.8 
branch. Even for 0.9 I do not see it as a blocker. If we can find a quick 
reproducible case, we will fix it in 0.9; otherwise we will delay until we can 
reproduce it. Also, this could be an issue with Hadoop.
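
When looking for a reproducible case, it may help to flag the all-zero rows in 
a captured Job Stats table automatically. The following is only a minimal 
sketch, not part of Pig or of the attached patch. It assumes each job row has 
been captured on a single line with whitespace-separated columns, and the 
function name all_zero_jobs is hypothetical.

```python
def all_zero_jobs(stats_lines):
    """Return the job IDs whose leading numeric columns are all zero.

    Assumption: each row looks like
    'job_<id> <Maps> <Reduces> ... <AvgReduceTime> <Alias> <Feature> [<Outputs>]'
    with whitespace between columns. Header and other lines are skipped.
    """
    suspicious = []
    for line in stats_lines:
        fields = line.split()
        # Only rows that begin with a Hadoop job ID are considered.
        if not fields or not fields[0].startswith("job_"):
            continue
        # Collect the run of numeric columns that follows the job ID.
        numbers = []
        for field in fields[1:]:
            if not field.isdigit():
                break
            numbers.append(int(field))
        if numbers and all(n == 0 for n in numbers):
            suspicious.append(fields[0])
    return suspicious
```

A flagged row is only a candidate: a job can legitimately report zeros, so each 
one still has to be compared against the Job Tracker page for that job.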

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.9.0
>
>