[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.13.0
>
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch, PIG-3765_5.patch
>
>
> This is an admin feature that provides the ability to blacklist and/or whitelist 
> certain commands and operators. Pig exposes a few of these that may not be 
> safe in a multitenant environment. For example, "sh" invokes shell 
> commands, and "set" allows users to change non-final configs. While these are 
> tremendously useful in general, the ability to disable them would make Pig a 
> safer platform. The goal is to give administrators more 
> control over user scripts. The default behaviour stays the same: no 
> filters are applied to commands and operators.
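
As an illustration of the idea described above, here is a minimal sketch of how a blacklist/whitelist check could behave. It is not the code from the attached patches: the class name CommandFilter, the comma-separated input format, and the rule that the blacklist wins when both lists are set are all assumptions made for this example. With both lists empty, every command is allowed, which matches the stated default of no filtering.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not Pig's actual implementation.
final class CommandFilter {
    private final Set<String> blacklist;
    private final Set<String> whitelist;

    // Both arguments are comma-separated command names; null means "not configured".
    CommandFilter(String blacklistCsv, String whitelistCsv) {
        this.blacklist = parse(blacklistCsv);
        this.whitelist = parse(whitelistCsv);
    }

    private static Set<String> parse(String csv) {
        Set<String> names = new HashSet<>();
        if (csv == null) {
            return names;
        }
        for (String item : csv.split(",")) {
            String name = item.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Assumption: a blacklisted command is always rejected; an empty
    // whitelist means that no whitelist filtering is applied.
    boolean isAllowed(String command) {
        String name = command.trim().toLowerCase(Locale.ROOT);
        if (blacklist.contains(name)) {
            return false;
        }
        return whitelist.isEmpty() || whitelist.contains(name);
    }

    public static void main(String[] args) {
        CommandFilter unfiltered = new CommandFilter(null, null);
        CommandFilter locked = new CommandFilter("sh, set", null);
        System.out.println("default allows sh: " + unfiltered.isAllowed("sh"));
        System.out.println("locked allows sh: " + locked.isAllowed("sh"));
        System.out.println("locked allows load: " + locked.isAllowed("load"));
    }
}
```

An administrator-facing check of this kind would run before a command or operator is executed, so that a rejected entry fails fast rather than partway through a script.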



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Fix Version/s: 0.13.0

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.13.0
>
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch, PIG-3765_5.patch
>
>
> This is an admin feature that provides the ability to blacklist and/or whitelist 
> certain commands and operators. Pig exposes a few of these that may not be 
> safe in a multitenant environment. For example, "sh" invokes shell 
> commands, and "set" allows users to change non-final configs. While these are 
> tremendously useful in general, the ability to disable them would make Pig a 
> safer platform. The goal is to give administrators more 
> control over user scripts. The default behaviour stays the same: no 
> filters are applied to commands and operators.





[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923648#comment-13923648
 ] 

Prashant Kommireddi commented on PIG-3765:
--

Committed to trunk. Thanks [~cheolsoo] for the review!

Note: there is a typo in my earlier comment; it should read "failed as PigServer was NOT being set".

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch, PIG-3765_5.patch
>
>
> This is an admin feature that provides the ability to blacklist and/or whitelist 
> certain commands and operators. Pig exposes a few of these that may not be 
> safe in a multitenant environment. For example, "sh" invokes shell 
> commands, and "set" allows users to change non-final configs. While these are 
> tremendously useful in general, the ability to disable them would make Pig a 
> safer platform. The goal is to give administrators more 
> control over user scripts. The default behaviour stays the same: no 
> filters are applied to commands and operators.





[jira] [Updated] (PIG-3731) Ability to specify local-mode specific configuration (useful for local/auto-local mode)

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3731:


   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

> Ability to specify local-mode specific configuration (useful for 
> local/auto-local mode)
> ---
>
> Key: PIG-3731
> URL: https://issues.apache.org/jira/browse/PIG-3731
> Project: Pig
>  Issue Type: Improvement
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3731-1.patch
>
>
> This could be done in several ways; however, adding a namespace (pig.local.) 
> for this is better, so that only the few jobs that require it can 
> make use of it.
> For example, "set pig.local.io.sort.mb 50" will set io.sort.mb=50 for local mode jobs, 
> allowing many jobs to run in parallel. Another setting that might be required 
> is io.compression.codecs.
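
The namespace idea above can be sketched as follows. This is only an illustration under stated assumptions, not the code in PIG-3731-1.patch: the class name LocalModeOverrides and the choice to work on java.util.Properties are hypothetical. The sketch copies every "pig.local.<key>" entry onto "<key>", which is the effect the description gives for io.sort.mb.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not the code from the patch.
final class LocalModeOverrides {
    static final String PREFIX = "pig.local.";

    // Returns a copy of props in which every "pig.local.<key>" entry
    // also overrides "<key>". The input is left unmodified.
    static Properties apply(Properties props) {
        Properties result = new Properties();
        result.putAll(props);
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                String target = name.substring(PREFIX.length());
                result.setProperty(target, props.getProperty(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("io.sort.mb", "100");
        props.setProperty("pig.local.io.sort.mb", "50");
        Properties local = apply(props);
        System.out.println("io.sort.mb for local mode = " + local.getProperty("io.sort.mb"));
    }
}
```

Such an overlay would only be applied when a job actually runs in local or auto-local mode; cluster jobs would keep the unprefixed values.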





[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923641#comment-13923641
 ] 

Aniket Mokashi commented on PIG-3754:
-

Committed to trunk. Thanks [~cheolsoo] and [~julienledem] for the review.

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch, PIG-3754-2.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
> one of the loaders returns -1 and is not file based (e.g. hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If the size is not found for any one of the inputs, we should bail out 
> with a return value of -1.
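
The bail-out rule proposed above can be sketched in a few lines. This is a simplified stand-in, not the patched InputSizeReducerEstimator: the class name InputSizeSum and the use of a plain long[] of per-input sizes are assumptions made for the example, with a negative entry standing for "size unknown".

```java
// Hypothetical helper for illustration only; not Pig's actual estimator.
final class InputSizeSum {
    // Sums the per-input sizes in bytes. If any single input reports an
    // unknown size (negative value), the total is unknown as well, so the
    // method returns -1 instead of a partial sum.
    static long totalInputSize(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            if (size < 0) {
                return -1;
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInputSize(new long[] {100, 200}));
        System.out.println(totalInputSize(new long[] {100, -1, 200}));
    }
}
```

Returning -1 for the whole set means callers see "unknown" rather than an underestimate, which is what matters for reducer estimation and for the auto.local decision.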





[jira] [Updated] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3754:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch, PIG-3754-2.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
> one of the loaders returns -1 and is not file based (e.g. hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If the size is not found for any one of the inputs, we should bail out 
> with a return value of -1.





[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_5.patch

Verified "test-commit" passes with the changes.

However noticed {{test/org/apache/pig/test/pigunit/pig/TestGruntParser.java}} 
failed as PigServer was being set. Made a minor change towards fixing that.

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch, PIG-3765_5.patch
>
>
> This is an admin feature that provides the ability to blacklist and/or whitelist 
> certain commands and operators. Pig exposes a few of these that may not be 
> safe in a multitenant environment. For example, "sh" invokes shell 
> commands, and "set" allows users to change non-final configs. While these are 
> tremendously useful in general, the ability to disable them would make Pig a 
> safer platform. The goal is to give administrators more 
> control over user scripts. The default behaviour stays the same: no 
> filters are applied to commands and operators.





[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923639#comment-13923639
 ] 

Aniket Mokashi commented on PIG-3754:
-

Attached PIG-3754-2.patch with tests. Will commit to trunk.

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch, PIG-3754-2.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
> one of the loaders returns -1 and is not file based (e.g. hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If the size is not found for any one of the inputs, we should bail out 
> with a return value of -1.





[jira] [Updated] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3754:


Attachment: PIG-3754-2.patch

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch, PIG-3754-2.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
> one of the loaders returns -1 and is not file based (e.g. hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If the size is not found for any one of the inputs, we should bail out 
> with a return value of -1.





[jira] [Commented] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-06 Thread Kyungho Jeon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923575#comment-13923575
 ] 

Kyungho Jeon commented on PIG-3793:
---

Okay, I was confused. As {{PigServer}} is the only interface that Pig users can 
use for submitting a query and Pig doesn't expose {{LogicalPlan}}, there's no 
way to mutate currDAG.lp. So now I believe the current 
{{getNumLogicalRelationOperators()}} implementation is reasonable and having 
{{resetLogicalPlanData()}} will be sufficient. 

> Provide info on number of LogicalRelationalOperator(s) used in the script 
> through LogicalPlanData
> -
>
> Key: PIG-3793
> URL: https://issues.apache.org/jira/browse/PIG-3793
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.13.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.13.0
>
> Attachments: PIG-3793.patch, PIG-3793_2.patch
>
>
> It's useful to know, via the API, how many operators are being used in 
> the script. This could allow admins to enforce 
> checks/restrictions on the length/complexity of the plan in user scripts.





Re: pig12 job stuck in infinite loop

2014-03-06 Thread Suhas Satish
The example that reproduces the issue, along with the data, is attached to the
very first email on this thread.

On Thursday, March 6, 2014, Cheolsoo Park  wrote:

> So that's backend. It has nothing to do with the filter extractor. The
> filter extractor is for predicate push down on the frontend.
>
> The code that you're showing is the entry point where Pig mapper begins. So
> it doesn't tell us much. The mapper is given a segment of physical plan
> (pipeline), and the getNext() call pulls records from roots to leaves one
> by one.
>
> You need to find where time is spent in the pipeline. If you're suspecting
> Filter By is slow, then it should be POFilter. Please take thread dump
> multiple times and see the stack traces. Unless you provide an example that
> reproduces the error, I cannot help you more.
>
>
>
> On Thu, Mar 6, 2014 at 6:03 PM, Suhas Satish 
> >
> wrote:
>
> > Hi Cheolsoo,
> > This is where its hanging -
> > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> >
> > org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
> > PigGenericMapBase.java:
> >
> > protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
> > InterruptedException {
> > while(true){
> > Result res = leaf.getNext(DUMMYTUPLE);
> > if(res.returnStatus==POStatus.STATUS_OK){
> > collect(outputCollector,(Tuple)res.result);
> > continue;
> > }
> > 
> >
> > Cheers,
> > Suhas.
> >
> >
> > On Thu, Mar 6, 2014 at 5:56 PM, Cheolsoo Park 
> > wrote:
> >
> > > Hi Suhas,
> > >
> > > No. The issue with PIG-3461 is that Pig hangs at the query compilation
> > with
> > > a big filter expression before the job is submitted.
> > > In addition, the filter extractor was totally rewritten in 0.12.
> > > https://issues.apache.org/jira/browse/PIG-3461
> > >
> > > Where exactly is your job hanging? Backend or frontend? Are you running
> > it
> > > in local mode or remote mode?
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > > p.s.
> > > There are two known issues with the new filter extractor in 0.12.0
> > although
> > > these are probably not related to your issue-
> > > https://issues.apache.org/jira/browse/PIG-3510
> > > https://issues.apache.org/jira/browse/PIG-3657
> > >
> > >
> > > On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish 
> > > wrote:
> > >
> > > > I seem to be hitting this issue in pig-0.12 although it claims to be
> > > fixed
> > > > in pig-0.12
> > > > https://issues.apache.org/jira/browse/PIG-3395
> > > > Large filter expression makes Pig hang
> > > >
> > > > Cheers,
> > > > Suhas.
> > > >
> > > >
> > > > On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish  >
> > > > wrote:
> > > >
> > > > > This is the pig script -
> > > > >
> > > > > %default previousPeriod $pPeriod
> > > > >
> > > > > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS
> > (WEEK:int,
> > > > > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> > > > >
> > > > > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> > > > >
> > > > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > > > >
> > > > > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> > > > >
> > > > > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > > > > store gpWeekRanked INTO 'gpWeekRanked';
> > > > > describe gpWeekRanked;
> > > > >
> > > > >
> > > > > Without the filter statement, the code runs without hanging.
> > > > >
> > > > > Cheers,
> > > > > Suhas.
> > > > >
> > > > >
> > > > > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <
> suhas.sat...@gmail.com
> > > > >wrote:
> > > > >
> > > > >> Hi
> > > > >> I launched the attached pig job on pig-12 with hadoop MRv1 with
> the
> > > > >> attached data, but the FILTER function causes the job to get stuck
> > in
> > > an
> > > > >> infinite loop.
> > > > >>
> > > > >> pig -p pPeriod=201312 -f test.pig
> > > > >>
> > > > >> The thread in question seems to be stuck forever inside while loop
> > of
> > > > >> runPipel



-- 
Cheers,
Suhas.


Re: pig12 job stuck in infinite loop

2014-03-06 Thread Cheolsoo Park
So that's the backend. It has nothing to do with the filter extractor. The
filter extractor is for predicate push down on the frontend.

The code that you're showing is the entry point where the Pig mapper begins, so
it doesn't tell us much. The mapper is given a segment of the physical plan
(pipeline), and the getNext() call pulls records from roots to leaves one
by one.
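
To make the pull model concrete, here is a toy sketch of such a pipeline. It is not Pig's code: the names PullPipelineSketch, Operator, Source, Filter, Status and Result are all made up for the example, and records are plain ints. The leaf asks its input for the next record, the input asks its own input, and so on up to the source. The driver loop has the same shape as the runPipeline() excerpt quoted in this thread, and it only terminates because the operators eventually report END.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

// Toy pull-based pipeline for illustration only; all names are hypothetical.
final class PullPipelineSketch {
    enum Status { OK, END }

    static final class Result {
        final Status status;
        final int value;
        Result(Status status, int value) {
            this.status = status;
            this.value = value;
        }
    }

    interface Operator {
        Result getNext();
    }

    // Root of the pipeline: hands out rows until the list is exhausted.
    static final class Source implements Operator {
        private final Iterator<Integer> rows;
        Source(List<Integer> rows) {
            this.rows = rows.iterator();
        }
        public Result getNext() {
            return rows.hasNext() ? new Result(Status.OK, rows.next())
                                  : new Result(Status.END, 0);
        }
    }

    // Pulls from its input until a row passes the predicate or input ends.
    static final class Filter implements Operator {
        private final Operator input;
        private final IntPredicate keep;
        Filter(Operator input, IntPredicate keep) {
            this.input = input;
            this.keep = keep;
        }
        public Result getNext() {
            while (true) {
                Result r = input.getNext();
                if (r.status == Status.END || keep.test(r.value)) {
                    return r;
                }
            }
        }
    }

    // Driver loop: keeps pulling from the leaf while it returns OK.
    static List<Integer> runPipeline(Operator leaf) {
        List<Integer> collected = new ArrayList<>();
        while (true) {
            Result r = leaf.getNext();
            if (r.status == Status.OK) {
                collected.add(r.value);
                continue;
            }
            return collected;
        }
    }

    public static void main(String[] args) {
        Operator leaf = new Filter(new Source(Arrays.asList(201311, 201312, 201312)),
                                   period -> period == 201312);
        System.out.println(runPipeline(leaf));
    }
}
```

In a design like this, a driver loop that never exits means some operator below the leaf keeps returning a non-terminal status, which is why the stack traces of the operators themselves are more informative than the loop.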

You need to find where time is spent in the pipeline. If you suspect that
FILTER BY is slow, then the operator to look at is POFilter. Please take thread
dumps several times and compare the stack traces. Unless you provide an example
that reproduces the error, I cannot help you more.
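
For a running task JVM, thread dumps are normally taken from outside the process, for example with the JDK's jstack tool. The sketch below only illustrates the sampling idea from inside a JVM, using the standard java.lang.management API: it records the top stack frame of one thread several times, so that a frame which shows up in every sample points at where the time goes. The class name StackSampler and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class StackSampler {
    // Returns the top stack frame of the given thread as text.
    static String topFrame(long threadId) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo info = threads.getThreadInfo(threadId, 1);
        if (info == null) {
            return "<thread not alive>";
        }
        StackTraceElement[] stack = info.getStackTrace();
        return stack.length == 0 ? "<no frames>" : stack[0].toString();
    }

    // Samples the top frame 'samples' times, sleeping between samples.
    // Stops early if the sampling thread is interrupted.
    static List<String> sample(long threadId, int samples, long intervalMillis) {
        List<String> frames = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            frames.add(topFrame(threadId));
            if (i + 1 < samples) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        long self = Thread.currentThread().getId();
        for (String frame : sample(self, 3, 100L)) {
            System.out.println(frame);
        }
    }
}
```

If the same operator frame appears at the top of every sample, that operator is the place to investigate; if the top frame changes between samples, the job is making progress and is slow rather than stuck.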



On Thu, Mar 6, 2014 at 6:03 PM, Suhas Satish  wrote:

> Hi Cheolsoo,
> This is where its hanging -
> *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
>
> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
> PigGenericMapBase.java:
>
> protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
> InterruptedException {
> while(true){
> Result res = leaf.getNext(DUMMYTUPLE);
> if(res.returnStatus==POStatus.STATUS_OK){
> collect(outputCollector,(Tuple)res.result);
> continue;
> }
> 
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 5:56 PM, Cheolsoo Park 
> wrote:
>
> > Hi Suhas,
> >
> > No. The issue with PIG-3461 is that Pig hangs at the query compilation
> with
> > a big filter expression before the job is submitted.
> > In addition, the filter extractor was totally rewritten in 0.12.
> > https://issues.apache.org/jira/browse/PIG-3461
> >
> > Where exactly is your job hanging? Backend or frontend? Are you running
> it
> > in local mode or remote mode?
> >
> > Thanks,
> > Cheolsoo
> >
> > p.s.
> > There are two known issues with the new filter extractor in 0.12.0
> although
> > these are probably not related to your issue-
> > https://issues.apache.org/jira/browse/PIG-3510
> > https://issues.apache.org/jira/browse/PIG-3657
> >
> >
> > On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish 
> > wrote:
> >
> > > I seem to be hitting this issue in pig-0.12 although it claims to be
> > fixed
> > > in pig-0.12
> > > https://issues.apache.org/jira/browse/PIG-3395
> > > Large filter expression makes Pig hang
> > >
> > > Cheers,
> > > Suhas.
> > >
> > >
> > > On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish 
> > > wrote:
> > >
> > > > This is the pig script -
> > > >
> > > > %default previousPeriod $pPeriod
> > > >
> > > > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS
> (WEEK:int,
> > > > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> > > >
> > > > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> > > >
> > > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > > >
> > > > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> > > >
> > > > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > > > store gpWeekRanked INTO 'gpWeekRanked';
> > > > describe gpWeekRanked;
> > > >
> > > >
> > > > Without the filter statement, the code runs without hanging.
> > > >
> > > > Cheers,
> > > > Suhas.
> > > >
> > > >
> > > > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish  > > >wrote:
> > > >
> > > >> Hi
> > > >> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> > > >> attached data, but the FILTER function causes the job to get stuck
> in
> > an
> > > >> infinite loop.
> > > >>
> > > >> pig -p pPeriod=201312 -f test.pig
> > > >>
> > > >> The thread in question seems to be stuck forever inside while loop
> of
> > > >> runPipeline method.
> > > >>
> > > >> stack trace:
> > > >> ---
> > > >>
> > > >> "main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable
> > > >> [0x7fd750d5]
> > > >>java.lang.Thread.State: RUNNABLE
> > > >> at
> > > >> org.apache.pig.backend.hadoop.executionengine.physicalLayer.
> > > >> relationalOperators.POForEach.getNextTuple(POForEach.java:217)
> > > >> at
> > > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > > >> PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> > > >> at
> > > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > > >> PigGenericMapBase.map(PigGenericMapBase.java:277)
> > > >> at
> > > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > > >> PigGenericMapBase.map(PigGenericMapBase.java:64)
> > > >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > > >> at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
> > > >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> > > >> at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > >> at javax.security.auth.Subject.doAs(Subject.java:415)
> > > >> at
> > > >> org.apache.hadoop.security.UserGroupInformation.doAs(
> > > >> UserGroupInformation.java:1117)
> > > >> at org.apache.hadoop.mapred.Child.main(Child.java:271)
> > > >>
> > > >>
> 

Re: pig12 job stuck in infinite loop

2014-03-06 Thread Suhas Satish
Hi Cheolsoo,
This is where it's hanging:
pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;

org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:

protected void runPipeline(PhysicalOperator leaf) throws IOException,
        InterruptedException {
    while(true){
        Result res = leaf.getNext(DUMMYTUPLE);
        if(res.returnStatus==POStatus.STATUS_OK){
            collect(outputCollector,(Tuple)res.result);
            continue;
        }


Cheers,
Suhas.


On Thu, Mar 6, 2014 at 5:56 PM, Cheolsoo Park  wrote:

> Hi Suhas,
>
> No. The issue with PIG-3461 is that Pig hangs at the query compilation with
> a big filter expression before the job is submitted.
> In addition, the filter extractor was totally rewritten in 0.12.
> https://issues.apache.org/jira/browse/PIG-3461
>
> Where exactly is your job hanging? Backend or frontend? Are you running it
> in local mode or remote mode?
>
> Thanks,
> Cheolsoo
>
> p.s.
> There are two known issues with the new filter extractor in 0.12.0 although
> these are probably not related to your issue-
> https://issues.apache.org/jira/browse/PIG-3510
> https://issues.apache.org/jira/browse/PIG-3657
>
>
> On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish 
> wrote:
>
> > I seem to be hitting this issue in pig-0.12 although it claims to be
> fixed
> > in pig-0.12
> > https://issues.apache.org/jira/browse/PIG-3395
> > Large filter expression makes Pig hang
> >
> > Cheers,
> > Suhas.
> >
> >
> > On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish 
> > wrote:
> >
> > > This is the pig script -
> > >
> > > %default previousPeriod $pPeriod
> > >
> > > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> > > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> > >
> > > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> > >
> > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > >
> > > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> > >
> > > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > > store gpWeekRanked INTO 'gpWeekRanked';
> > > describe gpWeekRanked;
> > >
> > >
> > > Without the filter statement, the code runs without hanging.
> > >
> > > Cheers,
> > > Suhas.
> > >
> > >
> > > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish  > >wrote:
> > >
> > >> Hi
> > >> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> > >> attached data, but the FILTER function causes the job to get stuck in
> an
> > >> infinite loop.
> > >>
> > >> pig -p pPeriod=201312 -f test.pig
> > >>
> > >> The thread in question seems to be stuck forever inside while loop of
> > >> runPipeline method.
> > >>
> > >> stack trace:
> > >> ---
> > >>
> > >> "main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable
> > >> [0x7fd750d5]
> > >>java.lang.Thread.State: RUNNABLE
> > >> at
> > >> org.apache.pig.backend.hadoop.executionengine.physicalLayer.
> > >> relationalOperators.POForEach.getNextTuple(POForEach.java:217)
> > >> at
> > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > >> PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> > >> at
> > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > >> PigGenericMapBase.map(PigGenericMapBase.java:277)
> > >> at
> > >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> > >> PigGenericMapBase.map(PigGenericMapBase.java:64)
> > >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
> > >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> > >> at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> > >> at java.security.AccessController.doPrivileged(Native Method)
> > >> at javax.security.auth.Subject.doAs(Subject.java:415)
> > >> at
> > >> org.apache.hadoop.security.UserGroupInformation.doAs(
> > >> UserGroupInformation.java:1117)
> > >> at org.apache.hadoop.mapred.Child.main(Child.java:271)
> > >>
> > >>
> > >>
> > >>
> > >> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
> > >> PigGenericMapBase.java:
> > >>
> > >> protected void *runPipeline*(PhysicalOperator leaf) throws
> IOException,
> > >> InterruptedException {
> > >> while(true){
> > >> Result res = leaf.getNext(DUMMYTUPLE);
> > >> if(res.returnStatus==POStatus.STATUS_OK){
> > >> collect(outputCollector,(Tuple)res.result);
> > >> continue;
> > >> }
> > >> 
> > >>
> > >>
> > >>
> > >> Whats the suggested code fix here?
> > >>
> > >>
> > >> Thanks,
> > >> Suhas.
> > >>
> > >
> > >
> >
>


Re: pig12 job stuck in infinite loop

2014-03-06 Thread Cheolsoo Park
Hi Suhas,

No. The issue with PIG-3461 is that Pig hangs at the query compilation with
a big filter expression before the job is submitted.
In addition, the filter extractor was totally rewritten in 0.12.
https://issues.apache.org/jira/browse/PIG-3461

Where exactly is your job hanging? Backend or frontend? Are you running it
in local mode or remote mode?

Thanks,
Cheolsoo

p.s.
There are two known issues with the new filter extractor in 0.12.0, although
these are probably not related to your issue:
https://issues.apache.org/jira/browse/PIG-3510
https://issues.apache.org/jira/browse/PIG-3657


On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish  wrote:

> I seem to be hitting this issue in pig-0.12 although it claims to be fixed
> in pig-0.12
> https://issues.apache.org/jira/browse/PIG-3395
> Large filter expression makes Pig hang
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish 
> wrote:
>
> > This is the pig script -
> >
> > %default previousPeriod $pPeriod
> >
> > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> >
> > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> >
> > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> >
> > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> >
> > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > store gpWeekRanked INTO 'gpWeekRanked';
> > describe gpWeekRanked;
> >
> >
> > Without the filter statement, the code runs without hanging.
> >
> > Cheers,
> > Suhas.
> >
> >
> > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish  >wrote:
> >
> >> Hi
> >> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> >> attached data, but the FILTER function causes the job to get stuck in an
> >> infinite loop.
> >>
> >> pig -p pPeriod=201312 -f test.pig
> >>
> >> The thread in question seems to be stuck forever inside while loop of
> >> runPipeline method.
> >>
> >> stack trace:
> >> ---
> >>
> >> "main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable
> >> [0x7fd750d5]
> >>java.lang.Thread.State: RUNNABLE
> >> at
> >> org.apache.pig.backend.hadoop.executionengine.physicalLayer.
> >> relationalOperators.POForEach.getNextTuple(POForEach.java:217)
> >> at
> >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> >> PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> >> at
> >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> >> PigGenericMapBase.map(PigGenericMapBase.java:277)
> >> at
> >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> >> PigGenericMapBase.map(PigGenericMapBase.java:64)
> >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> >> at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at javax.security.auth.Subject.doAs(Subject.java:415)
> >> at
> >> org.apache.hadoop.security.UserGroupInformation.doAs(
> >> UserGroupInformation.java:1117)
> >> at org.apache.hadoop.mapred.Child.main(Child.java:271)
> >>
> >>
> >>
> >>
> >> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
> >> PigGenericMapBase.java:
> >>
> >> protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
> >> InterruptedException {
> >> while(true){
> >> Result res = leaf.getNext(DUMMYTUPLE);
> >> if(res.returnStatus==POStatus.STATUS_OK){
> >> collect(outputCollector,(Tuple)res.result);
> >> continue;
> >> }
> >> 
> >>
> >>
> >>
> >> Whats the suggested code fix here?
> >>
> >>
> >> Thanks,
> >> Suhas.
> >>
> >
> >
>


Re: pig12 job stuck in infinite loop

2014-03-06 Thread Suhas Satish
I seem to be hitting this issue in pig-0.12, although it is marked as fixed
in pig-0.12:
https://issues.apache.org/jira/browse/PIG-3395
Large filter expression makes Pig hang

Cheers,
Suhas.


On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish  wrote:

> This is the pig script -
>
> %default previousPeriod $pPeriod
>
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
>
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
>
> *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
>
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
>
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked INTO 'gpWeekRanked';
> describe gpWeekRanked;
>
>
> Without the filter statement, the code runs without hanging.
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish wrote:
>
>> Hi
>> I launched the attached pig job on pig-12 with hadoop MRv1 with the
>> attached data, but the FILTER function causes the job to get stuck in an
>> infinite loop.
>>
>> pig -p pPeriod=201312 -f test.pig
>>
>> The thread in question seems to be stuck forever inside while loop of
>> runPipeline method.
>>
>> stack trace:
>> ---
>>
>> "main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable [0x7fd750d5]
>>    java.lang.Thread.State: RUNNABLE
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:217)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:271)
>>
>>
>>
>>
>> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
>> PigGenericMapBase.java:
>>
>> protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
>> InterruptedException {
>> while(true){
>> Result res = leaf.getNext(DUMMYTUPLE);
>> if(res.returnStatus==POStatus.STATUS_OK){
>> collect(outputCollector,(Tuple)res.result);
>> continue;
>> }
>> 
>>
>>
>>
>> Whats the suggested code fix here?
>>
>>
>> Thanks,
>> Suhas.
>>
>
>


[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923365#comment-13923365
 ] 

Julien Le Dem commented on PIG-3754:


LGTM too

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3800:
---

Fix Version/s: 0.13.0

Setting FixVersion to 0.13 so that we do this before the 0.13 release.

> Documentation for Pig whitelist and blacklist features
> --
>
> Key: PIG-3800
> URL: https://issues.apache.org/jira/browse/PIG-3800
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.13.0
>Reporter: Prashant Kommireddi
>  Labels: documentaion
> Fix For: 0.13.0
>
>
> Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-3800) Documentation for Pig whitelist and blacklist features

2014-03-06 Thread Prashant Kommireddi (JIRA)
Prashant Kommireddi created PIG-3800:


 Summary: Documentation for Pig whitelist and blacklist features
 Key: PIG-3800
 URL: https://issues.apache.org/jira/browse/PIG-3800
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prashant Kommireddi


Documentation for PIG-3765



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (PIG-3799) TestCustomPartitioner is broken in tez branch

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922937#comment-13922937
 ] 

Cheolsoo Park edited comment on PIG-3799 at 3/7/14 1:10 AM:


The parallelism is set in POForeach, and it used to overwrite the parallelism 
of TezOperator before PIG-3795. But since I got rid of the overwriting logic, 
the parallelism of vertex is no longer set.

Attached is a patch that explicitly sets the parallelism of vertex.


was (Author: cheolsoo):
The parallelism used to be set in POForeach, and that value used to overwrite 
that of TezOperator before PIG-3795. But since I got rid of the overwriting 
logic, the parallelism of vertex is no longer set.

Attached is a patch that explicitly sets the parallelism of vertex.

> TestCustomPartitioner is broken in tez branch
> -
>
> Key: PIG-3799
> URL: https://issues.apache.org/jira/browse/PIG-3799
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3799-1.patch
>
>
> This is a regression of PIG-3795. In TezCompiler, visitDistinct() doesn't set 
> the requested parallelism of TezOperator, resulting that only one reducer 
> runs for the following query-
> {code}
> A = LOAD 'table_testCustomPartitionerDistinct' as (a0:int, a1:int);
> B = distinct A PARTITION BY 
> org.apache.pig.test.utils.SimpleCustomPartitioner3 parallel 2;
> {code}
> The test fails because it sees a single output file while it expects two.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923348#comment-13923348
 ] 

Cheolsoo Park commented on PIG-3765:


Please open a jira to document this feature. I think the admin section is the 
right place to document it:
http://pig.apache.org/docs/r0.12.0/admin.html


> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch
>
>
> This is an admin feature providing ability to blacklist or/and whitelist 
> certain commands and operations. Pig exposes a few of these that could be not 
> very safe in a multitenant environment. For example, "sh" invokes shell 
> commands, "set" allows users to change non-final configs. While these are 
> tremendously useful in general, having an ability to disable would make Pig a 
> safer platform. The goal is to allow administrators to be able to have more 
> control over user scripts. Default behaviour would still be the same - no 
> filters applied on commands and operators.
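
The filtering idea in the description above can be sketched in Java as follows. This is only an illustration and not the code from the PIG-3765 patches: the class name CommandFilterSketch and the method isAllowed are hypothetical, and the precedence rules (an empty whitelist means "allow everything", a blacklist entry always wins) are assumptions made for the sketch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a whitelist/blacklist check; not Pig's implementation.
final class CommandFilterSketch {
    private final Set<String> whitelist;
    private final Set<String> blacklist;

    CommandFilterSketch(Set<String> whitelist, Set<String> blacklist) {
        this.whitelist = normalize(whitelist);
        this.blacklist = normalize(blacklist);
    }

    // Lower-case all names so that "SH" and "sh" are treated the same.
    private static Set<String> normalize(Set<String> names) {
        Set<String> out = new HashSet<String>();
        for (String name : names) {
            out.add(name.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Assumed rules: a blacklisted command is always rejected; if the
    // whitelist is empty, every other command is allowed (no filter applied).
    boolean isAllowed(String command) {
        String name = command.toLowerCase(Locale.ROOT);
        if (blacklist.contains(name)) {
            return false;
        }
        return whitelist.isEmpty() || whitelist.contains(name);
    }
}
```

With both sets empty the sketch allows every command, which mirrors the stated default of applying no filters.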



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923347#comment-13923347
 ] 

Cheolsoo Park commented on PIG-3765:


+1 assuming unit tests pass.

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch
>
>
> This is an admin feature providing ability to blacklist or/and whitelist 
> certain commands and operations. Pig exposes a few of these that could be not 
> very safe in a multitenant environment. For example, "sh" invokes shell 
> commands, "set" allows users to change non-final configs. While these are 
> tremendously useful in general, having an ability to disable would make Pig a 
> safer platform. The goal is to allow administrators to be able to have more 
> control over user scripts. Default behaviour would still be the same - no 
> filters applied on commands and operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3765:
-

Attachment: PIG-3765_4.patch

Hi [~cheolsoo], I have uploaded a new patch to RB.

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch, 
> PIG-3765_4.patch
>
>
> This is an admin feature providing ability to blacklist or/and whitelist 
> certain commands and operations. Pig exposes a few of these that could be not 
> very safe in a multitenant environment. For example, "sh" invokes shell 
> commands, "set" allows users to change non-final configs. While these are 
> tremendously useful in general, having an ability to disable would make Pig a 
> safer platform. The goal is to allow administrators to be able to have more 
> control over user scripts. Default behaviour would still be the same - no 
> filters applied on commands and operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] Subscription: PIG patch available

2014-03-06 Thread jira
Issue Subscription
Filter: PIG patch available (20 issues)

Subscriber: pigdaily

Key         Summary
PIG-3799    TestCustomPartitioner is broken in tez branch
            https://issues.apache.org/jira/browse/PIG-3799
PIG-3797    Fix some memory leaks affecting container reuse
            https://issues.apache.org/jira/browse/PIG-3797
PIG-3794    pig -useHCatalog fails using pig command line interface on HDInsight
            https://issues.apache.org/jira/browse/PIG-3794
PIG-3783    We can predict when small local jobs will cause an OOM and change io.sort.mb in that case
            https://issues.apache.org/jira/browse/PIG-3783
PIG-3782    PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment
            https://issues.apache.org/jira/browse/PIG-3782
PIG-3771    Piggybank Avrostorage makes a lot of namenode calls in the backend
            https://issues.apache.org/jira/browse/PIG-3771
PIG-3765    Ability to disable Pig commands and operators
            https://issues.apache.org/jira/browse/PIG-3765
PIG-3757    Make scalar work
            https://issues.apache.org/jira/browse/PIG-3757
PIG-3754    InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
            https://issues.apache.org/jira/browse/PIG-3754
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3731    Ability to specify local-mode specific configuration (useful for local/auto-local mode)
            https://issues.apache.org/jira/browse/PIG-3731
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3603    Add counters to TezStats
            https://issues.apache.org/jira/browse/PIG-3603
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3456    Reduce threadlocal conf access in backend for each record
            https://issues.apache.org/jira/browse/PIG-3456
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Re: pig12 job stuck in infinite loop

2014-03-06 Thread Suhas Satish
This is the pig script -

%default previousPeriod $pPeriod

tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);

gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;

pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;

pWeekRanked = RANK pWeek BY WEEK ASC DENSE;

gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
store gpWeekRanked INTO 'gpWeekRanked';
describe gpWeekRanked;


Without the filter statement, the code runs without hanging.

Cheers,
Suhas.


On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish  wrote:

> Hi
> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> attached data, but the FILTER function causes the job to get stuck in an
> infinite loop.
>
> pig -p pPeriod=201312 -f test.pig
>
> The thread in question seems to be stuck forever inside while loop of
> runPipeline method.
>
> stack trace:
> ---
>
> "main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable
> [0x7fd750d5]
>java.lang.Thread.State: RUNNABLE
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.
> relationalOperators.POForEach.getNextTuple(POForEach.java:217)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> PigGenericMapBase.map(PigGenericMapBase.java:277)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1117)
> at org.apache.hadoop.mapred.Child.main(Child.java:271)
>
>
>
>
> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
> PigGenericMapBase.java:
>
> protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
> InterruptedException {
> while(true){
> Result res = leaf.getNext(DUMMYTUPLE);
> if(res.returnStatus==POStatus.STATUS_OK){
> collect(outputCollector,(Tuple)res.result);
> continue;
> }
> 
>
>
>
> Whats the suggested code fix here?
>
>
> Thanks,
> Suhas.
>


pig12 job stuck in infinite loop

2014-03-06 Thread Suhas Satish
Hi
I launched the attached Pig job on Pig 0.12 with Hadoop MRv1 and the
attached data, but the FILTER operator causes the job to get stuck in an
infinite loop.

pig -p pPeriod=201312 -f test.pig

The thread in question seems to be stuck forever inside the while loop of
the runPipeline method.

stack trace:
---

"main" prio=10 tid=0x7fd74800b000 nid=0x2f63 runnable
[0x7fd750d5]
   java.lang.Thread.State: RUNNABLE
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:217)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
    at org.apache.hadoop.mapred.Child.main(Child.java:271)




org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/
PigGenericMapBase.java:

protected void runPipeline(PhysicalOperator leaf) throws IOException,
        InterruptedException {
    while(true){
        Result res = leaf.getNext(DUMMYTUPLE);
        if(res.returnStatus==POStatus.STATUS_OK){
            collect(outputCollector,(Tuple)res.result);
            continue;
        }




What's the suggested code fix here?


Thanks,
Suhas.
20100101|1 WEEK ENDING 01/02/10|2010-Jan-02|201001|1468
20100102|1 WEEK ENDING 01/09/10|2010-Jan-09|201001|1469
20100103|1 WEEK ENDING 01/16/10|2010-Jan-16|201001|1470
20100104|1 WEEK ENDING 01/23/10|2010-Jan-23|201001|1471
20100205|1 WEEK ENDING 01/30/10|2010-Jan-30|201002|1472
20100206|1 WEEK ENDING 02/06/10|2010-Feb-06|201002|1473
20100207|1 WEEK ENDING 02/13/10|2010-Feb-13|201002|1474
20100208|1 WEEK ENDING 02/20/10|2010-Feb-20|201002|1475
20100309|1 WEEK ENDING 02/27/10|2010-Feb-27|201003|1476
20100310|1 WEEK ENDING 03/06/10|2010-Mar-06|201003|1477
20100311|1 WEEK ENDING 03/13/10|2010-Mar-13|201003|1478
20100312|1 WEEK ENDING 03/20/10|2010-Mar-20|201003|1479
20100413|1 WEEK ENDING 03/27/10|2010-Mar-27|201004|1480
20100414|1 WEEK ENDING 04/03/10|2010-Apr-03|201004|1481
20100415|1 WEEK ENDING 04/10/10|2010-Apr-10|201004|1482
20100416|1 WEEK ENDING 04/17/10|2010-Apr-17|201004|1483
20100517|1 WEEK ENDING 04/24/10|2010-Apr-24|201005|1484
20100518|1 WEEK ENDING 05/01/10|2010-May-01|201005|1485
20100519|1 WEEK ENDING 05/08/10|2010-May-08|201005|1486
20100520|1 WEEK ENDING 05/15/10|2010-May-15|201005|1487
20100621|1 WEEK ENDING 05/22/10|2010-May-22|201006|1488
20100622|1 WEEK ENDING 05/29/10|2010-May-29|201006|1489
20100623|1 WEEK ENDING 06/05/10|2010-Jun-05|201006|1490
20100624|1 WEEK ENDING 06/12/10|2010-Jun-12|201006|1491
20100725|1 WEEK ENDING 06/19/10|2010-Jun-19|201007|1492
20100726|1 WEEK ENDING 06/26/10|2010-Jun-26|201007|1493
20100727|1 WEEK ENDING 07/03/10|2010-Jul-03|201007|1494
20100728|1 WEEK ENDING 07/10/10|2010-Jul-10|201007|1495
20100829|1 WEEK ENDING 07/17/10|2010-Jul-17|201008|1496
20100830|1 WEEK ENDING 07/24/10|2010-Jul-24|201008|1497
20100831|1 WEEK ENDING 07/31/10|2010-Jul-31|201008|1498
20100832|1 WEEK ENDING 08/07/10|2010-Aug-07|201008|1499
20100933|1 WEEK ENDING 08/14/10|2010-Aug-14|201009|1500
20100934|1 WEEK ENDING 08/21/10|2010-Aug-21|201009|1501
20100935|1 WEEK ENDING 08/28/10|2010-Aug-28|201009|1502
20100936|1 WEEK ENDING 09/04/10|2010-Sep-04|201009|1503
20101037|1 WEEK ENDING 09/11/10|2010-Sep-11|201010|1504
20101038|1 WEEK ENDING 09/18/10|2010-Sep-18|201010|1505
20101039|1 WEEK ENDING 09/25/10|2010-Sep-25|201010|1506
20101040|1 WEEK ENDING 10/02/10|2010-Oct-02|201010|1507
20101141|1 WEEK ENDING 10/09/10|2010-Oct-09|201011|1508
20101142|1 WEEK ENDING 10/16/10|2010-Oct-16|201011|1509
20101143|1 WEEK ENDING 10/23/10|2010-Oct-23|201011|1510
20101144|1 WEEK ENDING 10/30/10|2010-Oct-30|201011|1511
20101245|1 WEEK ENDING 11/06/10|2010-Nov-06|201012|1512
20101246|1 WEEK ENDING 11/13/10|2010-Nov-13|201012|1513
20101247|1 WEEK ENDING 11/20/10|2010-Nov-20|201012|1514
20101248|1 WEEK ENDING 11/27/10|2010-Nov-27|201012|1515
20101349|1 WEEK ENDING 12/04/10|2010-Dec-04|201013|1516
20101350|1 WEEK ENDING 12/11/10|2010-Dec-11|201013|1517
20101351|1 WEEK ENDING 12/18/10|2010-Dec-18|201013|1518
20101352|1 WEEK ENDING 12/25/10|2010-Dec-25|201013|1519
20110101|1 WEEK ENDING 01/01/11|2011-Jan-01|201101|1520
20110102|1 WEEK ENDING 01/08/11|2011-Jan-08|201101|1521
20110103|1 WEEK ENDING 01/15/11|2011-Jan-15|20110

[jira] [Commented] (PIG-3731) Ability to specify local-mode specific configuration (useful for local/auto-local mode)

2014-03-06 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923206#comment-13923206
 ] 

Aniket Mokashi commented on PIG-3731:
-

Committed to trunk. Thanks again [~cheolsoo] for the review.

> Ability to specify local-mode specific configuration (useful for 
> local/auto-local mode)
> ---
>
> Key: PIG-3731
> URL: https://issues.apache.org/jira/browse/PIG-3731
> Project: Pig
>  Issue Type: Improvement
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3731-1.patch
>
>
> This could be done in several ways, however, adding a namespace (pig.local.) 
> for this is better so that only a few number of jobs that require this can 
> make use of it.
> set pig.local.io.sort.mb 50 will set io.sort.mb=50 for local mode jobs 
> allowing many jobs to run in parallel. Another setting that might be required 
> is - io.compression.codecs.
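
The namespace idea in the description above can be sketched in Java as follows. This is a minimal illustration and not the code from PIG-3731-1.patch: the class LocalModeOverrides and the method applyLocalOverrides are hypothetical names, and the sketch assumes that a "pig.local.X" entry simply replaces the value of "X" when a job runs in local mode.

```java
import java.util.Properties;

// Hypothetical sketch; not the actual Pig configuration code.
final class LocalModeOverrides {
    static final String LOCAL_PREFIX = "pig.local.";

    // Returns a copy of props in which every "pig.local.X" entry also sets "X".
    // The input object is left unchanged.
    static Properties applyLocalOverrides(Properties props) {
        Properties result = new Properties();
        result.putAll(props);
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(LOCAL_PREFIX) && name.length() > LOCAL_PREFIX.length()) {
                String baseName = name.substring(LOCAL_PREFIX.length());
                result.setProperty(baseName, props.getProperty(name));
            }
        }
        return result;
    }
}
```

For example, given io.sort.mb=200 and pig.local.io.sort.mb=50, the returned properties carry io.sort.mb=50, while jobs that never consult the copy keep seeing 200.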



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923166#comment-13923166
 ] 

Aniket Mokashi commented on PIG-3754:
-

Will do.

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3745) Document auto local mode for pig

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923185#comment-13923185
 ] 

Cheolsoo Park commented on PIG-3745:


Just a reminder. Let's include local-mode specific properties (PIG-3731) in the 
documentation too.

> Document auto local mode for pig
> 
>
> Key: PIG-3745
> URL: https://issues.apache.org/jira/browse/PIG-3745
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>
> We need to document feature added in PIG-3463.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3731) Ability to specify local-mode specific configuration (useful for local/auto-local mode)

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923177#comment-13923177
 ] 

Cheolsoo Park commented on PIG-3731:


+1.

When you commit it, can you remove this line?
{code:title=ConfigurationUtil.java}
+import org.apache.pig.impl.PigContext;
{code}

And please remove trailing white spaces. Thanks!

> Ability to specify local-mode specific configuration (useful for 
> local/auto-local mode)
> ---
>
> Key: PIG-3731
> URL: https://issues.apache.org/jira/browse/PIG-3731
> Project: Pig
>  Issue Type: Improvement
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3731-1.patch
>
>
> This could be done in several ways, however, adding a namespace (pig.local.) 
> for this is better so that only a few number of jobs that require this can 
> make use of it.
> set pig.local.io.sort.mb 50 will set io.sort.mb=50 for local mode jobs 
> allowing many jobs to run in parallel. Another setting that might be required 
> is - io.compression.codecs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3731) Ability to specify local-mode specific configuration (useful for local/auto-local mode)

2014-03-06 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923182#comment-13923182
 ] 

Aniket Mokashi commented on PIG-3731:
-

Will do. Thanks for reviewing!

> Ability to specify local-mode specific configuration (useful for 
> local/auto-local mode)
> ---
>
> Key: PIG-3731
> URL: https://issues.apache.org/jira/browse/PIG-3731
> Project: Pig
>  Issue Type: Improvement
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3731-1.patch
>
>
> This could be done in several ways, however, adding a namespace (pig.local.) 
> for this is better so that only a few number of jobs that require this can 
> make use of it.
> set pig.local.io.sort.mb 50 will set io.sort.mb=50 for local mode jobs 
> allowing many jobs to run in parallel. Another setting that might be required 
> is - io.compression.codecs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923158#comment-13923158
 ] 

Cheolsoo Park commented on PIG-3754:


Looks good. Would you mind adding a test case for this corner case to 
TestInputSizeReducerEstimator?

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3765) Ability to disable Pig commands and operators

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923129#comment-13923129
 ] 

Cheolsoo Park commented on PIG-3765:


I made some comments in RB.

> Ability to disable Pig commands and operators
> -
>
> Key: PIG-3765
> URL: https://issues.apache.org/jira/browse/PIG-3765
> Project: Pig
>  Issue Type: New Feature
>  Components: documentation, grunt
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3765.patch, PIG-3765_2.patch, PIG-3765_3.patch
>
>
> This is an admin feature providing ability to blacklist or/and whitelist 
> certain commands and operations. Pig exposes a few of these that could be not 
> very safe in a multitenant environment. For example, "sh" invokes shell 
> commands, "set" allows users to change non-final configs. While these are 
> tremendously useful in general, having an ability to disable would make Pig a 
> safer platform. The goal is to allow administrators to be able to have more 
> control over user scripts. Default behaviour would still be the same - no 
> filters applied on commands and operators.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3754:


Attachment: PIG-3754-1.patch

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3754:


Status: Patch Available  (was: Open)

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3754:


Description: 
If you have more than one input, 
InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
one of the loader returns \-1 and is not file based (eg- hbase). This causes 
incorrect reducer estimation and problems in auto.local mode.

If size of input is not found in for any of the inputs, we should bail out with 
return value of -1.

  was:
If you have more than one input, 
InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
one of the loader returns -1 and is not file based (eg- hbase). This causes 
incorrect reducer estimation and problems in auto.local mode.

If size of input is not found in for any of the inputs, we should bail out with 
return value of -1.
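
The bail-out rule in the description can be sketched in Java as follows. This is a minimal illustration and not the code from PIG-3754-1.patch: the class InputSizeSketch and the method totalInputSize are hypothetical names, and each input is represented only by the size its loader reported, with -1 meaning "unknown".

```java
import java.util.List;

// Hypothetical sketch; not the actual InputSizeReducerEstimator code.
final class InputSizeSketch {
    static final long UNKNOWN = -1L;

    // Returns the sum of all reported sizes, or -1 as soon as any single
    // input reports an unknown (negative) size.
    static long totalInputSize(List<Long> reportedSizes) {
        long total = 0L;
        for (long size : reportedSizes) {
            if (size < 0) {
                return UNKNOWN;
            }
            total += size;
        }
        return total;
    }
}
```

Returning -1 for the whole set, instead of summing only the known sizes, keeps a caller from treating a partially known total as the real input size.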


> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return incorrect value if 
> one of the loader returns \-1 and is not file based (eg- hbase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If size of input is not found in for any of the inputs, we should bail out 
> with return value of -1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-3797) Fix some memory leaks affecting container reuse

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923054#comment-13923054
 ] 

Cheolsoo Park commented on PIG-3797:


Why not use 
[ThreadLocal.remove()|http://docs.oracle.com/javase/7/docs/api/java/lang/ThreadLocal.html#remove()]
 to reinitialize the ThreadLocal variables?
{code}
+// Reset static variables cleared for avoiding OOM.
+// TODO: Figure out a cleaner way to do this. ThreadLocals actually can be avoided all together
+// for mapreduce/tez mode and just used for Local mode.
+PhysicalOperator.reporter = new ThreadLocal();
+PigMapReduce.sJobConfInternal = new ThreadLocal();
{code}
{code}
+// Avoid memory leak. ThreadLocals especially leak a lot of memory.
+PhysicalOperator.reporter = new ThreadLocal();
+PigMapReduce.sJobConfInternal = new ThreadLocal();
{code}

Otherwise, looks good to me.
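
For reference, the ThreadLocal.remove() suggestion above behaves roughly as in the sketch below. The field here is a hypothetical stand-in for a static ThreadLocal such as PhysicalOperator.reporter, and the sketch only shows the standard JDK behavior, not the patch itself.

```java
// Hypothetical sketch; REPORTER stands in for a static ThreadLocal field.
class ThreadLocalResetSketch {
    static final ThreadLocal<byte[]> REPORTER = new ThreadLocal<>();

    static boolean[] demo() {
        REPORTER.set(new byte[1024]);              // simulate per-task state held by this thread
        boolean hadValue = REPORTER.get() != null;
        REPORTER.remove();                         // drop this thread's value; the field is untouched
        boolean cleared = REPORTER.get() == null;  // get() falls back to initialValue(), null by default
        return new boolean[] {hadValue, cleared};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Note that remove() clears the value only for the calling thread, so it is equivalent to reassigning the field only if the cleanup runs on the same thread that set the value.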

> Fix some memory leaks affecting container reuse
> ---
>
> Key: PIG-3797
> URL: https://issues.apache.org/jira/browse/PIG-3797
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3797-1.patch
>
>
> PigCombiner.sJobContext and PhysicalOperator.reporter hold references to the 
> WrappedReducer$Context which in turn holds TezOutputContextImpl which holds 
> references to the buffers in DefaultSorter. This was causing OOM after the 
> container was reused 2 or 3 times. Debugged this with L17.pig in pigmix. 





[jira] [Commented] (PIG-3790) Several changes in Tez e2e

2014-03-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922969#comment-13922969
 ] 

Rohini Palaniswamy commented on PIG-3790:
-

I am noticing that even when -Dharness.old.pig points to an older 
version, the benchmark is still run with the compiled pig.jar, and the 
Multiquery benchmarks fail because of that. I would expect the benchmarks to be 
run with the Pig libraries from the -Dharness.old.pig setting. This needs to be 
investigated.

> Several changes in Tez e2e
> --
>
> Key: PIG-3790
> URL: https://issues.apache.org/jira/browse/PIG-3790
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3790-1.patch
>
>
> Need to change several things in e2e:
> 1. If a test contains the tag "verify_pig_script", "verify_pig_script" still runs 
> against Tez; it should run against MR.
> 2. The tests contain the following line:
> $ENV{'PIG_CLASSPATH'} = $ENV{'PIG_CLASSPATH'} . $separator . $pcp;
> PIG_CLASSPATH eventually gets too long and exceeds the system limit.
> 3. In some tests, such as MultiQuery_11, the perl command is enclosed in double 
> quotes (perl -ne "print $_;"), so the Pig runtime does parameter substitution and 
> replaces $_ with the last command executed. This fix seems like it should go to 
> MR as well; if so, I will open another ticket to fix it in trunk.
> 4. Since e2e is now in good shape, we need to enable all test suites instead 
> of just tez.





[jira] [Commented] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-06 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922950#comment-13922950
 ] 

Prashant Kommireddi commented on PIG-3793:
--

It's cool, that's how the project evolves. Ideas are always welcome :)

Sounds like you have a use case where {{PigServer.registerQuery(String)}} could 
be followed by a call to {{LogicalPlanData.getNumLogicalRelationOperators()}}, 
with the two calls interleaved and repeated multiple times. Basically, registering 
queries and seeing how the LP changes across the registrations. Would it suffice 
to have a {{resetLogicalPlanData()}} method on PigServer that recomputes the 
{{LogicalPlanData}}?
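
To make the proposal concrete, here is a minimal sketch of a cached count that is recomputed only after an explicit reset. All names are hypothetical stand-ins rather than Pig's real classes, and the sketch assumes one operator per registered query purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not PigServer or LogicalPlanData.
class PlanDataResetSketch {
    static class Server {
        private final List<String> queries = new ArrayList<>();
        private Integer cachedOperatorCount; // null means "not computed yet"

        void registerQuery(String query) {
            queries.add(query);
        }

        // Computes the count once and then keeps returning the cached value.
        int getNumOperators() {
            if (cachedOperatorCount == null) {
                cachedOperatorCount = queries.size(); // assumption: one operator per query
            }
            return cachedOperatorCount;
        }

        // Forces the next getNumOperators() call to recompute.
        void resetPlanData() {
            cachedOperatorCount = null;
        }
    }

    static int[] demo() {
        Server server = new Server();
        server.registerQuery("A = LOAD 'in';");
        int first = server.getNumOperators();
        server.registerQuery("B = DISTINCT A;");
        int stale = server.getNumOperators(); // still the cached value
        server.resetPlanData();
        int fresh = server.getNumOperators(); // recomputed after reset
        return new int[] {first, stale, fresh};
    }
}
```

The stale read in the middle is the case the reset method is meant to address when registrations and reads are interleaved.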

> Provide info on number of LogicalRelationalOperator(s) used in the script 
> through LogicalPlanData
> -
>
> Key: PIG-3793
> URL: https://issues.apache.org/jira/browse/PIG-3793
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.13.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.13.0
>
> Attachments: PIG-3793.patch, PIG-3793_2.patch
>
>
> It's useful to understand how many operators are being used in 
> the script via the API. This could allow admins to enforce 
> checks/restrictions on the length/complexity of the plan in user scripts.





[jira] [Updated] (PIG-3799) TestCustomPartitioner is broken in tez branch

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3799:
---

Attachment: PIG-3799-1.patch

The parallelism used to be set in POForeach, and that value used to overwrite 
that of TezOperator before PIG-3795. But since I got rid of the overwriting 
logic, the parallelism of vertex is no longer set.

Attached is a patch that explicitly sets the parallelism of vertex.

> TestCustomPartitioner is broken in tez branch
> -
>
> Key: PIG-3799
> URL: https://issues.apache.org/jira/browse/PIG-3799
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3799-1.patch
>
>
> This is a regression of PIG-3795. In TezCompiler, visitDistinct() doesn't set 
> the requested parallelism of TezOperator, so only one reducer 
> runs for the following query:
> {code}
> A = LOAD 'table_testCustomPartitionerDistinct' as (a0:int, a1:int);
> B = distinct A PARTITION BY 
> org.apache.pig.test.utils.SimpleCustomPartitioner3 parallel 2;
> {code}
> The test fails because it sees a single output file while it expects two.





[jira] [Updated] (PIG-3799) TestCustomPartitioner is broken in tez branch

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3799:
---

Status: Patch Available  (was: Open)

> TestCustomPartitioner is broken in tez branch
> -
>
> Key: PIG-3799
> URL: https://issues.apache.org/jira/browse/PIG-3799
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3799-1.patch
>
>
> This is a regression of PIG-3795. In TezCompiler, visitDistinct() doesn't set 
> the requested parallelism of TezOperator, so only one reducer 
> runs for the following query:
> {code}
> A = LOAD 'table_testCustomPartitionerDistinct' as (a0:int, a1:int);
> B = distinct A PARTITION BY 
> org.apache.pig.test.utils.SimpleCustomPartitioner3 parallel 2;
> {code}
> The test fails because it sees a single output file while it expects two.





[jira] [Comment Edited] (PIG-3799) TestCustomPartitioner is broken in tez branch

2014-03-06 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922937#comment-13922937
 ] 

Cheolsoo Park edited comment on PIG-3799 at 3/6/14 7:31 PM:


The parallelism used to be set in POForeach, and that value used to overwrite 
that of TezOperator before PIG-3795. But since I got rid of the overwriting 
logic, the parallelism of vertex is no longer set.

Attached is a patch that explicitly sets the parallelism of vertex.


was (Author: cheolsoo):
The parallelism used to be set in POForeach, and that value used to overwrite 
that of TezOperator before PIG-3975. But since I got rid of the overwriting 
logic, the parallelism of vertex is no longer set.

Attached is a patch that explicitly sets the parallelism of vertex.

> TestCustomPartitioner is broken in tez branch
> -
>
> Key: PIG-3799
> URL: https://issues.apache.org/jira/browse/PIG-3799
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3799-1.patch
>
>
> This is a regression of PIG-3795. In TezCompiler, visitDistinct() doesn't set 
> the requested parallelism of TezOperator, so only one reducer 
> runs for the following query:
> {code}
> A = LOAD 'table_testCustomPartitionerDistinct' as (a0:int, a1:int);
> B = distinct A PARTITION BY 
> org.apache.pig.test.utils.SimpleCustomPartitioner3 parallel 2;
> {code}
> The test fails because it sees a single output file while it expects two.





[jira] [Created] (PIG-3799) TestCustomPartitioner is broken in tez branch

2014-03-06 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3799:
--

 Summary: TestCustomPartitioner is broken in tez branch
 Key: PIG-3799
 URL: https://issues.apache.org/jira/browse/PIG-3799
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


This is a regression of PIG-3795. In TezCompiler, visitDistinct() doesn't set 
the requested parallelism of TezOperator, so only one reducer runs 
for the following query:
{code}
A = LOAD 'table_testCustomPartitionerDistinct' as (a0:int, a1:int);
B = distinct A PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner3 
parallel 2;
{code}

The test fails because it sees a single output file while it expects two.
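
Why a missing parallelism setting yields a single output file can be sketched as below. This is an illustration under stated assumptions, not Pig or Tez code: the modulo partitioner is hypothetical (SimpleCustomPartitioner3 may route keys differently), an unset parallelism is assumed to fall back to one reducer, and each reducer is assumed to write one output file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of key routing; not the Tez or MapReduce runtime.
class ParallelismSketch {
    // Routes each key to a reducer bucket. A non-positive requested
    // parallelism is treated as "unset" and falls back to one reducer.
    static List<List<Integer>> route(int[] keys, int requestedParallelism) {
        int numReducers = requestedParallelism > 0 ? requestedParallelism : 1;
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            // Assumed partitioning rule: key modulo reducer count.
            buckets.get(Math.floorMod(key, numReducers)).add(key);
        }
        return buckets; // one bucket per reducer, hence one output file each
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4};
        System.out.println(route(keys, 2).size());
        System.out.println(route(keys, -1).size());
    }
}
```

With the requested parallelism of 2 propagated, two buckets (and so two files) result; when the value is lost, everything lands in a single bucket regardless of the partitioner.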





[jira] [Assigned] (PIG-3743) Use VertexGroup and Alias vertex for union

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-3743:
--

Assignee: Cheolsoo Park

> Use VertexGroup and Alias vertex for union
> --
>
> Key: PIG-3743
> URL: https://issues.apache.org/jira/browse/PIG-3743
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
>






[jira] [Resolved] (PIG-3788) NPE when POStream is not in the leaf vertex

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-3788.


Resolution: Fixed

Committed to tez branch.

> NPE when POStream is not in the leaf vertex
> ---
>
> Key: PIG-3788
> URL: https://issues.apache.org/jira/browse/PIG-3788
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3788-1.patch
>
>
> POStream uploads log files to the output directory of the job. In MR, 
> every job dumps to an output directory (permanent or temporary); in Tez, 
> an intermediate vertex does not have an output directory. Thus, 
> HadoopExecutableManager.close throws an NPE.
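
One way to avoid this kind of NPE is to skip the upload when no output directory exists. The sketch below only illustrates that guard; it is not necessarily what the committed patch does, and the class name, method name, and "_logs" path segment are all hypothetical.

```java
// Hypothetical sketch of a null guard; not HadoopExecutableManager itself.
class LogUploadSketch {
    // Returns the location the logs would be copied to, or null when the
    // task has no output directory (as for an intermediate Tez vertex).
    static String logDestination(String outputDir, String taskId) {
        if (outputDir == null) {
            return null; // nothing to upload to; caller should skip the copy
        }
        return outputDir + "/_logs/" + taskId;
    }

    public static void main(String[] args) {
        System.out.println(logDestination("/out", "task_0"));
        System.out.println(logDestination(null, "task_0"));
    }
}
```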





[jira] [Resolved] (PIG-3786) POReservoirSample should handle endOfAllInput flag

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-3786.


Resolution: Fixed

Committed to tez branch. Thanks Daniel!

> POReservoirSample should handle endOfAllInput flag
> --
>
> Key: PIG-3786
> URL: https://issues.apache.org/jira/browse/PIG-3786
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3786-1.patch
>
>
> POReservoirSample assumes it always gets STATUS_OK until it hits STATUS_EOP, 
> which it takes to mean that all inputs are finished. This is not true if the 
> plan also contains a POStream, since POStream will issue a STATUS_EOP even if 
> inputs are not exhausted. We need to make POReservoirSample handle both 
> STATUS_EOP and the endOfAllInput flag. This causes e2e test failures such as 
> ComputeSpec_3.
> POPoissonSample should do the same.
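
The distinction between an early EOP and the true end of input can be sketched with a small reservoir sampler. This is a simplified model, not POReservoirSample: the Status enum and Pull class are hypothetical, and the real operator interacts with the physical plan differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; loosely modeled on STATUS_OK / STATUS_EOP.
class ReservoirSampleSketch {
    enum Status { OK, EOP }

    // One pull from upstream: a status, a value (meaningful only for OK),
    // and whether the endOfAllInput flag was set at that point.
    static final class Pull {
        final Status status;
        final int value;
        final boolean endOfAllInput;

        Pull(Status status, int value, boolean endOfAllInput) {
            this.status = status;
            this.value = value;
            this.endOfAllInput = endOfAllInput;
        }
    }

    static List<Integer> sample(Iterator<Pull> upstream, int k, Random rnd) {
        List<Integer> reservoir = new ArrayList<>();
        int seen = 0;
        while (upstream.hasNext()) {
            Pull p = upstream.next();
            if (p.status == Status.EOP) {
                if (p.endOfAllInput) {
                    break;    // inputs really are exhausted
                }
                continue;     // early EOP (e.g. from a stream operator): keep pulling
            }
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(p.value);          // fill the reservoir first
            } else {
                int j = rnd.nextInt(seen);       // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, p.value);   // replace with probability k / seen
                }
            }
        }
        return reservoir;
    }

    static List<Integer> demo() {
        List<Pull> pulls = Arrays.asList(
            new Pull(Status.OK, 1, false),
            new Pull(Status.OK, 2, false),
            new Pull(Status.EOP, 0, false),  // premature EOP, input not finished
            new Pull(Status.OK, 3, false),
            new Pull(Status.EOP, 0, true));  // EOP with endOfAllInput set
        return sample(pulls.iterator(), 10, new Random(42));
    }
}
```

A sampler that stopped at the first EOP would miss the value 3 in the demo sequence, which mirrors the failure mode described in the issue.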





[jira] [Commented] (PIG-3793) Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData

2014-03-06 Thread Kyungho Jeon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922703#comment-13922703
 ] 

Kyungho Jeon commented on PIG-3793:
---

Thank you for the response. I found that a {{LogicalPlanData}} instance is 
created and initialized when {{PigServer.getLogicalPlanData()}} is called, 
so my concern becomes less significant. 

But now I am thinking that {{LogicalPlanData}} should work as an interface (in 
the general sense, not a Java interface) to {{LogicalPlan}} rather than as a 
stateful object, considering that the class was added to avoid exposing 
{{LogicalPlan}}. It won't be a problem in the current Pig implementation. 
But if someone (like me, actually) is playing with {{LogicalPlan}} and its 
optimizers, for example {{AddForEach}}, then the fact that {{LogicalPlanData}} 
computes and keeps the value might be confusing. 

I have been following Pig for a year but I am still a newbie. Please forgive my 
pestering. I am just trying to understand the implementation. Any comment will 
be appreciated!
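
The "interface rather than stateful object" idea can be sketched as a thin view that delegates every call to the underlying plan instead of storing a computed value. The Plan and PlanView classes below are hypothetical and much simpler than LogicalPlan and LogicalPlanData.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Pig's LogicalPlan or LogicalPlanData.
class PlanViewSketch {
    static class Plan {
        final List<String> operators = new ArrayList<>();
    }

    // Read-only view: hides the plan but keeps no state of its own.
    static class PlanView {
        private final Plan plan;

        PlanView(Plan plan) {
            this.plan = plan;
        }

        int getNumOperators() {
            return plan.operators.size(); // always reflects the current plan
        }
    }

    static int[] demo() {
        Plan plan = new Plan();
        PlanView view = new PlanView(plan);
        plan.operators.add("LOAD");
        int before = view.getNumOperators();
        plan.operators.add("FOREACH"); // e.g. an optimizer adds an operator
        int after = view.getNumOperators();
        return new int[] {before, after};
    }
}
```

The trade-off is that a delegating view recomputes on every call, while a stateful object is cheaper to read but can drift from the plan it describes.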

> Provide info on number of LogicalRelationalOperator(s) used in the script 
> through LogicalPlanData
> -
>
> Key: PIG-3793
> URL: https://issues.apache.org/jira/browse/PIG-3793
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.13.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.13.0
>
> Attachments: PIG-3793.patch, PIG-3793_2.patch
>
>
> It's useful to understand how many operators are being used in 
> the script via the API. This could allow admins to enforce 
> checks/restrictions on the length/complexity of the plan in user scripts.





[jira] [Updated] (PIG-3797) Fix some memory leaks affecting container reuse

2014-03-06 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3797:


Attachment: PIG-3797-1.patch

> Fix some memory leaks affecting container reuse
> ---
>
> Key: PIG-3797
> URL: https://issues.apache.org/jira/browse/PIG-3797
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3797-1.patch
>
>
> PigCombiner.sJobContext and PhysicalOperator.reporter hold references to the 
> WrappedReducer$Context which in turn holds TezOutputContextImpl which holds 
> references to the buffers in DefaultSorter. This was causing OOM after the 
> container was reused 2 or 3 times. Debugged this with L17.pig in pigmix. 





[jira] [Updated] (PIG-3797) Fix some memory leaks affecting container reuse

2014-03-06 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3797:


Attachment: (was: PIG-3797-1.patch)

> Fix some memory leaks affecting container reuse
> ---
>
> Key: PIG-3797
> URL: https://issues.apache.org/jira/browse/PIG-3797
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3797-1.patch
>
>
> PigCombiner.sJobContext and PhysicalOperator.reporter hold references to the 
> WrappedReducer$Context which in turn holds TezOutputContextImpl which holds 
> references to the buffers in DefaultSorter. This was causing OOM after the 
> container was reused 2 or 3 times. Debugged this with L17.pig in pigmix. 





[jira] [Updated] (PIG-3446) Umbrella jira for Pig on Tez

2014-03-06 Thread Rohit Laddha (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Laddha updated PIG-3446:
--

Description: 
This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.

More information can be found on the following wiki page:
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

How to set up your development environment- 
# Check out [Tez 
trunk|https://builds.apache.org/job/Tez-Build/127/changeshttps://github.com/apache/incubator-tez].
# Install protobuf 2.5.0.
# Build Tez with Hadoop 2.2.0.(By default, it builds against Hadoop trunk, 
which is 3.0.0.)
# Install Tez jars on local maven repository with "mvn install -DskipTests".
# Check out [Pig Tez branch|https://github.com/apache/pig/tree/tez].
# Build Pig running "ant jar-withouthadoop".
# Set up a single-node (or multi-node) Hadoop 2.2 cluster.
# Install Tez following the instructions on the [Tez 
homepage|http://tez.incubator.apache.org/install.html].
# Run Pig with "-x tez" option.

How to run Tez tests-
* unit test
{code}
ant test-tez
{code}
By default, exectype is tez, and hadoopversion is 23 in tez branch. But you can 
run unit tests in mr mode as follows:
{code}
ant test -Dexectype=mr -Dhadoopversion=20
{code}
* e2e tests
{code}
ant -Dharness.old.pig=$PIG_HOME -Dharness.hadoop.home=$HADOOP_HOME 
-Dharness.cluster.conf=$HADOOP_CONF -Dharness.cluster.bin=$HADOOP_BIN 
test-e2e-tez -Dhadoopversion=23
{code}

  was:
This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.

More information can be found on the following wiki page:
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

How to set up your development environment- 
# Check out [Tez trunk|https://github.com/apache/incubator-tez].
# Install protobuf 2.5.0.
# Build Tez with Hadoop 2.2.0.(By default, it builds against Hadoop trunk, 
which is 3.0.0.)
# Install Tez jars on local maven repository with "mvn install -DskipTests".
# Check out [Pig Tez branch|https://github.com/apache/pig/tree/tez].
# Build Pig running "ant jar-withouthadoop".
# Set up a single-node (or multi-node) Hadoop 2.2 cluster.
# Install Tez following the instructions on the [Tez 
homepage|http://tez.incubator.apache.org/install.html].
# Run Pig with "-x tez" option.

How to run Tez tests-
* unit test
{code}
ant test-tez
{code}
By default, exectype is tez, and hadoopversion is 23 in tez branch. But you can 
run unit tests in mr mode as follows:
{code}
ant test -Dexectype=mr -Dhadoopversion=20
{code}
* e2e tests
{code}
ant -Dharness.old.pig=$PIG_HOME -Dharness.hadoop.home=$HADOOP_HOME 
-Dharness.cluster.conf=$HADOOP_CONF -Dharness.cluster.bin=$HADOOP_BIN 
test-e2e-tez -Dhadoopversion=23
{code}


> Umbrella jira for Pig on Tez
> 
>
> Key: PIG-3446
> URL: https://issues.apache.org/jira/browse/PIG-3446
> Project: Pig
>  Issue Type: New Feature
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
>
> This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
> More information can be found on the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez
> How to set up your development environment- 
> # Check out [Tez 
> trunk|https://builds.apache.org/job/Tez-Build/127/changeshttps://github.com/apache/incubator-tez].
> # Install protobuf 2.5.0.
> # Build Tez with Hadoop 2.2.0.(By default, it builds against Hadoop trunk, 
> which is 3.0.0.)
> # Install Tez jars on local maven repository with "mvn install -DskipTests".
> # Check out [Pig Tez branch|https://github.com/apache/pig/tree/tez].
> # Build Pig running "ant jar-withouthadoop".
> # Set up a single-node (or multi-node) Hadoop 2.2 cluster.
> # Install Tez following the instructions on the [Tez 
> homepage|http://tez.incubator.apache.org/install.html].
> # Run Pig with "-x tez" option.
> How to run Tez tests-
> * unit test
> {code}
> ant test-tez
> {code}
> By default, exectype is tez, and hadoopversion is 23 in tez branch. But you 
> can run unit tests in mr mode as follows:
> {code}
> ant test -Dexectype=mr -Dhadoopversion=20
> {code}
> * e2e tests
> {code}
> ant -Dharness.old.pig=$PIG_HOME -Dharness.hadoop.home=$HADOOP_HOME 
> -Dharness.cluster.conf=$HADOOP_CONF -Dharness.cluster.bin=$HADOOP_BIN 
> test-e2e-tez -Dhadoopversion=23
> {code}





[jira] [Created] (PIG-3798) registered jar in pig script are appended to the classpath multiple times

2014-03-06 Thread Dotan Patrich (JIRA)
Dotan Patrich created PIG-3798:
--

 Summary: registered jar in pig script are appended to the 
classpath multiple times
 Key: PIG-3798
 URL: https://issues.apache.org/jira/browse/PIG-3798
 Project: Pig
  Issue Type: Bug
 Environment:  Apache Pig version 0.11.0-cdh4.4.0
Reporter: Dotan Patrich


When running several Pig scripts one after another using the Java class PigServer, 
the classpath in taskjvm.sh gets longer and longer; eventually execution breaks 
because the classpath is too long. 

The jars registered in the Pig script are appended to the classpath on every 
execution. It seems that PigContext's skipJars member is the root cause, 
as jars that already exist in the list are added to it multiple times.
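
A common way to keep such a list from growing is to store entries in an insertion-ordered set, so that registering the same jar again is a no-op. The sketch below is only an illustration of that idea; the class is hypothetical and is not PigContext, and ":" is assumed as the classpath separator.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; not PigContext or its skipJars member.
class JarRegistrySketch {
    // LinkedHashSet keeps first-registration order and ignores duplicates.
    private final Set<String> jars = new LinkedHashSet<>();

    // Returns true only if the jar was not already registered.
    boolean register(String jarPath) {
        return jars.add(jarPath);
    }

    // Joins the registered jars with ':' (assumed Unix-style separator).
    String classpath() {
        return String.join(":", jars);
    }

    public static void main(String[] args) {
        JarRegistrySketch registry = new JarRegistrySketch();
        registry.register("udfs.jar");
        registry.register("udfs.jar"); // second script run registers it again
        System.out.println(registry.classpath());
    }
}
```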





Re: Review Request 15881: PIG-3591: Refactor POPackage

2014-03-06 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15881/#review36273
---


I am still going over the main classes POPackage and Packager implementations. 
Just going over thoroughly to ensure no piece of code is missed out in the 
refactor. Will update any comments on that tomorrow. 


src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java


I believe this check should not be removed



src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java


Don't we still need this, i.e., printing out the list of packagers that 
MultiQueryPackager has?



src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/XMLPhysicalPlanPrinter.java


Don't we still need this?



src/org/apache/pig/pen/IllustratorAttacher.java


Don't we need the equivalent of this code in visitPackage?


- Rohini Palaniswamy


On March 4, 2014, 9:40 p.m., Mark Wagner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15881/
> ---
> 
> (Updated March 4, 2014, 9:40 p.m.)
> 
> 
> Review request for pig and Cheolsoo Park.
> 
> 
> Bugs: PIG-3591
> https://issues.apache.org/jira/browse/PIG-3591
> 
> 
> Repository: pig-git
> 
> 
> Description
> ---
> 
> Separate "packaging" logic from "shuffle handling" logic. This moves the 
> packaging logic to a new class "Packager", which is extended by 
> CombinerPackager, LitePackager, MultiQueryPackager, and JoinPackager.
> 
> This is not finished. Known problems: illustrate and streaming the last 
> input are not implemented.
> 
> 
> Diffs
> -
> 
>   src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java d801f6f
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/AccumulatorOptimizer.java 3638b5c
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 18a382b
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 5e28eb6
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 5dddab7
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java 93de6d5
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java eb7c428
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java 64f0ee1
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java 933363d
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java 773a22c
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java eea5ce3
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/SecondaryKeyOptimizer.java 1578630
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/POPackageAnnotator.java 47137d5
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java abb16ff
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java ff82801
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/XMLPhysicalPlanPrinter.java 892c26f
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/CombinerPackager.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/JoinPackager.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/LitePackager.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/MultiQueryPackager.java PRE-CREATION
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java 9105a0e
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POJoinPackage.java 82f11ac
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMultiQueryPackage.java d604174
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 86314d9
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackageLite.java c200715
>   
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators

[jira] [Updated] (PIG-3603) Add counters to TezStats

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3603:
---

Attachment: PIG-3603-1.patch

RB link-
https://reviews.apache.org/r/18832/

> Add counters to TezStats
> 
>
> Key: PIG-3603
> URL: https://issues.apache.org/jira/browse/PIG-3603
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3603-1.patch
>
>
> Counters are now supported by Tez (TEZ-12). We should add counters to 
> TezStats.





[jira] [Updated] (PIG-3603) Add counters to TezStats

2014-03-06 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3603:
---

Status: Patch Available  (was: Open)

> Add counters to TezStats
> 
>
> Key: PIG-3603
> URL: https://issues.apache.org/jira/browse/PIG-3603
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3603-1.patch
>
>
> Counters are now supported by Tez (TEZ-12). We should add counters to 
> TezStats.





Review Request 18832: PIG-3603: Add counters to TezStats

2014-03-06 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18832/
---

Review request for pig, Mark Wagner and Rohini Palaniswamy.


Bugs: PIG-3603
https://issues.apache.org/jira/browse/PIG-3603


Repository: pig-git


Description
---

This patch adds the following counters to TezStats:
1) # of input/output records
2) hdfs bytes read/written
3) file bytes read/written

The job stats look like this:

               JobId: job_pigexec_0
  TotalLaunchedTasks: 3
       FileBytesRead: 2434726
    FileBytesWritten: 4869516
       HdfsBytesRead: 2219954
    HdfsBytesWritten: 2433980

Input(s): Successfully read 1 records (1109977 bytes) from: 
"/user/pig/tests/data/singlefile/studentcomplextab10k"
: Successfully read 1 records (1109977 bytes) from: 
"/user/pig/tests/data/singlefile/studentcomplextab10k"
   Output(s): Successfully stored 10393 records (2433980 bytes) in: 
"hdfs://localhost:57063/tmp/temp90703803/tmp-1606775243"

This patch also includes the following changes in PigStats/JobStats classes:
1) Move getHdfsBytesRead() and getHdfsBytesWritten() from MRPigStatsUtil to 
PigStatsUtil since these are not MR specific.
2) Move [MAP|REDUCE]_[OUT|IN]PUT_RECORDS from MRPigStatsUtil to PigStatsUtil 
since Tez MRInput and MROutput also use them.
3) Fix a typo in JobStats#getAvgREduceTime(): REduce -> Reduce.
4) Fix white spaces.

Note that none of these changes breaks backward compatibility.
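
As a rough illustration of how job-level counters relate to per-input numbers, the sketch below sums per-input counters into a total. The class and method names are hypothetical and this is not the TezStats implementation; the figures are taken from the sample output above, where the two inputs of 1109977 bytes each are consistent with the reported HdfsBytesRead of 2219954.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not TezStats or PigStatsUtil.
class CounterSumSketch {
    // Adds the counters of one input or task into a running job-level total.
    static void accumulate(Map<String, Long> total, Map<String, Long> part) {
        for (Map.Entry<String, Long> entry : part.entrySet()) {
            total.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
    }

    static Map<String, Long> demo() {
        Map<String, Long> total = new LinkedHashMap<>();
        Map<String, Long> input = new LinkedHashMap<>();
        input.put("HdfsBytesRead", 1109977L); // bytes per input in the sample output
        accumulate(total, input);             // first input
        accumulate(total, input);             // second input, same size in the sample
        return total;
    }
}
```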


Diffs
-

  src/org/apache/pig/PigServer.java 2004edb 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java 5d12091 
  src/org/apache/pig/tools/pigstats/InputStats.java 38c8372 
  src/org/apache/pig/tools/pigstats/JobStats.java 4484348 
  src/org/apache/pig/tools/pigstats/OutputStats.java 6a3e3eb 
  src/org/apache/pig/tools/pigstats/PigStats.java 3032728 
  src/org/apache/pig/tools/pigstats/PigStatsUtil.java e690b8d 
  src/org/apache/pig/tools/pigstats/ScriptState.java d58310d 
  src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java 115ae1d 
  src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java ed791fd 
  src/org/apache/pig/tools/pigstats/tez/TezStats.java 64d70e7 
  src/org/apache/pig/tools/pigstats/tez/TezTaskStats.java c3f1c3e 
  test/org/apache/pig/test/TestCombiner.java ae2135e 
  test/org/apache/pig/test/TestPigServer.java 8613c3b 

Diff: https://reviews.apache.org/r/18832/diff/


Testing
---

ant test-tez passes except TestTezCompiler (known).
tez e2e tests pass.


Thanks,

Cheolsoo Park



Re: Review Request 18832: PIG-3603: Add counters to TezStats

2014-03-06 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18832/
---

(Updated March 6, 2014, 8 a.m.)


Review request for pig, Mark Wagner and Rohini Palaniswamy.


Bugs: PIG-3603
https://issues.apache.org/jira/browse/PIG-3603


Repository: pig-git


Description
---

This patch adds the following counters to TezStats:
1) # of input/output records
2) hdfs bytes read/written
3) file bytes read/written

The job stats look like this:

               JobId: job_pigexec_0
  TotalLaunchedTasks: 3
       FileBytesRead: 2434726
    FileBytesWritten: 4869516
       HdfsBytesRead: 2219954
    HdfsBytesWritten: 2433980

Input(s): Successfully read 1 records (1109977 bytes) from: 
"/user/pig/tests/data/singlefile/studentcomplextab10k"
: Successfully read 1 records (1109977 bytes) from: 
"/user/pig/tests/data/singlefile/studentcomplextab10k"
   Output(s): Successfully stored 10393 records (2433980 bytes) in: 
"hdfs://localhost:57063/tmp/temp90703803/tmp-1606775243"

This patch also includes the following changes in PigStats/JobStats classes:
1) Move getHdfsBytesRead() and getHdfsBytesWritten() from MRPigStatsUtil to 
PigStatsUtil since these are not MR specific.
2) Move [MAP|REDUCE]_[OUT|IN]PUT_RECORDS from MRPigStatsUtil to PigStatsUtil 
since Tez MRInput and MROutput also use them.
3) Fix a typo in JobStats#getAvgREduceTime(): REduce -> Reduce.
4) Fix white spaces.

Note that none of these changes breaks backward compatibility.


Diffs
-

  src/org/apache/pig/PigServer.java 2004edb 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java 5d12091 
  src/org/apache/pig/tools/pigstats/InputStats.java 38c8372 
  src/org/apache/pig/tools/pigstats/JobStats.java 4484348 
  src/org/apache/pig/tools/pigstats/OutputStats.java 6a3e3eb 
  src/org/apache/pig/tools/pigstats/PigStats.java 3032728 
  src/org/apache/pig/tools/pigstats/PigStatsUtil.java e690b8d 
  src/org/apache/pig/tools/pigstats/ScriptState.java d58310d 
  src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java 115ae1d 
  src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java ed791fd 
  src/org/apache/pig/tools/pigstats/tez/TezStats.java 64d70e7 
  src/org/apache/pig/tools/pigstats/tez/TezTaskStats.java c3f1c3e 
  test/org/apache/pig/test/TestCombiner.java ae2135e 
  test/org/apache/pig/test/TestPigServer.java 8613c3b 

Diff: https://reviews.apache.org/r/18832/diff/


Testing
---

ant test-tez passes except TestTezCompiler (known).
tez e2e tests pass.


Thanks,

Cheolsoo Park