[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940196#comment-13940196
 ] 

Aniket Mokashi commented on PIG-3815:
-

I just realized that there is a better way to refactor this code. Can someone 
review the patch attached?

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, 
> PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.
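
To illustrate the failure mode described in the issue, here is a minimal sketch of what happens when a ":"-joined list is split again on ":". This is not Hadoop's actual tokenization code; the helper names, host name and port are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a naive ':' split, not Hadoop's actual tokenization code.
class ClasspathSplitDemo {
    // Hypothetical helper: joins entries with ':' the way a classpath-style value would.
    static String join(List<String> entries) {
        return String.join(":", entries);
    }

    // Hypothetical helper: splits the joined value back on ':'.
    static List<String> split(String value) {
        return Arrays.asList(value.split(":"));
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("/tmp/a.jar", "/tmp/b.jar");
        // Host name and port are made up for the example.
        List<String> qualified = Arrays.asList(
                "hdfs://namenode:8020/tmp/a.jar",
                "hdfs://namenode:8020/tmp/b.jar");

        System.out.println(split(join(plain)));      // two entries, as intended
        System.out.println(split(join(qualified)));  // six fragments instead of two
    }
}
```

With plain paths the round trip is lossless; once the entries carry a scheme and port, the separator can no longer be told apart from the colons inside each URI.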



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Status: Patch Available  (was: Reopened)

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, 
> PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] [Reopened] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi reopened PIG-3815:
-


> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, 
> PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Attachment: PIG-3815-3.patch

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, 
> PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] Subscription: PIG patch available

2014-03-18 Thread jira
Issue Subscription
Filter: PIG patch available (14 issues)

Subscriber: pigdaily

Key       Summary
PIG-3801  Auto local mode does not call storeSchema
https://issues.apache.org/jira/browse/PIG-3801
PIG-3794  pig -useHCatalog fails using pig command line interface on HDInsight
https://issues.apache.org/jira/browse/PIG-3794
PIG-3789  tuple in POStream binaryInputQueue keep changing
https://issues.apache.org/jira/browse/PIG-3789
PIG-3771  Piggybank Avrostorage makes a lot of namenode calls in the backend
https://issues.apache.org/jira/browse/PIG-3771
PIG-3757  Make scalar work
https://issues.apache.org/jira/browse/PIG-3757
PIG-3737  Bundle dependent jars in distribution in %PIG_HOME%/lib folder
https://issues.apache.org/jira/browse/PIG-3737
PIG-3735  UDF to data cleanse the dirty data with expected pattern
https://issues.apache.org/jira/browse/PIG-3735
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3613  UDF for SimilarityMatching between strings with matching scores
https://issues.apache.org/jira/browse/PIG-3613
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-3456  Reduce threadlocal conf access in backend for each record
https://issues.apache.org/jira/browse/PIG-3456
PIG-3441  Allow Pig to use default resources from Configuration objects
https://issues.apache.org/jira/browse/PIG-3441
PIG-3373  XMLLoader returns non-matching nodes when a tag name spans through the block boundary
https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3789) tuple in POStream binaryInputQueue keep changing

2014-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3789:


Attachment: PIG-3789-2.patch

The problem is in POValueInputTez. When we read tuples from the edge, they are
produced by BinSedesTuple.readFields, which reuses the tuple: mFields is
cleared and rebuilt for every tuple. When the streaming operation runs
asynchronously, a tuple saved to binaryInputQueue therefore keeps changing. I
checked all the other TezLoad operators and they seem fine: POShuffleTezLoad
already makes a copy (Packager.getValueTuple), and POSimpleTezLoad relies on
the loader to create a new tuple. The other TezLoad operators do not send
input tuples to binaryInputQueue.

Patch attached.
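
As a rough illustration of the aliasing problem described in this comment, the sketch below uses a plain Java list as a stand-in for a reused tuple. It does not use Pig's Tuple, BinSedesTuple or POStream classes; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustration only: a reused list stands in for a reused tuple.
class TupleReuseDemo {
    // Hypothetical reader state: one object that is cleared and refilled per record.
    static final List<String> reused = new ArrayList<>();

    static List<String> readInto(String value) {
        reused.clear();
        reused.add(value);
        return reused;
    }

    // Enqueues three records, either as references to the reused object or as copies.
    static Queue<List<String>> enqueue(boolean copy) {
        Queue<List<String>> queue = new ArrayDeque<>();
        for (String v : new String[] {"a", "b", "c"}) {
            List<String> t = readInto(v);
            queue.add(copy ? new ArrayList<>(t) : t);
        }
        return queue;
    }

    public static void main(String[] args) {
        System.out.println(enqueue(false)); // every queued entry shows the last record
        System.out.println(enqueue(true));  // each queued entry keeps its own record
    }
}
```

A consumer that drains the queue later only sees stable values when a copy was queued, which matches the observation that copying the tuple makes the problem go away.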

> tuple in POStream binaryInputQueue keep changing
> 
>
> Key: PIG-3789
> URL: https://issues.apache.org/jira/browse/PIG-3789
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3789-1.patch, PIG-3789-2.patch
>
>
> Similar to the comments in POSimpleTezLoad:
> {code}
> /**
>  * Previously, we reused the same Result object for all results, but we found
>  * certain operators (e.g. POStream) save references to the Result object and
>  * expect it to be constant.
>  */
> {code}
> Tuples put into binaryInputQueue have changed by the time they are actually
> processed. Not exactly sure why, but making a copy of the tuple solves the
> issue.





[jira] [Updated] (PIG-3789) tuple in POStream binaryInputQueue keep changing

2014-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3789:


Status: Patch Available  (was: Open)

> tuple in POStream binaryInputQueue keep changing
> 
>
> Key: PIG-3789
> URL: https://issues.apache.org/jira/browse/PIG-3789
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3789-1.patch, PIG-3789-2.patch
>
>
> Similar to the comments in POSimpleTezLoad:
> {code}
> /**
>  * Previously, we reused the same Result object for all results, but we found
>  * certain operators (e.g. POStream) save references to the Result object and
>  * expect it to be constant.
>  */
> {code}
> Tuples put into binaryInputQueue have changed by the time they are actually
> processed. Not exactly sure why, but making a copy of the tuple solves the
> issue.





[jira] [Comment Edited] (PIG-3789) tuple in POStream binaryInputQueue keep changing

2014-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916945#comment-13916945
 ] 

Daniel Dai edited comment on PIG-3789 at 3/19/14 12:10 AM:
---

The problem goes away with the patch. It would be better to dig deeper to
figure out why it happens. This affects MultiQuery_11.


was (Author: daijy):
The problem goes away with the patch. It would be better to dig deeper to
figure out why it happens. This affects ComputeSpec_3.

> tuple in POStream binaryInputQueue keep changing
> 
>
> Key: PIG-3789
> URL: https://issues.apache.org/jira/browse/PIG-3789
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3789-1.patch, PIG-3789-2.patch
>
>
> Similar to the comments in POSimpleTezLoad:
> {code}
> /**
>  * Previously, we reused the same Result object for all results, but we found
>  * certain operators (e.g. POStream) save references to the Result object and
>  * expect it to be constant.
>  */
> {code}
> Tuples put into binaryInputQueue have changed by the time they are actually
> processed. Not exactly sure why, but making a copy of the tuple solves the
> issue.





[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query

2014-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939975#comment-13939975
 ] 

Daniel Dai commented on PIG-2970:
-

While working on PIG-3807, I realized that no matter how far up we push
DanglingNestedNodeRemover, we will still invoke getSchema in
LogicalPlanBuilder. [~horaguchi] is right: we should iterate over the inner
plan's sinks to find LOGenerate. This applies to LOForeach.getSchema(). After
DanglingNestedNodeRemover has run, it is safe to assume LOGenerate is the only
sink, so we don't need to spread the change into the optimizer. I will bring
some of Koji's changes into PIG-3807.
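
The idea of scanning the inner plan's sinks for the generate operator, rather than assuming a single sink, could be sketched roughly as follows. The Op and Generate classes and findGenerate are hypothetical stand-ins, not Pig's LOGenerate, LogicalPlan or LOForeach API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustration only: hypothetical operator types, not Pig's logical plan classes.
class SinkScanDemo {
    static class Op {
        final String name;
        Op(String name) { this.name = name; }
    }

    // Stand-in for the generate operator that defines the foreach schema.
    static class Generate extends Op {
        Generate(String name) { super(name); }
    }

    // Walks all sinks and returns the first generate operator, if any.
    static Optional<Generate> findGenerate(List<Op> sinks) {
        for (Op op : sinks) {
            if (op instanceof Generate) {
                return Optional.of((Generate) op);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A dangling "distinct" sink sits next to the real generate sink.
        List<Op> sinks = Arrays.asList(new Op("distinct"), new Generate("generate"));
        System.out.println(findGenerate(sinks).map(g -> g.name).orElse("none"));
    }
}
```

Taking the first sink blindly would pick the dangling operator here, which is the kind of mix-up that produces the wrong schema in the report.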

> Nested foreach getting incorrect schema when having unrelated inner query
> -
>
> Key: PIG-2970
> URL: https://issues.apache.org/jira/browse/PIG-2970
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 0.10.0
>Reporter: Koji Noguchi
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: PIG-2970-0.patch, PIG-2970-1.patch, PIG-2970-2.patch, 
> pig-2970-trunk-v01.txt, pig-2970-trunk-v02.txt
>
>
> While looking at PIG-2968, hit a weird error message.
> {noformat}
> $ cat -n test/foreach2.pig
>  1  daily = load 'nyse' as (exchange, symbol);
>  2  grpd = group daily by exchange;
>  3  unique = foreach grpd {
>  4  sym = daily.symbol;
>  5  uniq_sym = distinct sym;
>  6  --ignoring uniq_sym result
>  7  generate group, daily;
>  8  };
>  9  describe unique;
> 10  zzz = foreach unique generate group;
> 11  explain zzz;
> % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
> ...
> unique: {symbol: bytearray}
> 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field projection. 
> Projected field [group] does not exist in schema: symbol:bytearray.
> ...
> {noformat}





[jira] [Commented] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-18 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939968#comment-13939968
 ] 

Suhas Satish commented on PIG-3813:
---

Thanks for the quick turnaround, Cheolsoo.

> Rank column is assigned different uids everytime when schema is reset
> -
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>  
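
The loop behaviour described in the report can be sketched as below. This is not the PigGenericMapBase code; the Status enum, the Supplier-based leaf and the maxCalls cap (added only so the demo terminates) are assumptions for illustration.

```java
import java.util.function.Supplier;

// Illustration only: a simplified pull loop, not Pig's runPipeline().
class PipelineLoopDemo {
    enum Status { OK, EOP }

    // Pulls from the leaf until it reports EOP.
    // Returns the number of calls made, or -1 if maxCalls is reached without EOP.
    static int drain(Supplier<Status> leaf, int maxCalls) {
        int calls = 0;
        while (true) {
            if (calls == maxCalls) {
                return -1; // safety cap so the demo itself terminates
            }
            calls++;
            if (leaf.get() == Status.EOP) {
                return calls;
            }
        }
    }

    // A leaf that yields `tuples` OK results and then EOP.
    static Supplier<Status> finiteLeaf(int tuples) {
        int[] remaining = {tuples};
        return () -> remaining[0]-- > 0 ? Status.OK : Status.EOP;
    }

    // A leaf that always reports OK, like the empty-tuple case in the report.
    static Supplier<Status> stuckLeaf() {
        return () -> Status.OK;
    }

    public static void main(String[] args) {
        System.out.println(drain(finiteLeaf(2), 1000)); // exits once EOP arrives
        System.out.println(drain(stuckLeaf(), 1000));   // only the cap stops it
    }
}
```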





[jira] [Comment Edited] (PIG-2784) Framework for dynamic query optimization

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939962#comment-13939962
 ] 

Aniket Mokashi edited comment on PIG-2784 at 3/18/14 11:34 PM:
---

[~rajitha], [~zhiweicai], thanks for your interest in this project.

To proceed, you need to submit your proposal with the details on approach, plan 
etc. If you would like to clarify something, please use this jira as a place 
for discussions.

bq. Do we know about the size of data to process before Pig compiles the job?
Yes. This lets Pig do reducer estimation.

bq.  What's the difference between implementing this framework inside
JobControlCompiler and inside MRCompiler? Which one do you think is better?
MRCompiler compiles the physical plan into MapReduce operators, and
JobControlCompiler takes these compiled jobs and submits them to run on Hadoop
via Hadoop's JobControl API. JobControlCompiler is also responsible for
maintaining progress reports, stats, etc. As part of this jira, you need to
work out how we can take any (or all) of these optimizations and find the best
place to plug them in. I look forward to seeing your thoughts on how it should
work.

bq. Do I need to consider more kinds of optimization other than the
optimizations mentioned in the description? Is it possible that we categorize
the optimizations into several types and make it easier to extend in the future?
It would be nice if we could allow new optimizations to be added in the future.
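
For readers new to the reducer estimation mentioned above, a size-based estimate can be sketched as a ceiling division of the input size by a per-reducer byte budget, clamped to a maximum. This is a simplified sketch under stated assumptions, not Pig's actual estimator; the method name and the constants used in main are made up for the example.

```java
// Illustration only: a simplified size-based estimate, not Pig's estimator class.
class ReducerEstimateDemo {
    // Returns ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        long clamped = Math.max(1L, Math.min((long) maxReducers, needed));
        return (int) clamped;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // assumed per-reducer budget
        int cap = 999;               // assumed upper bound on reducers
        System.out.println(estimate(2_500_000_000L, oneGb, cap));      // 2.5 GB of input
        System.out.println(estimate(0L, oneGb, cap));                  // empty input
        System.out.println(estimate(10_000_000_000_000L, oneGb, cap)); // 10 TB of input
    }
}
```

Because this only needs the input size, it can run before the job is submitted, which is why knowing the data size up front matters.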


was (Author: aniket486):
[~rajitha], [~zhiweicai], thanks for your interest in this project.

To proceed, you need to submit your proposal with the details on approach, plan 
etc. If you would like to clarify something, please use this jira as a place 
for discussions.

bq. Do we know about the size of data to process before Pig compiles the job?
Yes. This lets Pig do reducer estimation.

bq.  What's the difference between implementing this framework inside
JobControlCompiler and inside MRCompiler? Which one do you think is better?
- MRCompiler compiles the physical plan into MapReduce operators, and
JobControlCompiler takes these compiled jobs and submits them to run on Hadoop
via Hadoop's JobControl API. JobControlCompiler is also responsible for
maintaining progress reports, stats, etc. As part of this jira, you need to
work out how we can take any (or all) of these optimizations and find the best
place to plug them in. I look forward to seeing your thoughts on how it should
work.

bq. Do I need to consider more kinds of optimization other than the
optimizations mentioned in the description? Is it possible that we categorize
the optimizations into several types and make it easier to extend in the future?
It would be nice if we could allow new optimizations to be added in the future.

> Framework for dynamic query optimization
> 
>
> Key: PIG-2784
> URL: https://issues.apache.org/jira/browse/PIG-2784
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jie Li
>  Labels: GSOC2014
>
> We need a framework to implement dynamic query optimization, i.e. changing 
> the query plan at runtime. Currently we support estimating the number of 
> reducers dynamically, which works well as a first step but was not
> perfectly implemented. In the near future, we'll support more dynamic
> optimizations, like [removing sample job for
> order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit
> job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting
> skew and using skew-join, etc.
> Currently, estimating the number of reducers is implemented in
> JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and
> generates the complete MRPlan. One place (discussed with Thejas) to implement
> the framework is MRCompiler, where the MRPlan would be generated in batches
> and adjusted dynamically.
> Any comments?
> This is a candidate project for Google Summer of Code 2014. More information
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014





[jira] [Commented] (PIG-2784) Framework for dynamic query optimization

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939962#comment-13939962
 ] 

Aniket Mokashi commented on PIG-2784:
-

[~rajitha], [~zhiweicai], thanks for your interest in this project.

To proceed, you need to submit your proposal with the details on approach, plan 
etc. If you would like to clarify something, please use this jira as a place 
for discussions.

bq. Do we know about the size of data to process before Pig compiles the job?
Yes. This lets Pig do reducer estimation.

bq.  What's the difference between implementing this framework inside
JobControlCompiler and inside MRCompiler? Which one do you think is better?
- MRCompiler compiles the physical plan into MapReduce operators, and
JobControlCompiler takes these compiled jobs and submits them to run on Hadoop
via Hadoop's JobControl API. JobControlCompiler is also responsible for
maintaining progress reports, stats, etc. As part of this jira, you need to
work out how we can take any (or all) of these optimizations and find the best
place to plug them in. I look forward to seeing your thoughts on how it should
work.

bq. Do I need to consider more kinds of optimization other than the
optimizations mentioned in the description? Is it possible that we categorize
the optimizations into several types and make it easier to extend in the future?
It would be nice if we could allow new optimizations to be added in the future.

> Framework for dynamic query optimization
> 
>
> Key: PIG-2784
> URL: https://issues.apache.org/jira/browse/PIG-2784
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jie Li
>  Labels: GSOC2014
>
> We need a framework to implement dynamic query optimization, i.e. changing 
> the query plan at runtime. Currently we support estimating the number of 
> reducers dynamically, which works well as a first step but was not
> perfectly implemented. In the near future, we'll support more dynamic
> optimizations, like [removing sample job for
> order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit
> job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting
> skew and using skew-join, etc.
> Currently, estimating the number of reducers is implemented in
> JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and
> generates the complete MRPlan. One place (discussed with Thejas) to implement
> the framework is MRCompiler, where the MRPlan would be generated in batches
> and adjusted dynamically.
> Any comments?
> This is a candidate project for Google Summer of Code 2014. More information
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014





[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3813:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.
Thank you Daniel for reviewing the patch!
Thank you Suhas for reporting the issue!

> Rank column is assigned different uids everytime when schema is reset
> -
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>  





[jira] [Commented] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939928#comment-13939928
 ] 

Daniel Dai commented on PIG-3813:
-

+1

> Rank column is assigned different uids everytime when schema is reset
> -
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>  





[jira] [Resolved] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi resolved PIG-3815.
-

Resolution: Fixed

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939825#comment-13939825
 ] 

Aniket Mokashi commented on PIG-3815:
-

I have committed PIG-3815-2.patch to trunk! Thanks everyone for your comments.

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939815#comment-13939815
 ] 

Rohini Palaniswamy commented on PIG-3815:
-

[~julienledem],
The path is qualified only for use in addCacheFile(), which sets
mapred.cache.files, where a qualified URI is required.
conf.set("mapred.job.classpath.files") uses just the file path, with the
scheme and port removed.
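
As a small illustration of the distinction drawn in this comment, java.net.URI can separate the qualified form from the bare path. This sketch does not call Hadoop's FileSystem or DistributedCache; the host name, port and jar path are made up for the example.

```java
import java.net.URI;

// Illustration only: plain java.net.URI, no Hadoop classes involved.
class QualifiedPathDemo {
    // Returns just the path component, dropping scheme, host and port.
    static String pathOnly(URI qualified) {
        return qualified.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical fully qualified location of a jar.
        URI qualified = URI.create("hdfs://namenode:8020/tmp/pig/udf.jar");

        System.out.println(qualified);           // form suited to a cache-files style entry
        System.out.println(pathOnly(qualified)); // colon-free form suited to a ':'-separated list
    }
}
```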

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which records jars in
> the distributed cache configuration. Hadoop uses ":" to separate the list of
> files to be put on the classpath via the distributed cache. If
> fs.default.name contains a port number, the tokenization logic in Hadoop
> fails when retrieving the list of cache filenames in the backend.





[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3813:
---

Status: Patch Available  (was: Open)

> Rank column is assigned different uids everytime when schema is reset
> -
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>  





[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3813:
---

Summary: Rank column is assigned different uids everytime when schema is 
reset  (was:  PigGenericMapBase  runPipeline() method returns empty tuples and 
goes into infinite loop under certain conditions)

Updating the title to describe the root cause.

> Rank column is assigned different uids everytime when schema is reset
> -
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>  





[jira] [Updated] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3813:
---

Attachment: PIG-3813-2.patch

Here is the 2nd patch that keeps track of the uid of the rank column.

>  PigGenericMapBase  runPipeline() method returns empty tuples and goes into 
> infinite loop under certain conditions
> --
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple 
> () 
> with STATUS_OK,
> so the while(true) loop never sees an End of Processing (EOP) status and 
> never exits. 
>  





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749
 ] 

Julien Le Dem commented on PIG-3815:


[~rohini] in the code you quoted, don't you think it is putting the port back 
in the following line?
{noformat}
URI uri = fs.makeQualified(file).toUri();
{noformat}

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939750#comment-13939750
 ] 

Rohini Palaniswamy commented on PIG-3815:
-

Yes. It was fixed 3 years ago. I am not sure which version of hadoop you are 
using that hits this issue. But since we still support 0.20 as well, there is 
no harm in doing .toUri().getPath() in pig too. 

+1. Since the issue is not with hadoop 1.0, please update your comment when 
checking in this patch from  "// PIG-3815 In hadoop 1.0, addFileToClassPath 
uses : as separator" to say hadoop 0.20. 
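
To illustrate the separator problem discussed in this thread, here is a 
minimal, self-contained sketch. It does not call Hadoop: the class and method 
names (ClasspathSeparatorSketch, join, split, pathOnly) and the namenode:8020 
address are made up for illustration, and the split on ':' is only an assumed 
stand-in for the backend tokenization.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

class ClasspathSeparatorSketch {
    // Joins entries with ':' as a Unix-style path separator would.
    static String join(List<String> entries) {
        return String.join(":", entries);
    }

    // Naive tokenization on ':' (assumed stand-in for the backend logic).
    static List<String> split(String joined) {
        return Arrays.asList(joined.split(":"));
    }

    // Keeps only the path component, dropping scheme, host and port.
    static String pathOnly(String qualified) {
        return URI.create(qualified).getPath();
    }

    public static void main(String[] args) {
        List<String> qualified = Arrays.asList(
                "hdfs://namenode:8020/tmp/a.jar",
                "hdfs://namenode:8020/tmp/b.jar");

        // Fully qualified URIs contain ':' themselves, so two entries
        // come back as six tokens.
        System.out.println(split(join(qualified)));

        // Path-only entries survive the round trip as two tokens.
        List<String> paths = Arrays.asList(
                pathOnly(qualified.get(0)), pathOnly(qualified.get(1)));
        System.out.println(split(join(paths)));
    }
}
```

Reducing each entry to its path before joining is the same idea as the 
.toUri().getPath() call mentioned above.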

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939728#comment-13939728
 ] 

Aniket Mokashi commented on PIG-3815:
-

Thanks for your comments, [~rohini]. I was not aware of the limitations on HDFS 
streams. I have attached a patch (PIG-3815-2.patch) to fix those problems.

Hadoop Jira: https://issues.apache.org/jira/browse/MAPREDUCE-2361. Looks like 
this was fixed here - 
http://svn.apache.org/viewvc?view=revision&revision=1077790. 



> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Attachment: PIG-3815-2.patch

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Reopened] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reopened PIG-3815:
-


Actually I see some issues with this patch. Reopening jira.

   1) Changing os.close() to IOUtils.closeQuietly(os); is not good. You can 
close the input quietly, but not the output, especially an HDFS output stream. 
If os.close() fails, HDFS can leave behind an empty file without data that is 
still accessible through the NN. We have been bitten by this many times. In 
internal projects, we delete the file and retry if os.close() failed. So please 
let the pig script fail if os.close() failed rather than causing unexpected 
behavior.

   2) addFileToClassPath is already doing file.toUri().getPath(). I don't see 
where the hadoop bug is coming from. 

http://svn.apache.org/viewvc/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/filecache/DistributedCache.java?revision=1206848&view=markup

{code}
public static void addFileToClassPath
    (Path file, Configuration conf, FileSystem fs)
    throws IOException {
  String filepath = file.toUri().getPath();
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null
      ? filepath
      : classpath + System.getProperty("path.separator") + filepath);
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
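
Point 1 above (close the input quietly, but let a failed close of the output 
surface) can be sketched with plain java.io streams. This is only an 
illustration under stated assumptions, not the Pig patch: StrictCloseCopy, 
copy, closeQuietly and FailingCloseStream are hypothetical names, and an 
in-memory stream stands in for an HDFS output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StrictCloseCopy {
    // Copies in to out. A failure while closing the output propagates to the
    // caller; a failure while closing the input is ignored.
    static void copy(InputStream in, OutputStream out) throws IOException {
        try (OutputStream o = out) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                o.write(buf, 0, n);
            }
        } finally {
            closeQuietly(in);
        }
    }

    static void closeQuietly(InputStream in) {
        try {
            in.close();
        } catch (IOException ignored) {
            // Nothing was written through the input, so this is safe to ignore.
        }
    }

    // Hypothetical stream whose close() always fails, to simulate a bad close.
    static class FailingCloseStream extends ByteArrayOutputStream {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // Returns true if the failed close reached the caller.
    static boolean closeFailureSurfaces() {
        try {
            copy(new ByteArrayInputStream(new byte[] {1, 2, 3}), new FailingCloseStream());
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(closeFailureSurfaces());
    }
}
```

Because the exception reaches the caller, the caller can fail the job or 
delete the file and retry, as described in the comment.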

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Commented] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions

2014-03-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939661#comment-13939661
 ] 

Cheolsoo Park commented on PIG-3813:


[~daijy], thank you for the comment. Let me try your suggestion.

>  PigGenericMapBase  runPipeline() method returns empty tuples and goes into 
> infinite loop under certain conditions
> --
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple 
> () 
> with STATUS_OK,
> so the while(true) loop never sees an End of Processing (EOP) status and 
> never exits. 
>  





[jira] [Commented] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions

2014-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939622#comment-13939622
 ] 

Daniel Dai commented on PIG-3813:
-

Seems LORank is generating the wrong uid. One thing missing in LORank is 
keeping the generated uid across sessions, so that a schema reset will not 
erase the old uid. We can refer to LOCogroup.groupKeyUidOnlySchema to fix LORank.
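
The idea of remembering the uid across schema resets can be sketched as 
follows. This is a simplified model, not the LORank or LOCogroup 
implementation: RankUidSketch, its fields and its uid counter are hypothetical 
names chosen for illustration.

```java
class RankUidSketch {
    // Hypothetical global uid allocator.
    private static long nextUid = 0;

    // Uid of the rank column, remembered across resets; -1 means "not assigned yet".
    private long rankUid = -1;

    // Stand-in for the cached schema; null means the schema must be rebuilt.
    private Long schemaRankUid = null;

    // Rebuilds the schema if needed, reusing the remembered uid
    // instead of allocating a new one each time.
    long getRankColumnUid() {
        if (schemaRankUid == null) {
            if (rankUid == -1) {
                rankUid = nextUid++;
            }
            schemaRankUid = rankUid;
        }
        return schemaRankUid;
    }

    // Drops the cached schema but deliberately keeps rankUid.
    void resetSchema() {
        schemaRankUid = null;
    }

    public static void main(String[] args) {
        RankUidSketch rank = new RankUidSketch();
        long before = rank.getRankColumnUid();
        rank.resetSchema();
        long after = rank.getRankColumnUid();
        System.out.println(before == after);
    }
}
```

If resetSchema() also cleared rankUid, every rebuild would allocate a fresh 
uid, which is the behaviour the comment describes as wrong.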

>  PigGenericMapBase  runPipeline() method returns empty tuples and goes into 
> infinite loop under certain conditions
> --
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Cheolsoo Park
>Priority: Critical
> Attachments: PIG-3813-1.patch, test_data.txt
>
>
> When the following script is run, pig goes into an infinite loop. This was 
> reproduced on pig trunk as of March 12, 2014 on apache hadoop 1.2. 
> test_data.txt has been attached. 
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple 
> () 
> with STATUS_OK,
> so the while(true) loop never sees an End of Processing (EOP) status and 
> never exits. 
>  





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939618#comment-13939618
 ] 

Aniket Mokashi commented on PIG-3815:
-

Thanks for the review, [~julienledem] and [~cheolsoo]. I have attached a 
revised patch and committed it to trunk!

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Resolved] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi resolved PIG-3815.
-

Resolution: Fixed

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606
 ] 

Julien Le Dem commented on PIG-3815:


Same comment as 1. from Cheolsoo.
Otherwise, this looks good to me.

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Attachment: PIG-3815-1.patch

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Attachment: (was: PIG-3815-1.patch)

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3815:


Attachment: PIG-3815-1.patch

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Updated] (PIG-3816) Incorrect Javadoc for launchPlan() method

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3816:
---

Fix Version/s: 0.13.0

> Incorrect Javadoc for launchPlan() method
> -
>
> Key: PIG-3816
> URL: https://issues.apache.org/jira/browse/PIG-3816
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kyungho Jeon
>Assignee: Kyungho Jeon
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3816.patch
>
>
> Javadoc of {{protected PigStats launchPlan(LogicalPlan lp, String jobName)}} 
> incorrectly describes that the method takes a physical plan as an argument.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939354#comment-13939354
 ] 

Cheolsoo Park commented on PIG-3815:


# Can you delete this? It's unused.
{code}
+import org.codehaus.plexus.util.IOUtil;
{code}
# Do you mind fixing JobControlCompiler.java#L1700 too? Looks like we can use 
IOUtils.closeQuietly() here too.
{code}
OutputStream os = fs.create(dst);
try {
    IOUtils.copyBytes(url.openStream(), os, 4096, true);
} finally {
    // IOUtils can not close both the input and the output properly in a finally
    // as we can get an exception in between opening the stream and calling the method
    os.close();
}
{code}

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815.patch
>
>
> Pig uses DistributedCache.addFileToClassPath api that puts jars on 
> distributed cache configuration. This uses : to separate list of files to be 
> put of classpath via distributed cache. If fs.default.name has port number in 
> it, it causes the tokenization logic to fail in hadoop for retrieving list of 
> cache filenames in backend.





[jira] [Resolved] (PIG-3816) Incorrect Javadoc for launchPlan() method

2014-03-18 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-3816.


Resolution: Fixed
  Assignee: Kyungho Jeon

Added Kyungho to the contributors. I also added Prashant to the admins.

> Incorrect Javadoc for launchPlan() method
> -
>
> Key: PIG-3816
> URL: https://issues.apache.org/jira/browse/PIG-3816
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Kyungho Jeon
>Assignee: Kyungho Jeon
>Priority: Trivial
> Attachments: PIG-3816.patch
>
>
> Javadoc of {{protected PigStats launchPlan(LogicalPlan lp, String jobName)}} 
> incorrectly describes that the method takes a physical plan as an argument.


