[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940196#comment-13940196 ]

Aniket Mokashi commented on PIG-3815:

I just realized that there is a better way to refactor this code. Can someone review the attached patch?

> Hadoop bug causes to pig to fail silently with jar cache
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815-3.patch, PIG-3815.patch
>
> Pig uses the DistributedCache.addFileToClassPath API, which puts jars into the distributed cache configuration. It uses ':' to separate the list of files to be put on the classpath via the distributed cache. If fs.default.name contains a port number, the tokenization logic in Hadoop fails when the backend retrieves the list of cache filenames.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
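To see why colons inside the entries break a ':'-separated list, here is a minimal sketch (not Hadoop's actual code) that joins two fully qualified jar URIs with ':' and splits the string back on ':', as a naive backend parser would. The host `nn.example.com`, port `8020` and jar paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ColonSeparatorDemo {
    // Joins entries with ':' and splits them back, the way a
    // colon-separated list property would be tokenized (illustrative only).
    static List<String> roundTrip(List<String> entries) {
        String joined = String.join(":", entries);
        return Arrays.asList(joined.split(":"));
    }

    public static void main(String[] args) {
        // Two qualified jar URIs; the scheme and the port each add a ':'.
        List<String> jars = Arrays.asList(
                "hdfs://nn.example.com:8020/tmp/a.jar",
                "hdfs://nn.example.com:8020/tmp/b.jar");
        List<String> tokens = roundTrip(jars);
        // Prints 6, not 2: the entries are cut apart at every ':'.
        System.out.println(tokens.size());
        System.out.println(tokens);
    }
}
```

None of the six fragments is a usable file name, which is consistent with the jars silently failing to appear on the backend classpath.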
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3815:
Status: Patch Available (was: Reopened)
[jira] [Reopened] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi reopened PIG-3815:
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3815:
Attachment: PIG-3815-3.patch
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (14 issues)
Subscriber: pigdaily

Key       Summary
PIG-3801  Auto local mode does not call storeSchema
          https://issues.apache.org/jira/browse/PIG-3801
PIG-3794  pig -useHCatalog fails using pig command line interface on HDInsight
          https://issues.apache.org/jira/browse/PIG-3794
PIG-3789  tuple in POStream binaryInputQueue keep changing
          https://issues.apache.org/jira/browse/PIG-3789
PIG-3771  Piggybank Avrostorage makes a lot of namenode calls in the backend
          https://issues.apache.org/jira/browse/PIG-3771
PIG-3757  Make scalar work
          https://issues.apache.org/jira/browse/PIG-3757
PIG-3737  Bundle dependent jars in distribution in %PIG_HOME%/lib folder
          https://issues.apache.org/jira/browse/PIG-3737
PIG-3735  UDF to data cleanse the dirty data with expected pattern
          https://issues.apache.org/jira/browse/PIG-3735
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3613  UDF for SimilarityMatching between strings with matching scores
          https://issues.apache.org/jira/browse/PIG-3613
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3456  Reduce threadlocal conf access in backend for each record
          https://issues.apache.org/jira/browse/PIG-3456
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441
PIG-3373  XMLLoader returns non-matching nodes when a tag name spans through the block boundary
          https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-3789) tuple in POStream binaryInputQueue keep changing
[ https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3789:
Attachment: PIG-3789-2.patch

The problem is in POValueInputTez. When we read tuples from an edge, they are produced by BinSedesTuple.readFields, which reuses the same tuple: mFields is cleared and rebuilt for every record. When the streaming operation runs asynchronously, the tuple saved in binaryInputQueue therefore keeps changing.

I checked all the other TezLoad operators and they seem fine. POShuffleTezLoad already makes a copy (Packager.getValueTuple), POSimpleTezLoad relies on the loader to create a new tuple, and the other TezLoad operators do not send input tuples to binaryInputQueue. Patch attached.

> tuple in POStream binaryInputQueue keep changing
>
> Key: PIG-3789
> URL: https://issues.apache.org/jira/browse/PIG-3789
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3789-1.patch, PIG-3789-2.patch
>
> Similar to the comments in POSimpleTezLoad:
> {code}
> /**
>  * Previously, we reused the same Result object for all results, but we found
>  * certain operators (e.g. POStream) save references to the Result object and
>  * expect it to be constant.
>  */
> {code}
> Tuples put into binaryInputQueue get changed by the time they are actually processed. Not exactly sure why, but making a copy of the tuple solves the issue.
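The aliasing problem described above can be reproduced without Pig. This is a minimal sketch that assumes a hypothetical `ReusingReader` standing in for a deserializer that reuses one tuple object; it is not Pig's `BinSedesTuple` API. Storing the returned reference in a queue makes every queued entry show the last record, while copying before enqueuing (the approach the comment describes) keeps each record intact.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReusedTupleDemo {
    // Hypothetical reader that, like a reused tuple, clears and refills
    // the same object for every record.
    static final class ReusingReader {
        private final List<Object> fields = new ArrayList<>();
        private int next = 0;

        List<Object> read() {
            fields.clear();              // previous contents are discarded
            fields.add("row-" + next);
            next++;
            return fields;               // always the same object
        }
    }

    // Reads n records into a queue that is consumed later, optionally copying each one.
    static Deque<List<Object>> enqueue(int n, boolean copy) {
        ReusingReader reader = new ReusingReader();
        Deque<List<Object>> queue = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            List<Object> t = reader.read();
            queue.add(copy ? new ArrayList<>(t) : t);
        }
        return queue;
    }

    public static void main(String[] args) {
        System.out.println(enqueue(3, false)); // [[row-2], [row-2], [row-2]]
        System.out.println(enqueue(3, true));  // [[row-0], [row-1], [row-2]]
    }
}
```

The copy costs one allocation per record, which is why loaders that already create a fresh tuple per record do not need it.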
[jira] [Updated] (PIG-3789) tuple in POStream binaryInputQueue keep changing
[ https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3789:
Status: Patch Available (was: Open)
[jira] [Comment Edited] (PIG-3789) tuple in POStream binaryInputQueue keep changing
[ https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916945#comment-13916945 ]

Daniel Dai edited comment on PIG-3789 at 3/19/14 12:10 AM:

The problem goes away with the patch, but it would be better to dig deeper and figure out why it happens. This affects MultiQuery_11.

was (Author: daijy):
The problem goes away with the patch, but it would be better to dig deeper and figure out why it happens. This affects ComputeSpec_3.
[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query
[ https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939975#comment-13939975 ]

Daniel Dai commented on PIG-2970:

While working on PIG-3807, I realized that no matter how far we push up DanglingNestedNodeRemover, we will still invoke getSchema in LogicalPlanBuilder. [~horaguchi] is right: we should iterate over the inner plan's sinks to find the LOGenerate. This also applies to LOForeach.getSchema(). After DanglingNestedNodeRemover has run, it is safe to assume LOGenerate is the only sink, so we don't need to spread the change into the optimizer. I will bring some of Koji's changes into PIG-3807.

> Nested foreach getting incorrect schema when having unrelated inner query
>
> Key: PIG-2970
> URL: https://issues.apache.org/jira/browse/PIG-2970
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Reporter: Koji Noguchi
> Assignee: Daniel Dai
> Priority: Minor
> Fix For: 0.12.0
>
> Attachments: PIG-2970-0.patch, PIG-2970-1.patch, PIG-2970-2.patch, pig-2970-trunk-v01.txt, pig-2970-trunk-v02.txt
>
> While looking at PIG-2968, I hit a weird error message.
> {noformat}
> $ cat -n test/foreach2.pig
>      1  daily = load 'nyse' as (exchange, symbol);
>      2  grpd = group daily by exchange;
>      3  unique = foreach grpd {
>      4      sym = daily.symbol;
>      5      uniq_sym = distinct sym;
>      6      --ignoring uniq_sym result
>      7      generate group, daily;
>      8  };
>      9  describe unique;
>     10  zzz = foreach unique generate group;
>     11  explain zzz;
> % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
> ...
> unique: {symbol: bytearray}
> 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> Invalid field projection.
> Projected field [group] does not exist in schema: symbol:bytearray.
> ...
> {noformat}
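The difference between trusting a single sink and searching the sinks for the generate can be sketched with hypothetical `Op` and `GenerateOp` stand-ins; these are not Pig's `LOGenerate`/`LOForeach` classes, the schema strings are illustrative, and we assume (only for the demo) that the dangling `distinct` sink happens to come first.

```java
import java.util.Arrays;
import java.util.List;

public class FindGenerateSinkDemo {
    // Hypothetical stand-in for an operator in a nested foreach inner plan.
    static class Op {
        final String schema;
        Op(String schema) { this.schema = schema; }
    }

    // Hypothetical stand-in for the generate operator.
    static final class GenerateOp extends Op {
        GenerateOp(String schema) { super(schema); }
    }

    // Naive: assume the first sink is the generate.
    static String schemaFromFirstSink(List<Op> sinks) {
        return sinks.get(0).schema;
    }

    // Safer: scan every sink and use the one that is a generate.
    static String schemaFromGenerateSink(List<Op> sinks) {
        for (Op sink : sinks) {
            if (sink instanceof GenerateOp) {
                return sink.schema;
            }
        }
        throw new IllegalStateException("inner plan has no generate sink");
    }

    public static void main(String[] args) {
        // The dangling distinct (uniq_sym) is also a sink of the inner plan.
        List<Op> sinks = Arrays.asList(
                new Op("{symbol: bytearray}"),
                new GenerateOp("{group: bytearray, daily: {...}}"));
        System.out.println(schemaFromFirstSink(sinks));    // {symbol: bytearray}
        System.out.println(schemaFromGenerateSink(sinks)); // {group: bytearray, daily: {...}}
    }
}
```

The naive variant reproduces the shape of the wrong `describe` output in the report; once dangling nodes are removed, both variants agree because only one sink remains.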
[jira] [Commented] (PIG-3813) Rank column is assigned different uids everytime when schema is reset
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939968#comment-13939968 ]

Suhas Satish commented on PIG-3813:

Thanks for the quick turnaround, Cheolsoo.

> Rank column is assigned different uids everytime when schema is reset
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.12.0
> Reporter: Suhas Satish
> Assignee: Cheolsoo Park
> Priority: Critical
> Fix For: 0.13.0
>
> Attachments: PIG-3813-1.patch, PIG-3813-2.patch, test_data.txt
>
> When the following script is run, Pig goes into an infinite loop. This was reproduced on Pig trunk as of March 12, 2014 on Apache Hadoop 1.2. test_data.txt has been attached.
>
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---
> The res object of class Result gets its value from leaf.getNextTuple(), which returns an empty tuple () with STATUS_OK. So the while(true) loop never receives an End of Processing (EOP) status and never exits.
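The hang described above comes down to a loop whose only exit is an EOP status. Here is a minimal sketch with hypothetical `Status`, `Result` and `Leaf` types (not Pig's classes) showing that a leaf which keeps returning an empty tuple with OK never lets the loop finish. The `maxSteps` cap is added purely so the demo terminates; it is not a proposed fix, since the real fix in this thread addresses the rank column's uid.

```java
import java.util.Iterator;
import java.util.List;

public class PipelineLoopDemo {
    // Hypothetical status codes, loosely modeled on STATUS_OK / EOP from the report.
    enum Status { OK, EOP }

    // Hypothetical result holder: a status plus an optional tuple.
    static final class Result {
        final Status status;
        final List<Object> tuple;

        Result(Status status, List<Object> tuple) {
            this.status = status;
            this.tuple = tuple;
        }
    }

    // Hypothetical leaf operator that is pulled for results.
    interface Leaf {
        Result getNextTuple();
    }

    // Pulls results until EOP and returns how many OK results were seen.
    // A bare while(true) loop would spin forever on a leaf that never reports EOP.
    static int drain(Leaf leaf, int maxSteps) {
        int steps = 0;
        while (steps < maxSteps) {
            Result res = leaf.getNextTuple();
            if (res.status == Status.EOP) {
                return steps;   // normal termination
            }
            steps++;            // OK: the tuple would be processed here
        }
        return -1;              // no EOP within maxSteps
    }

    public static void main(String[] args) {
        Iterator<Object> input = List.<Object>of("a", "b").iterator();
        Leaf wellBehaved = () -> input.hasNext()
                ? new Result(Status.OK, List.of(input.next()))
                : new Result(Status.EOP, null);
        // Always returns an empty tuple with OK, as described in the report.
        Leaf stuck = () -> new Result(Status.OK, List.of());

        System.out.println(drain(wellBehaved, 1000)); // 2
        System.out.println(drain(stuck, 1000));       // -1
    }
}
```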
[jira] [Comment Edited] (PIG-2784) Framework for dynamic query optimization
[ https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939962#comment-13939962 ]

Aniket Mokashi edited comment on PIG-2784 at 3/18/14 11:34 PM:

[~rajitha], [~zhiweicai], thanks for your interest in this project. To proceed, you need to submit your proposal with details on the approach, plan, etc. If you would like to clarify something, please use this JIRA as the place for discussion.

bq. Do we know about the size of data to process before Pig compile the job?
Yes. This lets Pig do reducer estimation.

bq. What's the difference between implementing this framework inside JobControlCompiler and inside MRCompiler? Which one do you think is better?
MRCompiler compiles the physical plan into MapReduce operators, and JobControlCompiler takes these compiled jobs and submits them to run on Hadoop via Hadoop's JobControl API. It is also responsible for maintaining progress reports, stats, etc. As part of this JIRA, you need to find out how we can take any (or all) of these optimizations and find the best place to plug them in. I look forward to seeing your thoughts on how it should work.

bq. Do I need to consider more kind of optimization other than optimizations mentioned in the description? Is it possible that we categorize the optimizations into several types and make it easier to extend in the future?
It would be nice if we could allow new optimizations to be added in the future.

> Framework for dynamic query optimization
>
> Key: PIG-2784
> URL: https://issues.apache.org/jira/browse/PIG-2784
> Project: Pig
> Issue Type: New Feature
> Reporter: Jie Li
> Labels: GSOC2014
>
> We need a framework to implement dynamic query optimization, i.e. changing the query plan at runtime. Currently we support estimating the number of reducers dynamically, which works well as a first step but was not perfectly implemented. In the near future, we'll support more dynamic optimizations, like [removing sample job for order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting skew and using skew-join, etc.
> Currently, estimating the number of reducers is implemented in JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and generates the complete MRPlan. One place (discussed with Thejas) to implement the framework is in MRCompiler, where the MRPlan would be generated in batches and adjusted dynamically.
> Any comment?
> This is a candidate project for Google Summer of Code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014
[jira] [Commented] (PIG-2784) Framework for dynamic query optimization
[ https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939962#comment-13939962 ]

Aniket Mokashi commented on PIG-2784:

[~rajitha], [~zhiweicai], thanks for your interest in this project. To proceed, you need to submit your proposal with details on the approach, plan, etc. If you would like to clarify something, please use this JIRA as the place for discussion.

bq. Do we know about the size of data to process before Pig compile the job?
Yes. This lets Pig do reducer estimation.

bq. What's the difference between implementing this framework inside JobControlCompiler and inside MRCompiler? Which one do you think is better?
MRCompiler compiles the physical plan into MapReduce operators, and JobControlCompiler takes these compiled jobs and submits them to run on Hadoop via Hadoop's JobControl API. It is also responsible for maintaining progress reports, stats, etc. As part of this JIRA, you need to find out how we can take any (or all) of these optimizations and find the best place to plug them in. I look forward to seeing your thoughts on how it should work.

bq. Do I need to consider more kind of optimization other than optimizations mentioned in the description? Is it possible that we categorize the optimizations into several types and make it easier to extend in the future?
It would be nice if we could allow new optimizations to be added in the future.
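Because input sizes are known before the job is compiled, a reducer count can be derived from them. Here is a minimal sketch of such a size-based estimate, assuming a simple rule of ceil(total input bytes / bytes per reducer) clamped to [1, max]. The parameters are named after Pig's `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max` properties, but this is not Pig's implementation, and the values in `main` are assumed defaults.

```java
public class ReducerEstimateDemo {
    // Estimates reducers from total input size, clamped to [1, maxReducers].
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and limits positive");
        }
        // Ceiling division for non-negative operands, without floating point.
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        needed = Math.max(1L, needed);
        return (int) Math.min((long) maxReducers, needed);
    }

    public static void main(String[] args) {
        long perReducer = 1_000_000_000L; // assumed bytes-per-reducer setting
        int max = 999;                    // assumed reducer cap
        System.out.println(estimateReducers(2_500_000_000L, perReducer, max));     // 3
        System.out.println(estimateReducers(0L, perReducer, max));                 // 1
        System.out.println(estimateReducers(5_000_000_000_000L, perReducer, max)); // 999
    }
}
```

A dynamic-optimization framework would call a hook like this later in the pipeline, when the sizes of intermediate outputs are also known, instead of only once before submission.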
[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3813:
Resolution: Fixed
Fix Version/s: 0.13.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thank you Daniel for reviewing the patch! Thank you Suhas for reporting the issue!
[jira] [Commented] (PIG-3813) Rank column is assigned different uids everytime when schema is reset
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939928#comment-13939928 ]

Daniel Dai commented on PIG-3813:

+1
[jira] [Resolved] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi resolved PIG-3815.
Resolution: Fixed
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939825#comment-13939825 ]

Aniket Mokashi commented on PIG-3815:

I have committed PIG-3815-2.patch to trunk! Thanks everyone for your comments.
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939815#comment-13939815 ]

Rohini Palaniswamy commented on PIG-3815:

[~julienledem], it is qualified only for use in addCacheFile(), which sets mapred.cache.files; that part is required. conf.set("mapred.job.classpath.files") uses just the file path, with the scheme and port removed.
[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3813:
Status: Patch Available (was: Open)
[jira] [Updated] (PIG-3813) Rank column is assigned different uids everytime when schema is reset
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3813:
Summary: Rank column is assigned different uids everytime when schema is reset (was: PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions)

Updating the title to describe the root cause.
[jira] [Updated] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3813:
Attachment: PIG-3813-2.patch

Here is the 2nd patch that keeps track of the uid of the rank column.
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749 ] Julien Le Dem commented on PIG-3815: [~rohini], in the code you quoted, don't you think the following line puts the port back? {noformat} URI uri = fs.makeQualified(file).toUri(); {noformat} > Hadoop bug causes to pig to fail silently with jar cache > > > Key: PIG-3815 > URL: https://issues.apache.org/jira/browse/PIG-3815 > Project: Pig > Issue Type: Bug >Affects Versions: 0.13.0 >Reporter: Aniket Mokashi >Assignee: Aniket Mokashi > Fix For: 0.13.0 > > Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch > > > Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the > distributed cache configuration. This uses : to separate the list of files to be > put on the classpath via the distributed cache. If fs.default.name has a port number in > it, the tokenization logic in Hadoop that retrieves the list of > cache filenames in the backend fails.
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939750#comment-13939750 ] Rohini Palaniswamy commented on PIG-3815: - Yes. It was fixed 3 years ago. I am not sure which version of Hadoop you are using that still hits this issue. But since we still support 0.20 as well, there is no harm in doing .toUri().getPath() in Pig too. +1. Since the issue is not present in Hadoop 1.0, please update your comment from "// PIG-3815 In hadoop 1.0, addFileToClassPath uses : as separator" to say hadoop 0.20 when checking in this patch.
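The separator problem in this issue, and why stripping entries down to .toUri().getPath() avoids it, can be illustrated with plain java.net.URI. This is a sketch, not Hadoop or Pig code: it assumes a Unix JVM where the classpath list is joined with ':', and the class name, helper methods, host name, and port are all made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ClasspathSeparatorDemo {
    // Joins entries with ':' (assumed: path.separator on a Unix JVM).
    static String join(List<String> entries) {
        return String.join(":", entries);
    }

    // Splits a joined value back into entries on ':', as a backend reader would.
    static String[] split(String joined) {
        return joined.split(":");
    }

    // Keeps only the path component of each URI, dropping scheme, host and port.
    static List<String> pathsOnly(List<String> uris) {
        List<String> paths = new ArrayList<>();
        for (String uri : uris) {
            paths.add(URI.create(uri).getPath());
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical namenode host and port.
        List<String> jars = List.of(
                "hdfs://namenode.example.com:8020/tmp/pig/a.jar",
                "hdfs://namenode.example.com:8020/tmp/pig/b.jar");

        // Qualified URIs: the colons after the scheme and before the port
        // are also treated as separators, so 2 entries become 6 fragments.
        System.out.println(split(join(jars)).length); // 6

        // Path-only entries round-trip cleanly.
        System.out.println(split(join(pathsOnly(jars))).length); // 2
    }
}
```

The path-only form loses the scheme and authority, so it relies on the paths being resolved against the default file system in the backend.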
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939728#comment-13939728 ] Aniket Mokashi commented on PIG-3815: - Thanks for your comments, [~rohini]. I was not aware of the limitations of HDFS streams; I have attached a patch (PIG-3815-2.patch) to fix those problems. Hadoop Jira: https://issues.apache.org/jira/browse/MAPREDUCE-2361. It looks like this was fixed here: http://svn.apache.org/viewvc?view=revision&revision=1077790.
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: PIG-3815-2.patch
[jira] [Reopened] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy reopened PIG-3815: - Actually, I see some issues with this patch. Reopening the jira. 1) Changing os.close() to IOUtils.closeQuietly(os); is not good. You can close the input quietly, but not the output, especially an HDFS output stream. If os.close() fails, HDFS can leave behind empty files without data that are still accessible through the NN. We have been bitten by this many times. In internal projects, we delete the file and retry if os.close() fails. So please let the Pig script fail if os.close() fails rather than causing unexpected behavior. 2) addFileToClassPath already does file.toUri().getPath(). I don't see where the Hadoop bug is coming from. http://svn.apache.org/viewvc/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/filecache/DistributedCache.java?revision=1206848&view=markup
{code}
public static void addFileToClassPath
    (Path file, Configuration conf, FileSystem fs)
    throws IOException {
  String filepath = file.toUri().getPath();
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null
      ? filepath
      : classpath + System.getProperty("path.separator") + filepath);
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
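Rohini's first point, that an input stream may be closed quietly but a failed close() on the output must reach the caller, can be sketched with plain java.io streams. This is not Pig's JobControlCompiler code: copyAndClose, closeQuietly, and FailingCloseStream are hypothetical names, and FailingCloseStream only simulates an HDFS output stream whose close() fails. The delete-and-retry step she mentions is left to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyThenClose {
    // Swallows close() errors; acceptable for inputs, where no data can be lost.
    static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // intentionally ignored
        }
    }

    // Copies in to out. The input is always closed quietly; the output is
    // closed normally so that a failed close() propagates to the caller.
    static void copyAndClose(InputStream in, OutputStream out) throws IOException {
        boolean copied = false;
        try {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            copied = true;
        } finally {
            closeQuietly(in);
            if (!copied) {
                // The copy already failed; don't mask that exception with a close error.
                closeQuietly(out);
            }
        }
        out.close(); // a failure here means the written file may be incomplete
    }

    // Simulates an output stream whose close() fails.
    static class FailingCloseStream extends ByteArrayOutputStream {
        @Override
        public void close() throws IOException {
            throw new IOException("simulated close failure");
        }
    }

    public static void main(String[] args) {
        try {
            copyAndClose(new ByteArrayInputStream(new byte[] {1, 2, 3}), new FailingCloseStream());
            System.out.println("copy reported success");
        } catch (IOException e) {
            System.out.println("caller sees: " + e.getMessage());
        }
    }
}
```

Running main prints "caller sees: simulated close failure", so the caller can fail the script or delete and retry instead of silently keeping a possibly empty file.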
[jira] [Commented] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939661#comment-13939661 ] Cheolsoo Park commented on PIG-3813: [~daijy], thank you for the comment. Let me try your suggestion.
[jira] [Commented] (PIG-3813) PigGenericMapBase runPipeline() method returns empty tuples and goes into infinite loop under certain conditions
[ https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939622#comment-13939622 ] Daniel Dai commented on PIG-3813: - It seems LORank is generating the wrong uid. One thing missing in LORank is keeping the generated uid across sessions, so that a schema reset does not erase the old uid. We can refer to LOCogroup.groupKeyUidOnlySchema to fix LORank.
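Daniel's suggestion, allocating the rank column's uid once and keeping it when the schema is reset, can be sketched as follows. This does not reproduce LORank or LOCogroup.groupKeyUidOnlySchema; RankOp, getNextUid, and the uid-array "schema" are hypothetical simplifications that only show the caching pattern.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RankUidSketch {
    // Hypothetical stand-in for a global uid generator.
    private static final AtomicLong NEXT_UID = new AtomicLong(1);

    static long getNextUid() {
        return NEXT_UID.getAndIncrement();
    }

    // Hypothetical operator that prepends a rank column to its input schema.
    static class RankOp {
        private Long rankColumnUid; // allocated once; survives resetSchema()
        private long[] schemaUids;  // cached output schema, as uids only

        long[] getSchema(long[] inputUids) {
            if (schemaUids != null) {
                return schemaUids;
            }
            if (rankColumnUid == null) {
                rankColumnUid = getNextUid();
            }
            long[] out = new long[inputUids.length + 1];
            out[0] = rankColumnUid;
            System.arraycopy(inputUids, 0, out, 1, inputUids.length);
            schemaUids = out;
            return out;
        }

        // Drops the cached schema but deliberately keeps rankColumnUid, so the
        // rebuilt schema reuses the same uid for the rank column.
        void resetSchema() {
            schemaUids = null;
        }
    }

    public static void main(String[] args) {
        RankOp rank = new RankOp();
        long[] input = {100L, 101L};
        long before = rank.getSchema(input)[0];
        rank.resetSchema();
        long after = rank.getSchema(input)[0];
        System.out.println(before == after); // true
    }
}
```

If resetSchema() also cleared rankColumnUid, each rebuild would draw a fresh uid, which matches the symptom in the updated issue title.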
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939618#comment-13939618 ] Aniket Mokashi commented on PIG-3815: - Thanks for the review, [~julienledem] and [~cheolsoo]. I have attached a revised patch and committed it to trunk!
[jira] [Resolved] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi resolved PIG-3815. - Resolution: Fixed
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606 ] Julien Le Dem commented on PIG-3815: Same comment as point 1 from Cheolsoo; otherwise, this looks good to me.
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: PIG-3815-1.patch
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: (was: PIG-3815-1.patch)
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: (was: PIG-3815-1.patch)
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: PIG-3815-1.patch
[jira] [Updated] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3815: Attachment: PIG-3815-1.patch
[jira] [Updated] (PIG-3816) Incorrect Javadoc for launchPlan() method
[ https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3816: --- Fix Version/s: 0.13.0 > Incorrect Javadoc for launchPlan() method > - > > Key: PIG-3816 > URL: https://issues.apache.org/jira/browse/PIG-3816 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Kyungho Jeon >Assignee: Kyungho Jeon >Priority: Trivial > Fix For: 0.13.0 > > Attachments: PIG-3816.patch > > > Javadoc of {{protected PigStats launchPlan(LogicalPlan lp, String jobName)}} > incorrectly describes that the method takes a physical plan as an argument.
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939354#comment-13939354 ] Cheolsoo Park commented on PIG-3815:
# Can you delete this? It's unused.
{code}
+import org.codehaus.plexus.util.IOUtil;
{code}
# Do you mind fixing JobControlCompiler.java#L1700 too? Looks like we can use IOUtils.closeQuietly() here too.
{code}
OutputStream os = fs.create(dst);
try {
    IOUtils.copyBytes(url.openStream(), os, 4096, true);
} finally {
    // IOUtils can not close both the input and the output properly in a finally
    // as we can get an exception in between opening the stream and calling the method
    os.close();
}
{code}
[jira] [Resolved] (PIG-3816) Incorrect Javadoc for launchPlan() method
[ https://issues.apache.org/jira/browse/PIG-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved PIG-3816. Resolution: Fixed Assignee: Kyungho Jeon Added Kyungho to the contributors. I also added Prashant to the admins.