[jira] Updated: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-794: --- Attachment: AvroStorage_4.patch Attached is a patch that follows Doug's suggestion and extends GenericDatumReader and GenericDatumWriter. However, it cannot handle InternalMap. Doug, could you help look at what the problem is? Use Avro serialization in Pig - Key: PIG-794 URL: https://issues.apache.org/jira/browse/PIG-794 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.2.0 Reporter: Rakesh Setty Assignee: Dmitriy V. Ryaboy Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage that performs significantly better than BinStorage on our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905587#action_12905587 ] Daniel Dai commented on PIG-1543: - test-patch result: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. All tests pass. IsEmpty returns the wrong value after using LIMIT - Key: PIG-1543 URL: https://issues.apache.org/jira/browse/PIG-1543 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Justin Hu Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1543-1.patch
1. Two input files:
1a: limit_empty.input_a
1
1
1
1b: limit_empty.input_b
2
2
2. The pig script: limit_empty.pig
-- A contains only 1's; B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_b' as (b1:int);
C = COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
store D into 'limit_empty.output/d';
-- After the script is done, we see the correct results:
-- {(1),(1),(1)} {} 1 0 3 0
-- {} {(2),(2)} 0 1 0 2
C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim, Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script is done, we see unexpected results:
-- {(1)} {} 1 1 1 0
-- {} {(2)} 1 1 0 1
dump D;
dump D1;
3. Run the script and redirect stdout (the two dumps) to a file.
There are two issues. The major one: IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while it returns the correct value in limit_empty.output/d/*. The only difference is that LIMIT has been applied before IsEmpty(). The minor one: the redirected output contains only the first dump:
({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)
We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)
In addition, there is an error that says: [main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
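The behaviour the reporter expects can be modelled in a few lines of Java. LIMIT should only truncate a bag, so an empty bag must stay empty after `limit B 1`, and IsEmpty must still report it as empty. This is a minimal sketch that uses `java.util.List` as a stand-in for a Pig bag; it is not Pig's DataBag API, and the class name `LimitEmptyCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one COGROUP output row; List stands in for a Pig bag.
final class LimitEmptyCheck {
    // LIMIT n over a bag: keeps at most n tuples, so an empty bag stays empty.
    static <T> List<T> limit(List<T> bag, int n) {
        return new ArrayList<>(bag.subList(0, Math.min(n, bag.size())));
    }

    // Mirrors the script's generate clause:
    // (IsEmpty(A) ? 0 : 1), (IsEmpty(B) ? 0 : 1), COUNT(A), COUNT(B)
    static long[] row(List<Integer> a, List<Integer> b) {
        return new long[] {
            a.isEmpty() ? 0 : 1,
            b.isEmpty() ? 0 : 1,
            a.size(),
            b.size()
        };
    }
}
```

Under this model, the key-1 group (A = {1,1,1}, B = {}) produces `1 0 1 0` after LIMIT, not the `1 1 1 0` seen in limit_empty.output/d1. The key-2 group produces `0 1 0 1`.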
[jira] Updated: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1550: --- Status: Patch Available (was: Open) better error handling in casting relations to scalars - Key: PIG-1550 URL: https://issues.apache.org/jira/browse/PIG-1550 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1550.1.patch
I ran the following script.
Input data:
joe 100
sam 20
bob 134
Script:
A = load 'user_clicks' as (user: chararray, clicks: int);
B = group A by user;
C = foreach B generate group, SUM(A.clicks);
D = foreach A generate clicks/(double)C.$1;
dump C;
Since C contains more than one tuple, I expected to get an error, and I did. However, the error was not very clear. When the job failed, I did see a valid error (although it lacked an error code):
210630 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: Scalar has more than one row in the output
However, at the end of processing, I saw a misleading error:
210709 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2088: Unable to get results for: hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage
10/08/19 17:16:22 ERROR grunt.Grunt: ERROR 2088: Unable to get results for: hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/tmp/temp818551960/tmp1063730945:org.apache.pig.impl.io.InterStorage
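The check the reporter is asking for is simple to state: a relation cast to a scalar must contain at most one row, and a violation should fail immediately with a clear message and an error code, not surface later as ERROR 2088. The Java sketch below shows that idea. All names are hypothetical. The error code is a placeholder, not Pig's actual code. Returning null for an empty relation is an assumption made only to keep the sketch total, not a statement of Pig's behaviour.

```java
import java.util.List;

// Hypothetical exception carrying an error code alongside the message.
final class ScalarCastException extends RuntimeException {
    final int errorCode;

    ScalarCastException(int errorCode, String message) {
        super("ERROR " + errorCode + ": " + message);
        this.errorCode = errorCode;
    }
}

final class ScalarReader {
    // Placeholder value; not an error code defined by Pig.
    static final int HYPOTHETICAL_ERROR_CODE = 9999;

    // A relation used as a scalar must not have more than one row.
    static Object[] readScalar(List<Object[]> rows, String alias) {
        if (rows.size() > 1) {
            throw new ScalarCastException(HYPOTHETICAL_ERROR_CODE,
                "Scalar " + alias + " has " + rows.size()
                + " rows in the output; expected exactly one");
        }
        // Assumption for this sketch only: an empty relation yields null.
        return rows.isEmpty() ? null : rows.get(0);
    }
}
```

With the sample input, C holds one row per user (three rows), so the check fails at the point where C.$1 is read and names the offending alias.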
[jira] Updated: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1550: --- Attachment: PIG-1550.1.patch PIG-1550.1.patch: test-patch has succeeded; unit tests are still running. [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[jira] Assigned: (PIG-1548) Optimize scalar to consolidate the part file
[ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reassigned PIG-1548: -- Assignee: Richard Ding (was: Thejas M Nair) Optimize scalar to consolidate the part file Key: PIG-1548 URL: https://issues.apache.org/jira/browse/PIG-1548 Project: Pig Issue Type: Improvement Components: impl Reporter: Daniel Dai Assignee: Richard Ding Fix For: 0.8.0 The current scalar implementation writes a scalar file onto DFS. When Pig needs the scalar, it opens the DFS file directly. Each scalar file contains more than one part file, even though it holds only one record. This puts a huge load on the namenode. We should consolidate the part files before opening the scalar file. An optional further step is to put the consolidated file into the distributed cache, which would reduce the load on the namenode even more.
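The consolidation step can be illustrated as concatenating every part file of the scalar output into one file, so later readers open a single file instead of one file per part. The Java sketch below uses the local filesystem (`java.nio.file`) as a stand-in for DFS, which is an assumption. It does not show the optional distributed-cache step. The class name is hypothetical, and the `part-*` pattern follows the usual MapReduce output naming.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local-filesystem stand-in for consolidating a scalar's part files on DFS.
final class PartFileConsolidator {
    // Concatenates every part-* file under dir, in name order, into target.
    // target should not itself match part-* in the same directory.
    static Path consolidate(Path dir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : ds) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // deterministic order: part-00000, part-00001, ...
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // empty parts contribute nothing
            }
        }
        return target;
    }
}
```

Because a scalar file holds only one record, most part files are empty, and the consolidated file contains just that record.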
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905612#action_12905612 ] Dmitriy V. Ryaboy commented on PIG-794: --- Doug and Scott will know better, of course, but afaik Avro doesn't support Object keys. You can cheat and turn Object keys into strings by Base64-encoding their serialized representations; you'd have to know to reverse the process when deserializing, though. Or we can try to get rid of InternalMap.
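Avro map keys are strings, so the suggested workaround is to serialize each key and then Base64-encode the bytes. On read, the same steps run in reverse. The Java sketch below shows the round trip. To compile without Hadoop, it defines a hypothetical `WritableLike` interface with the same shape as Hadoop's `Writable`. `KeyCodec` and the example key type `LongKey` are also hypothetical; only `java.util.Base64` and the `java.io` data streams are real APIs here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.function.Supplier;

// Hypothetical stand-in with the same shape as Hadoop's Writable.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

final class KeyCodec {
    // Serialize the key, then Base64-encode the bytes into a string map key.
    static String encode(WritableLike key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            key.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the process on read: decode, then deserialize into a fresh instance.
    static <T extends WritableLike> T decode(String encoded, Supplier<T> factory) {
        T key = factory.get();
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            key.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }
}

// Hypothetical example key type.
final class LongKey implements WritableLike {
    long value;

    LongKey() {}
    LongKey(long value) { this.value = value; }

    public void write(DataOutput out) throws IOException { out.writeLong(value); }
    public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}
```

The reader must know which key type to instantiate (the `factory` argument). This is the "you'd have to know to reverse the process" caveat from the comment.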
[jira] Commented: (PIG-1544) proactive-spill bags should share the memory alloted for it
[ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905628#action_12905628 ] Olga Natkovich commented on PIG-1544: - I am going to take my previous comment back and say that we should make this work for UDFs as well. The main reason is that we have no other way to make sure that UDFs do not run out of memory. One approach that Alan proposed was to have bags request memory when they are created, with a central broker in charge of the memory pool. The details of this, or whether there is a better approach, still need to be thought through. proactive-spill bags should share the memory alloted for it --- Key: PIG-1544 URL: https://issues.apache.org/jira/browse/PIG-1544 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Initially, proactive-spill bags were designed for use in (co)group (InternalCacheBag). They knew the total number of proactive bags present and shared the memory limit specified by the property pig.cachedbag.memusage. However, the two proactive bag implementations added later, InternalDistinctBag and InternalSortedBag, are not aware of the actual number of bags in use; their users always assume total-numbags = 3. This needs to be fixed, and all proactive-spill bags should share the memory limit.
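The central-broker idea can be sketched as a single budget divided among however many bags are currently registered, instead of a hard-coded assumption of three. The Java sketch below is hypothetical throughout: `SpillMemoryBroker` is not a Pig class, and even splitting is only one possible policy. It also assumes that pig.cachedbag.memusage is a fraction of the maximum heap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical central broker: every proactive-spill bag registers here and
// receives an equal share of one memory budget.
final class SpillMemoryBroker {
    private final long totalBudgetBytes;
    private final List<Object> bags = new ArrayList<>();

    SpillMemoryBroker(long totalBudgetBytes) {
        this.totalBudgetBytes = totalBudgetBytes;
    }

    // Assumption: fraction is the pig.cachedbag.memusage value, applied to
    // the maximum heap (e.g. Runtime.getRuntime().maxMemory()).
    static SpillMemoryBroker fromFraction(double fraction, long maxHeapBytes) {
        return new SpillMemoryBroker((long) (fraction * maxHeapBytes));
    }

    synchronized void register(Object bag) { bags.add(bag); }

    synchronized void unregister(Object bag) { bags.remove(bag); }

    // Current per-bag limit; it shrinks as more bags register.
    synchronized long limitPerBag() {
        return bags.isEmpty() ? totalBudgetBytes : totalBudgetBytes / bags.size();
    }
}
```

Bags would re-read `limitPerBag()` before deciding whether to spill, so a bag created later (for example, by a UDF) automatically tightens everyone's share.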
[jira] Updated: (PIG-1544) proactive-spill bags should share the memory alloted for it
[ https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1544: Assignee: Thejas M Nair Fix Version/s: 0.9.0
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905663#action_12905663 ] Doug Cutting commented on PIG-794: -- Some quick comments on the new patch: - you might define a java enum type for the union elements, using Enum#ordinal() for the union indexes - instead of name.equals(union), s.getType()==Type.UNION would be faster, but better yet would be to simply call read() recursively, since it already handles unions, no? - peekArray() can simply return null, and that might be faster
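Doug's first suggestion can be illustrated with a Java enum whose declaration order mirrors the order of branches in the Avro union schema. Each branch's `ordinal()` is then the union index to write, and `values()[index]` maps an index read back from the stream to a branch. The enum name, its branch list, and its ordering are all hypothetical; detecting Pig tuples and bags would need Pig's own classes and is left out.

```java
// Hypothetical branches of a Pig-value union; the declaration order must
// match the order of the branches in the Avro union schema.
enum PigUnionBranch {
    NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING, BYTES, MAP, TUPLE, BAG;

    // Picks the branch for a Java value; ordinal() is then the union index.
    // TUPLE and BAG detection is omitted (it needs Pig's classes).
    static PigUnionBranch of(Object value) {
        if (value == null) return NULL;
        if (value instanceof Boolean) return BOOLEAN;
        if (value instanceof Integer) return INT;
        if (value instanceof Long) return LONG;
        if (value instanceof Float) return FLOAT;
        if (value instanceof Double) return DOUBLE;
        if (value instanceof String) return STRING;
        if (value instanceof byte[]) return BYTES;
        if (value instanceof java.util.Map) return MAP;
        throw new IllegalArgumentException("unhandled type: " + value.getClass());
    }

    // Decoding side: map a union index read from the stream back to a branch.
    static PigUnionBranch fromIndex(int index) {
        return values()[index];
    }
}
```

`values()` returns a fresh array on each call, so a hot decoding path might cache it in a static field.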
[jira] Updated: (PIG-1309) Sort Merge Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1309: Summary: Sort Merge Cogroup (was: Map-side Cogroup) Sort Merge Cogroup -- Key: PIG-1309 URL: https://issues.apache.org/jira/browse/PIG-1309 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0, 0.8.0 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, PIG_1309_7.patch In our never-ending quest to make Pig go faster, we want to parallelize as many relational operations as possible. It's already possible to do Group-by (PIG-984) and Joins (PIG-845, PIG-554) purely on the map side in Pig. This jira is to add a map-side implementation of Cogroup in Pig. Details to follow.
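The core of a sort-merge cogroup is a two-pointer merge. Because both inputs are already sorted on the key, the next group is always the smaller head of the two streams, and each group can be emitted without a shuffle. The single-process Java sketch below shows only that merge step. It is not Pig's map-side implementation. Records are reduced to bare integer keys (as in the PIG-1543 example inputs), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SortMergeCogroup {
    // Precondition: both inputs are sorted ascending by key.
    // Output: key -> [bag from left, bag from right]; either bag may be empty.
    static TreeMap<Integer, List<List<Integer>>> cogroup(List<Integer> left, List<Integer> right) {
        TreeMap<Integer, List<List<Integer>>> out = new TreeMap<>();
        int i = 0, j = 0;
        while (i < left.size() || j < right.size()) {
            // The next key is the smaller head of the two sorted streams.
            int key;
            if (j >= right.size() || (i < left.size() && left.get(i) <= right.get(j))) {
                key = left.get(i);
            } else {
                key = right.get(j);
            }
            // Drain every record with this key from each side.
            List<Integer> bagL = new ArrayList<>();
            List<Integer> bagR = new ArrayList<>();
            while (i < left.size() && left.get(i) == key) bagL.add(left.get(i++));
            while (j < right.size() && right.get(j) == key) bagR.add(right.get(j++));
            out.put(key, List.of(bagL, bagR));
        }
        return out;
    }
}
```

Each input is read exactly once, which is what lets the operation run in the map phase when the inputs are already sorted.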
[jira] Commented: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905715#action_12905715 ] Thejas M Nair commented on PIG-1550: Unit tests have succeeded. Patch is ready for review.
[jira] Commented: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905717#action_12905717 ] Olga Natkovich commented on PIG-1550: - I will review the patch.
[jira] Commented: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905731#action_12905731 ] Olga Natkovich commented on PIG-1550: - +1, looks good.
[jira] Updated: (PIG-1550) better error handling in casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1550: --- Status: Resolved (was: Patch Available) Resolution: Fixed Patch committed to trunk and 0.8 branch.
[jira] Commented: (PIG-1334) Make pig artifacts available through maven
[ https://issues.apache.org/jira/browse/PIG-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905736#action_12905736 ] Scott Carey commented on PIG-1334: -- This ticket is incomplete:
* It did not properly package javadoc.
* JUnit is marked as a runtime dependency rather than a test-time dependency.
* I suspect HBase is not a runtime dependency, but an 'optional' (non-transitive) or 'provided' dependency.
Should this be re-opened, or should I make a new ticket? There is a -sources.jar that contains the Java source and some other documentation, but no javadoc that I can find; if the javadoc is in there, it doesn't have the right folder structure. A properly packaged Maven javadoc jar has a file structure like this: https://repository.apache.org/content/repositories/public/org/apache/avro/avro/1.4.0-SNAPSHOT/avro-1.4.0-20100825.231911-4-javadoc.jar When packaged properly, third-party tools (IDEs like Eclipse) will automatically import the javadoc and Java sources for the dependency, making them available in the IDE when coding or debugging. Make pig artifacts available through maven -- Key: PIG-1334 URL: https://issues.apache.org/jira/browse/PIG-1334 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: niraj rai Fix For: 0.8.0 Attachments: mvn-pig.patch, mvn_pig_2.patch, mvn_pig_3.patch, mvn_pig_4.patch, mvn_pig_5.patch, mvn_pig_6.patch
[jira] Commented: (PIG-1334) Make pig artifacts available through maven
[ https://issues.apache.org/jira/browse/PIG-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905744#action_12905744 ] Richard Ding commented on PIG-1334: --- Scott, please create a new Jira for this. Another follow-up jira (PIG-1562) has already been opened. -Richard
[jira] Updated: (PIG-1548) Optimize scalar to consolidate the part file
[ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1548: -- Attachment: PIG-1458.patch Results of test-patch: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code}
[jira] Updated: (PIG-1575) Complete the migration of optimization rule PushUpFilter including missing test cases
[ https://issues.apache.org/jira/browse/PIG-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1575: - Status: Patch Available (was: Open) Complete the migration of optimization rule PushUpFilter including missing test cases - Key: PIG-1575 URL: https://issues.apache.org/jira/browse/PIG-1575 Project: Pig Issue Type: Bug Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.8.0 Attachments: jira-1575-1.patch The optimization rule PushUpFilter under the new logical plan covers only a subset of the scenarios handled by the same rule under the old logical plan. For instance, it only considers a filter after a join, whereas the old optimization also considers other operators such as CoGroup, Union, and Cross. The migration of the rule should be completed. Also, the test cases written for the old PushUpFilter were not migrated to the new logical plan code base; they should be migrated as well. (A few have been migrated in JIRA-1574.)
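One of the missing cases, pushing a filter above a Union, is valid because filtering the union of two inputs gives the same records as unioning each filtered input. The Java sketch below checks that equivalence on in-memory lists. It is only a model of the rewrite's semantics, not Pig's optimizer rule, and the class and method names are hypothetical. For operators such as Cross or CoGroup, a rewrite of this kind is generally only safe when the predicate references fields from a single input.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PushUpFilterCheck {
    // Plan before the rewrite: FILTER applied to the output of UNION.
    static <T> List<T> filterAfterUnion(List<T> a, List<T> b, Predicate<T> p) {
        return Stream.concat(a.stream(), b.stream())
                .filter(p)
                .collect(Collectors.toList());
    }

    // Plan after the rewrite: FILTER pushed above UNION, into each input.
    static <T> List<T> filterBeforeUnion(List<T> a, List<T> b, Predicate<T> p) {
        return Stream.concat(a.stream().filter(p), b.stream().filter(p))
                .collect(Collectors.toList());
    }
}
```

The pushed-up form discards rows before they reach the union, which is the point of the optimization.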
[jira] Created: (PIG-1598) Pig gobbles up error messages - Part 2
Pig gobbles up error messages - Part 2 -- Key: PIG-1598 URL: https://issues.apache.org/jira/browse/PIG-1598 Project: Pig Issue Type: Improvement Reporter: Ashutosh Chauhan Another case of PIG-1531.
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905775#action_12905775 ] Dmitriy V. Ryaboy commented on PIG-794: --- Jeff, that's what I am saying: since they are Writables, we can turn them into strings and not need InternalMap at all.