Re: Move pig project from svn to git repository
I'm a huge fan of git and use it exclusively for Pig, with the exception of committing patches. I haven't personally experienced any reliability issues with the git mirror. What are the reliability issues you've seen?

On Fri, Feb 1, 2013 at 6:46 PM, Jarek Jarcec Cecho wrote:

> Hi pig developers,
> I personally prefer git over svn, so I'm using the git mirrors that Apache
> provides. As those mirrors do not seem entirely reliable, I was wondering
> whether there are other Pig developers who, like me, prefer git over svn.
> The Apache Infrastructure Team supports projects that work primarily with
> git, so my question is: would the Pig developer community be interested in
> migrating the repository from svn to git?
>
> I've recently participated in three projects that made this change, namely
> Sqoop, Flume and MRUnit, and it's not a big deal. The process is rather
> simple; it just takes some time, as most of the work is done by the
> Infrastructure team. I would be more than happy to help or even drive the
> process if the community wants this change.
>
> Jarcec

--
*Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
Move pig project from svn to git repository
Hi pig developers,

I personally prefer git over svn, so I'm using the git mirrors that Apache provides. As those mirrors do not seem entirely reliable, I was wondering whether there are other Pig developers who, like me, prefer git over svn. The Apache Infrastructure Team supports projects that work primarily with git, so my question is: would the Pig developer community be interested in migrating the repository from svn to git?

I've recently participated in three projects that made this change, namely Sqoop, Flume and MRUnit, and it's not a big deal. The process is rather simple; it just takes some time, as most of the work is done by the Infrastructure team. I would be more than happy to help or even drive the process if the community wants this change.

Jarcec
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
    [ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569397#comment-13569397 ]

Cheolsoo Park commented on PIG-3015:
------------------------------------

[~jadler], if you could add documentation, that would be awesome!

> Rewrite of AvroStorage
> ----------------------
>
>                 Key: PIG-3015
>                 URL: https://issues.apache.org/jira/browse/PIG-3015
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Joseph Adler
>            Assignee: Joseph Adler
>         Attachments: bad.avro, good.avro, PIG-3015-2.patch, PIG-3015-3.patch,
>                      PIG-3015-4.patch, PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch,
>                      TestInput.java, Test.java
>
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni (as
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (26 issues)

Subscriber: pigdaily

Key         Summary
PIG-3142    Fixed-width load and store functions for the Piggybank
            https://issues.apache.org/jira/browse/PIG-3142
PIG-3137    fix Piggybank test to not using /tmp dir
            https://issues.apache.org/jira/browse/PIG-3137
PIG-3136    Introduce a syntax making declared aliases optional
            https://issues.apache.org/jira/browse/PIG-3136
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3122    Operators should not implicitly become reserved keywords
            https://issues.apache.org/jira/browse/PIG-3122
PIG-3114    Duplicated macro name error when using pigunit
            https://issues.apache.org/jira/browse/PIG-3114
PIG-3108    HBaseStorage returns empty maps when mixing wildcard- with other columns
            https://issues.apache.org/jira/browse/PIG-3108
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3098    Add another test for the self join case
            https://issues.apache.org/jira/browse/PIG-3098
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
    [ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2878:
----------------------------
       Resolution: Fixed
    Fix Version/s: 0.12
           Status: Resolved  (was: Patch Available)

Patch 1 checked into trunk. Thanks Shami for your work on this.

> Pig current releases lack a UDF equalIgnoreCase. This function returns a
> Boolean value indicating whether string left is equal to string right. This
> check is case insensitive.
> --------------------------
>
>                 Key: PIG-2878
>                 URL: https://issues.apache.org/jira/browse/PIG-2878
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.10.0
>            Reporter: Arjun K R
>            Assignee: Shami B
>              Labels: features
>             Fix For: 0.12
>
>         Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch
>
>
> Pig current releases lack a UDF equalIgnoreCase. This function returns a
> Boolean value indicating whether string left is equal to string right. This
> check is case insensitive.
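The comparison that PIG-2878 asks for can be sketched in plain Java. This is only an illustration of the described semantics, not the committed patch: the class name `EqualsIgnoreCaseSketch` is hypothetical, it does not extend Pig's UDF classes, and returning null when either input is null is an assumption rather than something the ticket states.

```java
/**
 * Hypothetical stand-in for the comparison described in PIG-2878.
 * It is NOT the committed patch and has no dependency on Pig.
 */
final class EqualsIgnoreCaseSketch {

    /**
     * Returns TRUE when left and right are equal ignoring case, FALSE otherwise.
     * Assumption: a null input yields a null result instead of an exception.
     */
    static Boolean equalsIgnoreCase(String left, String right) {
        if (left == null || right == null) {
            return null; // assumed null handling, not taken from the ticket
        }
        return left.equalsIgnoreCase(right);
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCase("Pig", "PIG"));  // positive case
        System.out.println(equalsIgnoreCase("Pig", "Hive")); // negative case
        System.out.println(equalsIgnoreCase(null, "PIG"));   // null input
    }
}
```

The positive and negative calls in `main` loosely mirror the kind of unit test coverage mentioned in the thread, where a negative case was added alongside the positive one.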
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
    [ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569190#comment-13569190 ]

Joseph Adler commented on PIG-3015:
-----------------------------------

Let me know what help you need. I can work on the documentation as well. Is early next week enough time?

(Also, check out Avro-1241. I couldn't get adequate performance without it.)
[jira] [Updated] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted
    [ https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3145:
-------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thank you Santhosh for the review. Committed to trunk.

> Parameters in core-site.xml and mapred-site.xml are not correctly substituted
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3145
>                 URL: https://issues.apache.org/jira/browse/PIG-3145
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>         Attachments: PIG-3145.patch
>
>
> To reproduce the issue, please do the following:
> # Parameterize the address of name node in core-site.xml.
> {code}
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://${foo}:8020</value>
> </property>
> {code}
> # Set the value of "foo" via -D option.
> {code}
> export PIG_OPTS="-Dfoo=mr1-0.cheolsoo.com"
> {code}
> # Pig fails with the following error.
> {code}
> 2013-01-28 18:54:02,786 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://${foo}:8020
> 2013-01-28 18:54:02,805 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. null
> Details at logfile: /home/cheolsoo/pig-cdh/pig_1359428042522.log
> {code}
> Note that the parameter $\{foo\} in core-site.xml is not expanded. This is
> because the addresses of name node and job tracker are read directly from
> core-site.xml instead of reading via Configuration.get().
> {code:title=HExecutionEngine.java}
> // properties is Java Properties
> cluster = properties.getProperty(JOB_TRACKER_LOCATION);
> nameNode = properties.getProperty(FILE_SYSTEM_LOCATION);
> {code}
> Replacing these lines with Configuration.get() fixes the issue.
> {code:title=HExecutionEngine.java}
> // jc is Hadoop Configuration
> cluster = jc.get(JOB_TRACKER_LOCATION);
> nameNode = jc.get(FILE_SYSTEM_LOCATION);
> {code}
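The behaviour reported in PIG-3145 can be reproduced without a cluster. The sketch below is a simplified illustration, not Pig or Hadoop code: `java.util.Properties.getProperty` returns the stored string verbatim, while the hypothetical `expand` helper stands in for the kind of `${name}` substitution the ticket relies on `Configuration.get()` to perform. The class name, the helper and the lookup map are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: raw Properties lookup versus a simplified ${name} expansion. */
final class SubstitutionSketch {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Hypothetical, simplified expansion. It is a stand-in for the substitution
     * that Hadoop's Configuration.get() performs, not a copy of that logic.
     * Variables missing from vars are left untouched.
     */
    static String expand(String raw, Map<String, String> vars) {
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("fs.default.name", "hdfs://${foo}:8020");

        // Raw lookup: the placeholder comes back unexpanded, as in the error log.
        String raw = properties.getProperty("fs.default.name");
        System.out.println(raw);

        // Expanded lookup, with "foo" supplied the way -Dfoo=... would supply it.
        Map<String, String> vars = Collections.singletonMap("foo", "mr1-0.cheolsoo.com");
        System.out.println(expand(raw, vars));
    }
}
```

The first line printed still contains `${foo}`, which matches the "Connecting to hadoop file system at: hdfs://${foo}:8020" message in the report; the second line contains the host name instead.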
[jira] [Updated] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted
    [ https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3145:
-------------------------------
    Fix Version/s: 0.12
[jira] [Updated] (PIG-3137) fix Piggybank test to not using /tmp dir
    [ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-3137:
------------------------------
    Status: Patch Available  (was: Open)

[~cheolsoo], thanks for your comments. I just posted a new patch that does the following:
1. Changes contrib/piggybank/java/build.xml to create the log in user.dir instead of /tmp.
2. Stops using FileLocalizer in TestDBStorage and stores hsqldb under contrib/piggybank/java/build/test/ instead.
3. Uses pig.temp.dir to set PigContext's temp dir to contrib/piggybank/java/build/test/tmp/ before using FileLocalizer in TestAvroStorage.

> fix Piggybank test to not using /tmp dir
> ----------------------------------------
>
>                 Key: PIG-3137
>                 URL: https://issues.apache.org/jira/browse/PIG-3137
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11
>            Reporter: Johnny Zhang
>            Assignee: Johnny Zhang
>             Fix For: 0.12
>
>         Attachments: PIG-3137.nows.patch.txt, PIG-3137.patch.txt,
>                      PIG-3137.patch.txt
>
>
> Right now several Piggybank tests create directories under /tmp to store test
> data. These tests can fail when the user doesn't have permission to create
> directories under /tmp. It is better to move the test data dir under the
> build dir to avoid this problem.
> I will submit a patch soon.
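The idea behind PIG-3137, keeping test data under the build tree instead of /tmp, can be sketched as follows. This is a minimal illustration and not the attached patch: the class `TestDirSketch`, the helper `createScratchDir` and the `build/test/tmp` layout relative to `user.dir` are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustration only: create per-test scratch directories under the build tree. */
final class TestDirSketch {

    /**
     * Creates (if needed) and returns <user.dir>/build/test/tmp/<testName>.
     * The directory layout is an assumption for this sketch, not the one
     * used by the PIG-3137 patch.
     */
    static Path createScratchDir(String testName) {
        Path base = Paths.get(System.getProperty("user.dir"), "build", "test", "tmp");
        try {
            // createDirectories also creates missing parents and does not fail
            // if the directory already exists.
            return Files.createDirectories(base.resolve(testName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = createScratchDir("TestDBStorage");
        System.out.println("Scratch directory: " + dir);
    }
}
```

Because the path is derived from the working directory of the build, the test only needs write access to the checkout and not to a shared system location.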
[jira] [Commented] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted
    [ https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569146#comment-13569146 ]

Santhosh Srinivasan commented on PIG-3145:
------------------------------------------

+1 - the changes look good.
[jira] [Updated] (PIG-3137) fix Piggybank test to not using /tmp dir
    [ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-3137:
------------------------------
    Attachment: PIG-3137.patch.txt
[jira] [Updated] (PIG-3137) fix Piggybank test to not using /tmp dir
    [ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-3137:
------------------------------
    Attachment: PIG-3137.nows.patch.txt
[jira] [Updated] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted
    [ https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3145:
-------------------------------
    Status: Patch Available  (was: Open)

I ran full unit test and e2e test and found no regression.

Note that the following test cases are failing in trunk:
* org.apache.pig.test.TestScriptUDF PIG-3153
* org.apache.pig.test.TestPackage PIG-3154
* org.apache.pig.test.TestTypeCheckingValidatorNewLP PIG-3155
* org.apache.pig.data.TestSchemaTuple PIG-3156

However, they are not relevant, and I filed jiras for them.
[jira] [Created] (PIG-3156) TestSchemaTuple fails in trunk
Cheolsoo Park created PIG-3156:
----------------------------------

             Summary: TestSchemaTuple fails in trunk
                 Key: PIG-3156
                 URL: https://issues.apache.org/jira/browse/PIG-3156
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.12
            Reporter: Cheolsoo Park
             Fix For: 0.12

To reproduce the issue, do:
{code}
ant clean test -Dtestcase=TestSchemaTuple
{code}
All 3 test cases fail with the following error:
{code}
Caused by: java.lang.RuntimeException: Unable to compile
    at org.apache.pig.impl.util.JavaCompilerHelper.compile(JavaCompilerHelper.java:83)
    at org.apache.pig.data.SchemaTupleClassGenerator.compileCodeString(SchemaTupleClassGenerator.java:233)
    at org.apache.pig.data.SchemaTupleClassGenerator.generateSchemaTuple(SchemaTupleClassGenerator.java:186)
    at org.apache.pig.data.SchemaTupleFrontend$SchemaTupleFrontendGenHelper.generateAll(SchemaTupleFrontend.java:203)
    at org.apache.pig.data.SchemaTupleFrontend$SchemaTupleFrontendGenHelper.access$100(SchemaTupleFrontend.java:91)
    at org.apache.pig.data.SchemaTupleFrontend.copyAllGeneratedToDistributedCache(SchemaTupleFrontend.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:656)
{code}
I found that this was introduced by PIG-2764.
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
    [ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569048#comment-13569048 ]

Russell Jurney commented on PIG-3015:
-------------------------------------

I'll start testing this again.
[jira] [Created] (PIG-3155) TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk
Cheolsoo Park created PIG-3155:
----------------------------------

             Summary: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk
                 Key: PIG-3155
                 URL: https://issues.apache.org/jira/browse/PIG-3155
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.12
            Reporter: Cheolsoo Park
             Fix For: 0.12

To reproduce the failure, do:
{code}
ant clean test -Dtestcase=TestTypeCheckingValidatorNewLP
{code}
The test fails with the following error:
{code}
Error expected
junit.framework.AssertionFailedError: Error expected
    at org.apache.pig.test.TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3(TestTypeCheckingValidatorNewLP.java:1570)
{code}
I found that this was introduced by PIG-2764.
[jira] [Assigned] (PIG-3153) TestScriptUDF.testJavascriptExampleScript fails in trunk
    [ https://issues.apache.org/jira/browse/PIG-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang reassigned PIG-3153:
---------------------------------
    Assignee: Johnny Zhang
[jira] [Created] (PIG-3154) TestPackage.testOperator fails in trunk
Cheolsoo Park created PIG-3154:
----------------------------------

             Summary: TestPackage.testOperator fails in trunk
                 Key: PIG-3154
                 URL: https://issues.apache.org/jira/browse/PIG-3154
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.12
            Reporter: Cheolsoo Park
             Fix For: 0.12

To reproduce the issue, do:
{code}
ant clean test -Dtestcase=TestPackage
{code}
The test fails with the following error:
{code}
No test case for type biginteger
junit.framework.AssertionFailedError: No test case for type biginteger
    at org.apache.pig.test.TestPackage.pickTest(TestPackage.java:153)
    at org.apache.pig.test.TestPackage.testOperator(TestPackage.java:171)
{code}
Apparently, this is broken by PIG-2764.
[jira] [Created] (PIG-3153) TestScriptUDF.testJavascriptExampleScript fails in trunk
Cheolsoo Park created PIG-3153:
----------------------------------

             Summary: TestScriptUDF.testJavascriptExampleScript fails in trunk
                 Key: PIG-3153
                 URL: https://issues.apache.org/jira/browse/PIG-3153
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.12
            Reporter: Cheolsoo Park
             Fix For: 0.12

To reproduce the failure, do:
{code}
ant clean test -Dtestcase=TestScriptUDF
{code}
The test fails with the following error:
{code}
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Given UDF returns an improper Schema. Schema should only contain one field of a Tuple, Bag, or a single type. Returns: {word: chararray,num: long}
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:206)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.parser.LogicalPlanBuilder.expandAndResetVisitor(LogicalPlanBuilder.java:402)
{code}
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2878:

Attachment: PIG-2878-1.patch

Attaching a single patch with the previous two combined. I also took the liberty of expanding the unit test to have a negative case. This patch represents what I will check in.

> Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
>
> Key: PIG-2878
> URL: https://issues.apache.org/jira/browse/PIG-2878
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.10.0
> Reporter: Arjun K R
> Assignee: Arjun K R
> Labels: features
> Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2878:

Assignee: Shami B (was: Arjun K R)

> Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
>
> Key: PIG-2878
> URL: https://issues.apache.org/jira/browse/PIG-2878
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.10.0
> Reporter: Arjun K R
> Assignee: Shami B
> Labels: features
> Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch
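For readers following PIG-2878, the comparison the issue describes can be sketched in plain Java. This is not the attached patch: the class name EqualIgnoreCaseSketch is hypothetical, the code does not extend any Pig UDF base class, and returning null when either input is null is an assumption rather than something the issue states.

```java
// Sketch only: plain Java, not the PIG-2878 patch. Names are hypothetical.
final class EqualIgnoreCaseSketch {

    // Returns null when either input is null (an assumption, not stated in
    // the issue); otherwise reports whether left equals right, ignoring case.
    static Boolean equalIgnoreCase(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equalsIgnoreCase(right);
    }

    public static void main(String[] args) {
        System.out.println(equalIgnoreCase("Pig", "pIG"));  // positive case
        System.out.println(equalIgnoreCase("Pig", "Hive")); // negative case
        System.out.println(equalIgnoreCase(null, "Pig"));   // null input
    }
}
```

The negative and null cases are included because the comment on the issue mentions expanding the unit test with a negative case; the real test in the patch may look different.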
[jira] [Created] (PIG-3152) HTable class in Pig 0.10.1
Ionut Ignatescu created PIG-3152:

Summary: HTable class in Pig 0.10.1
Key: PIG-3152
URL: https://issues.apache.org/jira/browse/PIG-3152
Project: Pig
Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Ionut Ignatescu
Priority: Blocker

In Pig 0.10.1, the HTable class is defined under the same package as it is in HBase. Moreover, the version of this class seems to be very old: several methods do not exist or have a different signature. Since HBase is a transitive dependency in my use case, I cannot remove it, and I need the latest version of it (the same one deployed on my cluster).
[jira] [Updated] (PIG-3137) fix Piggybank test to not using /tmp dir
[ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3137:

Status: Open (was: Patch Available)

Canceling patch until it gets updated.

> fix Piggybank test to not using /tmp dir
>
> Key: PIG-3137
> URL: https://issues.apache.org/jira/browse/PIG-3137
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11
> Reporter: Johnny Zhang
> Assignee: Johnny Zhang
> Fix For: 0.12
> Attachments: PIG-3137.patch.txt
>
> Right now, several Piggybank tests create directories under /tmp to store test data. A test can fail because the user doesn't have permission to create a directory under /tmp. It is better to move the test data dir under the build dir to avoid this problem.
> I will submit a patch soon.
[jira] [Updated] (PIG-3137) fix Piggybank test to not using /tmp dir
[ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3137:

Assignee: Johnny Zhang

> fix Piggybank test to not using /tmp dir
>
> Key: PIG-3137
> URL: https://issues.apache.org/jira/browse/PIG-3137
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11
> Reporter: Johnny Zhang
> Assignee: Johnny Zhang
> Fix For: 0.12
> Attachments: PIG-3137.patch.txt
[jira] [Commented] (PIG-3137) fix Piggybank test to not using /tmp dir
[ https://issues.apache.org/jira/browse/PIG-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568760#comment-13568760 ]

Cheolsoo Park commented on PIG-3137:

[~dreambird], thank you very much for the patch. I have two suggestions:
* FileLocalizer.getTemporaryPath() is for generating random paths in a Hadoop cluster (whether it's local, a mini cluster, or a real cluster). So it makes sense to use FileLocalizer in TestAvroStorage, where we need temp paths for test outputs. But in TestDBStorage, we need a temp dir for Hsqldb, so I don't think we want to use FileLocalizer there. Using a temporary path under build (e.g. contrib/piggybank/java/build/blah) would be better.
* You can control the root dir of FileLocalizer.getTemporaryPath() using the pig.temp.dir property. It would be nice if it were set to somewhere under the build directory, so temporary dirs can be deleted by ant clean.

Let me know what you think. Thanks!

> fix Piggybank test to not using /tmp dir
>
> Key: PIG-3137
> URL: https://issues.apache.org/jira/browse/PIG-3137
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11
> Reporter: Johnny Zhang
> Fix For: 0.12
> Attachments: PIG-3137.patch.txt
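The two suggestions in the comment above can be sketched with standard JDK calls only. This is a rough illustration, not code from PIG-3137.patch.txt: the class BuildTempDirSketch, its method names, and the "test-tmp" and "pig-tmp" directory names are all hypothetical. Only the pig.temp.dir property name comes from the comment, and how the resulting Properties object would be handed to Pig is deliberately left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Sketch only: hypothetical helper, not code from PIG-3137.patch.txt.
final class BuildTempDirSketch {

    // Creates a uniquely named directory under <buildDir>/test-tmp (for
    // example, for a test's Hsqldb files), so that deleting the build
    // directory also removes it. "test-tmp" is an assumed name.
    static Path createTestDir(Path buildDir, String prefix) {
        try {
            Path root = buildDir.resolve("test-tmp");
            Files.createDirectories(root);
            return Files.createTempDirectory(root, prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds properties with pig.temp.dir pointing at the given root.
    // Passing these properties to Pig is not shown here.
    static Properties pigProperties(Path tempRoot) {
        Properties props = new Properties();
        props.setProperty("pig.temp.dir", tempRoot.toString());
        return props;
    }

    public static void main(String[] args) {
        Path build = Paths.get("build"); // assumed build directory
        System.out.println(createTestDir(build, "hsqldb-"));
        System.out.println(pigProperties(build.resolve("pig-tmp")));
    }
}
```

Keeping both locations under the build directory means a clean target that deletes that directory would also remove the test leftovers, which is the motivation given in the comment.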
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568753#comment-13568753 ]

Cheolsoo Park commented on PIG-3015:

I think the patch is very close to being committed. The two main obstacles are:
# Tests do not pass with Hadoop-2.0.x (i.e. ant clean test -Dtestcase=TestAvroStorage -Dhadoopversion=23).
# Documentation is missing.

I will take another shot at debugging #1 when I get more time, but any help would be appreciated!

> Rewrite of AvroStorage
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, TestInput.java, Test.java
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.