[jira] [Created] (PIG-4623) Fixed the 'new line' character inside double-quote causing the csv parsing failure
Ken Wu created PIG-4623:
---
Summary: Fixed the 'new line' character inside double-quote causing the csv parsing failure
Key: PIG-4623
URL: https://issues.apache.org/jira/browse/PIG-4623
Project: Pig
Issue Type: Bug
Components: piggybank
Reporter: Ken Wu
Assignee: Ken Wu

A newline character should be allowed inside a double-quoted field of a valid CSV document. For example, the following CSV document should be treated as a SINGLE valid CSV record:

Iphone,{ ItemName : Cheez-It 21 Ounce},

However, the current implementation of getNext() in the org.apache.pig.piggybank.storage.CSVLoader class does not handle this case: it sees two lines of data where there should be a single record. This pull request fixes the issue. (Note: here is a link for validating a CSV document: http://csvlint.io/)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
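A minimal Java sketch of the idea behind this report, not the code from the pull request: physical lines are appended to the current record for as long as a double quote is still open. The names QuotedCsvRecords and splitRecords are hypothetical, and the sample input assumes that the example in the description originally had double quotes around its second field with a line break inside them.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvRecords {

    /**
     * Splits CSV text into logical records. A line break that occurs while a
     * double quote is open is kept as part of the current record.
     * An escaped quote ("") toggles the state twice, so it has no net effect.
     */
    public static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoteOpen = false;
        for (String line : text.split("\r?\n")) {
            if (quoteOpen) {
                current.append('\n'); // restore the line break inside the quoted field
            }
            current.append(line);
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    quoteOpen = !quoteOpen;
                }
            }
            if (!quoteOpen) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (quoteOpen) {
            records.add(current.toString()); // unterminated quote: emit what was read
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical input: the quoting is assumed, see the note above.
        String sample = "Iphone,\"{ ItemName : Cheez-It\n21 Ounce}\",\nIpad,plain\n";
        for (String record : splitRecords(sample)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

A loader that reads one physical line per getNext() call would need the same kind of open-quote state carried across reads; how the actual patch does this is not shown in this message.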
[jira] [Commented] (PIG-4623) Fixed the 'new line' character inside double-quote causing the csv parsing failure
[ https://issues.apache.org/jira/browse/PIG-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615348#comment-14615348 ] Ken Wu commented on PIG-4623: - This has been fixed and the pull request is available at: https://github.com/apache/pig/pull/20 Fixed the 'new line' character inside double-quote causing the csv parsing failure -- Key: PIG-4623 URL: https://issues.apache.org/jira/browse/PIG-4623 Project: Pig Issue Type: Bug Components: piggybank Reporter: Ken Wu Assignee: Ken Wu Original Estimate: 24h Remaining Estimate: 24h
Software Podcast Interview
I'm a host for Software Engineering Radio http://se-radio.net/. In a few weeks, I'll be launching Software Engineering Daily, my own podcast. The structure is daily episodes, and weeklong arcs of material. One of the first few themes will be Big Data Architectures. Would someone from Pig like to come on for a technical discussion? If not, any other recommendations for topics or guests would be appreciated. Thanks, Jeff Meyerson
pig and parquet-bundle*jar
Hi team!

I have found a strange issue using Pig with Parquet files. There is no parquet-bundle*jar in the pig/lib folder, so I have to add it manually to avoid this exception:

pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve parquet.pig.ParquetLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

I have investigated the build.xml files from pig-0.12 to pig-0.15 and found that parquet-bundle*jar is only a compile-time dependency; Ant does not copy parquet-bundle*jar to the lib folder. A similar issue is described at https://issues.apache.org/jira/browse/PIG-3445 (see the last comment in the thread).

So my question is: was the absence of the parquet-bundle*jar file intentional, or do we have a bug here?

Thanks.
Oleksiy Sayankin.
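The "using imports" list in the error reads like a lookup that tries each package prefix in turn. The following Java sketch illustrates that kind of lookup; it is not Pig's actual resolver, and the names ImportResolver and resolve are hypothetical. It shows why the lookup can only succeed when the jar containing the class is on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class ImportResolver {

    /**
     * Tries prefix + name for every prefix, in order, and returns the first
     * class that can be loaded. Returns null when no candidate is loadable.
     */
    public static Class<?> resolve(String name, List<String> prefixes) {
        for (String prefix : prefixes) {
            try {
                return Class.forName(prefix + name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable under this prefix; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The prefixes are copied from the error message above.
        List<String> imports = Arrays.asList(
                "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");
        System.out.println(resolve("String", imports));
        // Null unless a jar providing this class is on the classpath.
        System.out.println(resolve("parquet.pig.ParquetLoader", imports));
    }
}
```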
[jira] [Updated] (PIG-4620) Add namespace support to HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Chirita updated PIG-4620:
--
Labels: hbase patch (was: )

Add namespace support to HBaseStorage
-
Key: PIG-4620
URL: https://issues.apache.org/jira/browse/PIG-4620
Project: Pig
Issue Type: New Feature
Affects Versions: 0.15.0
Reporter: Andi Chirita
Assignee: Andi Chirita
Labels: hbase, patch
Attachments: HBaseStorageNamespace.patch

HBase introduced namespace support in version 0.96. Apache Pig has recently updated its HBase dependency to 0.98.12 (PIG-4544). Currently there is no way to specify the namespace for a table. I suggest implementing this with a '-namespace' option.

{code}
copy = STORE raw INTO 'hbase://SampleTableCopy' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'info:first_name info:last_name friends:* info:*', '-namespace SampleNamespace');
{code}

We can't put the namespace in the hbase path because it would break the URI validation: 'hbase://SampleNamespace:SampleTableCopy'

The patch is available. I will look into extending the unit test for the namespace option. Please review my changes and let me know if I can help with anything.

Kind regards,
Andi
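A small Java sketch of how a '-namespace' option could be combined with the table name from the location string. This is an illustration only, not the attached patch: the class and method names are hypothetical, and the "namespace:table" form is assumed to be what HBase expects for a namespaced table name.

```java
public class HBaseNamespaceOption {

    private static final String SCHEME = "hbase://";

    /** Returns the value following "-namespace" in the option string, or null if absent. */
    public static String namespaceFrom(String options) {
        String[] tokens = options.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals("-namespace")) {
                return tokens[i + 1];
            }
        }
        return null;
    }

    /** Strips the hbase:// prefix and prepends the namespace when one was given. */
    public static String qualifiedTable(String location, String options) {
        String table = location.startsWith(SCHEME)
                ? location.substring(SCHEME.length())
                : location;
        String namespace = namespaceFrom(options);
        return namespace == null ? table : namespace + ":" + table;
    }

    public static void main(String[] args) {
        System.out.println(
                qualifiedTable("hbase://SampleTableCopy", "-namespace SampleNamespace"));
    }
}
```

Keeping the namespace in the option string leaves the 'hbase://SampleTableCopy' location untouched, which is the point of the proposal above.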
[jira] [Updated] (PIG-4107) Calcite for query optimization
[ https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated PIG-4107: - Description: Calcite (formerly called Optiq) is a query planning engine which is currently in Apache incubator. We'd like to explore the possibility to use Calcite to do query optimization, material view generation for Pig. (was: Optiq is a query planning engine which is currently in Apache incubator. We'd like to explore the possibility to use Optiq to do query optimization, material view generation for Pig.) Calcite for query optimization -- Key: PIG-4107 URL: https://issues.apache.org/jira/browse/PIG-4107 Project: Pig Issue Type: New Feature Components: impl Reporter: Daniel Dai Calcite (formerly called Optiq) is a query planning engine which is currently in Apache incubator. We'd like to explore the possibility to use Calcite to do query optimization, material view generation for Pig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4107) Calcite for query optimization
[ https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated PIG-4107: - Summary: Calcite for query optimization (was: Optiq for query optimization) Calcite for query optimization -- Key: PIG-4107 URL: https://issues.apache.org/jira/browse/PIG-4107 Project: Pig Issue Type: New Feature Components: impl Reporter: Daniel Dai Optiq is a query planning engine which is currently in Apache incubator. We'd like to explore the possibility to use Optiq to do query optimization, material view generation for Pig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage
[ https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615915#comment-14615915 ]

Mohit Sabharwal commented on PIG-4611:
--
Thanks for the explanation and for addressing this issue, [~kellyzly]!!! Let me know if I understand this correctly:

1) The Spark Executor will serialize all objects referenced in supplied closures. Since UDFContext.getUDFContext() is not initialized (because Spark does not expose a setup() interface like MR), we always default defaultCaster to STRING_CASTER.
2) However, later on, in the *same* Executor thread, the record reader creation will correctly deserialize the UDFContext from JobConf (PigInputFormatSpark.createRecordReader -> PigInputFormat.createRecordReader -> MapRedUtil.setupUDFContext -> UDFContext.deserialize).
3) Next, in the same Executor thread, when HBaseStorage is initialized by the load function, it will find a correctly populated UDFContext.

This sounds reasonable to me. Since this is a core change, could you please add comments to HBaseStorage.java explaining why we are handling this as a special case for Spark?

I assume it is a typo, but the -Dexectype argument needs to be {{spark}}, not {{TestHBaseStorage}}, when running TestHBaseStorage:

{code}
ant test -Dhadoopversion=23 -Dtestcase=TestHBaseStorage -Dexectype=spark -DdebugPort=
{code}

Fix remaining unit test failures about TestHBaseStorage
-
Key: PIG-4611
URL: https://issues.apache.org/jira/browse/PIG-4611
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
Fix For: spark-branch
Attachments: PIG-4611.patch

https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/ shows the following unit test failures in TestHBaseStorage:
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete
org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_1
org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_2
org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection
org.apache.pig.test.TestHBaseStorage.testCollectedGroup
org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans
[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: PIG-4624.1.patch The ORC issue should be separately addressed in ORC/Hive, however, it would be good if pig can handle this case with already generated files. Attaching patch from [~daijy]. pig errors out on ORC empty file without schema --- Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.16.0, 0.15.1 Attachments: PIG-4624.1.patch If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PIG-4624) pig errors out on ORC empty file without schema
Thejas M Nair created PIG-4624: -- Summary: pig errors out on ORC empty file without schema Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4107) Calcite for query optimization
[ https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615859#comment-14615859 ] Julian Hyde commented on PIG-4107: -- I have started prototyping a Pig-like language called Piglet in CALCITE-785. Lessons learned there could be brought into Pig. Calcite for query optimization -- Key: PIG-4107 URL: https://issues.apache.org/jira/browse/PIG-4107 Project: Pig Issue Type: New Feature Components: impl Reporter: Daniel Dai Calcite (formerly called Optiq) is a query planning engine which is currently in Apache incubator. We'd like to explore the possibility to use Calcite to do query optimization, material view generation for Pig. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4622) Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate
[ https://issues.apache.org/jira/browse/PIG-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615874#comment-14615874 ]

Mohit Sabharwal commented on PIG-4622:
--
Thanks, [~kellyzly]. +1 (non-binding)

Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate
-
Key: PIG-4622
URL: https://issues.apache.org/jira/browse/PIG-4622
Project: Pig
Issue Type: Sub-task
Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
Fix For: spark-branch
Attachments: PIG-4622.patch

https://builds.apache.org/job/Pig-spark/236/#showFailuresLink shows that the following two unit tests fail: TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate. This is because we currently don't support illustrate in Spark mode (see PIG-4621).

Why do these two unit tests fail after PIG-4614_1.patch was merged to the branch? In PIG-4614_1.patch, we edited [SparkExecutionEngine#instantiateScriptState|https://github.com/apache/pig/blob/a0bea12c3d5600a4c3137a8d05c054d10430b1ce/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java#L37]. When running the following script with illustrate:

illustrate.pig
{code}
a = load 'test/org/apache/pig/test/data/passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int,gid:int);
b = filter a by uid > 5;
illustrate b;
store b into './testMultiQueryWithIllustrate.out';
{code}

the exception is thrown at [MRScriptState.get|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/mapreduce/MRScriptState.java#L67]: java.lang.ClassCastException: org.apache.pig.tools.pigstats.spark.SparkScriptState cannot be cast to org.apache.pig.tools.pigstats.mapreduce.MRScriptState.
stacktrace:
{code}
java.lang.ClassCastException: org.apache.pig.tools.pigstats.spark.SparkScriptState cannot be cast to org.apache.pig.tools.pigstats.mapreduce.MRScriptState
    at org.apache.pig.tools.pigstats.mapreduce.MRScriptState.get(MRScriptState.java:67)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:512)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:327)
    at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:110)
    at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:259)
    at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:223)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:155)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1305)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:812)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:818)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:385)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:624)
    at org.apache.pig.Main.main(Main.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
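The failure mode in this trace is an unchecked downcast from a shared base type to the MapReduce-specific subtype. The Java sketch below reproduces it with stand-in classes; the nested classes are hypothetical placeholders, not Pig's real ScriptState hierarchy, and the instanceof guard is only one possible way to avoid the exception.

```java
public class ScriptStateCastSketch {

    // Stand-ins for the real classes; only the inheritance shape matters here.
    public static class ScriptState {}
    public static class MRScriptState extends ScriptState {}
    public static class SparkScriptState extends ScriptState {}

    /** Returns true when a blind downcast to MRScriptState throws. */
    public static boolean castThrows(ScriptState current) {
        try {
            MRScriptState ignored = (MRScriptState) current;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Guarded variant: checks the runtime type before casting. */
    public static String describe(ScriptState current) {
        if (current instanceof MRScriptState) {
            return "mapreduce";
        }
        return "not mapreduce: " + current.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        ScriptState state = new SparkScriptState();
        System.out.println(castThrows(state));
        System.out.println(describe(state));
    }
}
```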
[jira] [Resolved] (PIG-4622) Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate
[ https://issues.apache.org/jira/browse/PIG-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved PIG-4622. -- Resolution: Fixed Committed to Spark branch. Thanks, Liyun. Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate - Key: PIG-4622 URL: https://issues.apache.org/jira/browse/PIG-4622 Project: Pig Issue Type: Sub-task Components: spark Reporter: liyunzhang_intel Assignee: liyunzhang_intel Fix For: spark-branch Attachments: PIG-4622.patch
[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Fix Version/s: 0.15.1 0.16.0 pig errors out on ORC empty file without schema --- Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.16.0, 0.15.1 If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: pig and parquet-bundle*jar
Hi Oleksiy,

Initially, the idea was not to add another dependency to the Pig fat jar and instead to let the user ship the necessary Parquet bundle. However, with PIG-3737 the dependent jars are now copied to the $PIG_HOME/lib directory. I suspect you are right: the patch in PIG-3737 needs to be extended so that parquet-pig-bundle-*.jar ends up in the /lib directory as well. On the other hand, it would also be great to bump the parquet-bundle version from 1.2.3 to 1.7.0.

@Daniel, what do you think?

Thanks,
Lorand

On 06/07/15 18:34, Олексій Саянкін wrote:
Hi team! I have found strange issue using pig and parquet files. There is no parquet-bundle*jar in pig/lib folder so I have to manually add it to avoid this exception: pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve parquet.pig.ParquetLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] I have investigated build.xml files from pig-0.12 to pig-0.15 and found that parquet-bundle*jar is only compile time dependency. ANT does not copy parquet-bundle*jar to lib folder. Similar issue you can see here https://issues.apache.org/jira/browse/PIG-3445 (see last comment in the thread). So my question is: Was absence of parquet-bundle*jar file done on purpose or we have a bug here? Thanks. Oleksiy Sayankin.
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: PIG-4624.1.patch Error on ORC empty file without schema -- Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.16.0, 0.15.1 Attachments: PIG-4624.1.patch If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Attachment: (was: PIG-4624.1.patch) Error on ORC empty file without schema -- Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.16.0, 0.15.1 Attachments: PIG-4624.1.patch If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4624) Error on ORC empty file without schema
[ https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-4624: --- Summary: Error on ORC empty file without schema (was: pig errors out on ORC empty file without schema) Error on ORC empty file without schema -- Key: PIG-4624 URL: https://issues.apache.org/jira/browse/PIG-4624 Project: Pig Issue Type: Bug Affects Versions: 0.15.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.16.0, 0.15.1 Attachments: PIG-4624.1.patch If ORC produces an empty file without schema (which ideally, it is not supposed to), then pig query reading the data gives the following error - org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage
[ https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616082#comment-14616082 ] liyunzhang_intel commented on PIG-4611: --- PIG-4611_2.patch is based on PIG-4622: Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate (Liyun via Xuefu) Fix remaining unit test failures about TestHBaseStorage - Key: PIG-4611 URL: https://issues.apache.org/jira/browse/PIG-4611 Project: Pig Issue Type: Sub-task Components: spark Reporter: liyunzhang_intel Assignee: liyunzhang_intel Fix For: spark-branch Attachments: PIG-4611.patch, PIG-4611_2.patch
[jira] [Updated] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage
[ https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated PIG-4611: -- Attachment: PIG-4611_2.patch [~mohitsabharwal]: Your understanding is right. Following your suggestion, I added comments in PIG-4611_2.patch explaining why the changes in HBaseStorage#init and SelfSpillBag#memLimit are needed. Fix remaining unit test failures about TestHBaseStorage - Key: PIG-4611 URL: https://issues.apache.org/jira/browse/PIG-4611 Project: Pig Issue Type: Sub-task Components: spark Reporter: liyunzhang_intel Assignee: liyunzhang_intel Fix For: spark-branch Attachments: PIG-4611.patch, PIG-4611_2.patch
Build failed in Jenkins: Pig-trunk-commit #2186
See https://builds.apache.org/job/Pig-trunk-commit/2186/ -- [...truncated 2986 lines...] [javadoc] Loading source files for package org.apache.pig.impl.streaming... [javadoc] Loading source files for package org.apache.pig.impl.util... [javadoc] Loading source files for package org.apache.pig.impl.util.avro... [javadoc] Loading source files for package org.apache.pig.impl.util.hive... [javadoc] Loading source files for package org.apache.pig.newplan... [javadoc] Loading source files for package org.apache.pig.newplan.logical... [javadoc] Loading source files for package org.apache.pig.newplan.logical.expression... [javadoc] Loading source files for package org.apache.pig.newplan.logical.optimizer... [javadoc] Loading source files for package org.apache.pig.newplan.logical.relational... [javadoc] Loading source files for package org.apache.pig.newplan.logical.rules... [javadoc] Loading source files for package org.apache.pig.newplan.logical.visitor... [javadoc] Loading source files for package org.apache.pig.newplan.optimizer... [javadoc] Loading source files for package org.apache.pig.parser... [javadoc] Loading source files for package org.apache.pig.pen... [javadoc] Loading source files for package org.apache.pig.pen.util... [javadoc] Loading source files for package org.apache.pig.scripting... [javadoc] Loading source files for package org.apache.pig.scripting.groovy... [javadoc] Loading source files for package org.apache.pig.scripting.jruby... [javadoc] Loading source files for package org.apache.pig.scripting.js... [javadoc] Loading source files for package org.apache.pig.scripting.jython... [javadoc] Loading source files for package org.apache.pig.scripting.streaming.python... [javadoc] Loading source files for package org.apache.pig.tools... [javadoc] Loading source files for package org.apache.pig.tools.cmdline... [javadoc] Loading source files for package org.apache.pig.tools.counters... 
[javadoc] Loading source files for package org.apache.pig.tools.grunt... [javadoc] Loading source files for package org.apache.pig.tools.parameters... [javadoc] Loading source files for package org.apache.pig.tools.pigstats... [javadoc] Loading source files for package org.apache.pig.tools.pigstats.mapreduce... [javadoc] Loading source files for package org.apache.pig.tools.pigstats.tez... [javadoc] Loading source files for package org.apache.pig.tools.streams... [javadoc] Loading source files for package org.apache.pig.tools.timer... [javadoc] Loading source files for package org.apache.pig.validator... [javadoc] Constructing Javadoc information... [javadoc] /home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.98.12-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'value()' in type 'SuppressWarnings': class file for edu.umd.cs.findbugs.annotations.SuppressWarnings not found [javadoc] /home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.98.12-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'justification()' in type 'SuppressWarnings' [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96: warning - Tag @see:illegal character: 123 in {@link EvalFunc#getSchemaType()} [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96: warning - Tag @see:illegal character: 64 in {@link EvalFunc#getSchemaType()} [javadoc] Standard Doclet version 1.7.0_65 [javadoc] Building tree for all the packages and classes... 
[javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:295: warning - Tag @link: reference not found: FuncUtils [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96: warning - Tag @see: reference not found: {@link EvalFunc#getSchemaType()} [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/ExecType.java:90: warning - @return tag has no arguments. [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/ExecTypeProvider.java:61: warning - @return tag has no arguments. [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/JVMReuseManager.java:63: warning - @StaticDataCleanup is an unknown tag. [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/LoadCaster.java:134: warning - @param argument fieldSchema is not a parameter name. [javadoc] https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/LoadCaster.java:143: warning - @param argument fieldSchema is not a parameter name. [javadoc]
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (25 issues)
Subscriber: pigdaily

Key         Summary
PIG-4618    When use tez as the engine , set pig.user.cache.enabled=true do not take effect
            https://issues.apache.org/jira/browse/PIG-4618
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4526    Make setting up the build environment easier
            https://issues.apache.org/jira/browse/PIG-4526
PIG-4468    Pig's jackson version conflicts with that of hadoop 2.6.0
            https://issues.apache.org/jira/browse/PIG-4468
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4417    Pig's register command should support automatic fetching of jars from repo.
            https://issues.apache.org/jira/browse/PIG-4417
PIG-4373    Implement Optimize the use of DistributedCache(PIG-2672) and PIG-3861 in Tez
            https://issues.apache.org/jira/browse/PIG-4373
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384
[jira] [Commented] (PIG-4515) org.apache.pig.builtin.Distinct throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615168#comment-14615168 ]

Rohini Palaniswamy commented on PIG-4515:
-----------------------------------------

bq. What is the status of this bug? It is still present in 0.15.0!

I missed reviewing it because it was not marked Patch Available and did not have a Fix Version. You need to click Submit Patch after posting a patch. Could you do that and also set the Fix Version to 0.16?

Also, as I mentioned earlier, this bug is not a blocker and is not required for 0.15, as there are 3 different ways of writing the script that people normally use to achieve the same result. builtin.Distinct was mainly used internally for combiner optimization, and I have not seen many people use it directly. They use the DISTINCT operator.

org.apache.pig.builtin.Distinct throws ClassCastException
---------------------------------------------------------

Key: PIG-4515
URL: https://issues.apache.org/jira/browse/PIG-4515
Project: Pig
Issue Type: Bug
Affects Versions: 0.14.0
Environment: 2015-04-23 08:37:49,117 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
Reporter: Mikko Kupsu
Attachments: fix_singletuplebag_classcast_exception.patch, fix_singletuplebag_classcast_exception_2.patch

Running the script below causes a *ClassCastException*.
{code}
A = LOAD 'A' AS (a:int, b:int);
B = GROUP A BY a;
C = FOREACH B GENERATE Distinct(A);
DUMP C;
{code}

Content of A:
{code}
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
9 2
{code}

{code}
Caused by: java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag cannot be cast to org.apache.pig.data.Tuple
    at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:86)
    at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:78)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:323)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:362)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
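
For reference, the DISTINCT-operator route mentioned in the comment on PIG-4515 could look roughly like the sketch below. It assumes the same relation A and schema as in the report and uses a nested DISTINCT inside FOREACH in place of the builtin Distinct UDF. The alias D is illustrative only, and the script has not been run against 0.14.0 or 0.15.0.

{code}
-- Sketch only: same input and schema as in the PIG-4515 report.
A = LOAD 'A' AS (a:int, b:int);
B = GROUP A BY a;
-- Use the nested DISTINCT operator on each group's bag
-- rather than calling org.apache.pig.builtin.Distinct directly.
C = FOREACH B {
    D = DISTINCT A;
    GENERATE group, D;
};
DUMP C;
{code}

This emits the group key alongside the de-duplicated bag. Drop "group" from the GENERATE clause if only the bag is wanted, as in the original script.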