[jira] [Created] (PIG-4623) Fixed the 'new line' character inside double-quote causing the csv parsing failure

2015-07-06 Thread Ken Wu (JIRA)
Ken Wu created PIG-4623:
---

 Summary: Fixed the 'new line' character inside double-quote 
causing the csv parsing failure
 Key: PIG-4623
 URL: https://issues.apache.org/jira/browse/PIG-4623
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Ken Wu
Assignee: Ken Wu




A newline character inside a double-quoted field is valid in a CSV document. 
For example, the following CSV document should be treated as a SINGLE valid 
record:

Iphone,{ ItemName : Cheez-It
21 Ounce},

However, the current implementation of getNext() in the 
org.apache.pig.piggybank.storage.CSVLoader class does not handle this case: 
it sees two lines of data when the input should in fact be treated as a 
single record.

This pull request fixes the above issue.

(Note: here is a link for validating a CSV document: http://csvlint.io/)
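
The behaviour described above can be illustrated with a small sketch. This is not the CSVLoader patch: it uses Python's standard csv module on a made-up record to contrast naive line splitting with quote-aware parsing. The double quotes in SAMPLE are an assumption, since the quoting in the example above did not survive in this message.

```python
import csv
import io

# Hypothetical input: one CSV record whose quoted second field spans two lines.
SAMPLE = 'Iphone,"{ ItemName : Cheez-It\n21 Ounce}",\n'


def split_naively(text):
    """Treat every physical line as a record (the failure mode described above)."""
    return text.splitlines()


def parse_quote_aware(text):
    """Let the csv module keep a newline that falls inside double quotes."""
    # newline="" stops StringIO from translating line endings before csv sees them.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(len(split_naively(SAMPLE)))   # number of physical lines
    print(parse_quote_aware(SAMPLE))    # logical records
```

A loader that reads one physical line per record behaves like split_naively; a quote-aware reader has to keep consuming lines until the closing quote is seen.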




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4623) Fixed the 'new line' character inside double-quote causing the csv parsing failure

2015-07-06 Thread Ken Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615348#comment-14615348
 ] 

Ken Wu commented on PIG-4623:
-

This has been fixed and the pull request is available at:

https://github.com/apache/pig/pull/20

 Fixed the 'new line' character inside double-quote causing the csv parsing 
 failure
 --

 Key: PIG-4623
 URL: https://issues.apache.org/jira/browse/PIG-4623
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Ken Wu
Assignee: Ken Wu
   Original Estimate: 24h
  Remaining Estimate: 24h



Software Podcast Interview

2015-07-06 Thread Jeff Meyerson
I'm a host for Software Engineering Radio http://se-radio.net/. In a few
weeks, I'll be launching Software Engineering Daily, my own podcast. The
structure is daily episodes, and weeklong arcs of material. One of the
first few themes will be Big Data Architectures.

Would someone from Pig like to come on for a technical discussion?

If not, any other recommendations for topics or guests would be appreciated.

Thanks,
Jeff Meyerson


pig and parquet-bundle*jar

2015-07-06 Thread Олексій Саянкін
Hi team!

I have found a strange issue when using Pig with Parquet files. There is no
parquet-bundle*jar in the pig/lib folder, so I have to add it manually to avoid
this exception:

pig script failed to validate:
org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not
resolve parquet.pig.ParquetLoader using imports: [, java.lang.,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
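
For readers unfamiliar with ERROR 1070, the sketch below models how a loader name is looked up against the import prefixes listed in the message. It is a simplified Python model, assuming resolution is a plain prefix-plus-name lookup; the function name and the sets standing in for the classpath are hypothetical and are not Pig APIs.

```python
# Import prefixes copied from the error message above.
IMPORT_PREFIXES = ["", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin."]


def resolve_class_name(name, classpath):
    """Return the first prefix + name combination that exists on the (toy) classpath."""
    for prefix in IMPORT_PREFIXES:
        candidate = prefix + name
        if candidate in classpath:
            return candidate
    raise LookupError(f"Could not resolve {name} using imports: {IMPORT_PREFIXES}")


if __name__ == "__main__":
    # Toy classpath without the Parquet bundle: built-in loaders resolve, ParquetLoader does not.
    without_parquet = {"org.apache.pig.builtin.PigStorage"}
    print(resolve_class_name("PigStorage", without_parquet))
    try:
        resolve_class_name("parquet.pig.ParquetLoader", without_parquet)
    except LookupError as err:
        print(err)
```

In this model, adding the jar corresponds to adding "parquet.pig.ParquetLoader" to the set, after which the empty prefix matches.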

I have examined the build.xml files from pig-0.12 to pig-0.15 and found
that parquet-bundle*jar is only a compile-time dependency. Ant does not
copy parquet-bundle*jar to the lib folder. A similar issue is described in
https://issues.apache.org/jira/browse/PIG-3445 (see the last comment in the
thread).

So my question is: was the absence of the parquet-bundle*jar file intentional,
or do we have a bug here?

Thanks.
Oleksiy Sayankin.


[jira] [Updated] (PIG-4620) Add namespace support to HBaseStorage

2015-07-06 Thread Andi Chirita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andi Chirita updated PIG-4620:
--
Labels: hbase patch  (was: )

 Add namespace support to HBaseStorage
 -

 Key: PIG-4620
 URL: https://issues.apache.org/jira/browse/PIG-4620
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.15.0
Reporter: Andi Chirita
Assignee: Andi Chirita
  Labels: hbase, patch
 Attachments: HBaseStorageNamespace.patch


 Since version 0.96, HBase has supported namespaces.
 Apache Pig recently updated its HBase dependency to 0.98.12 (PIG-4544).
 Currently there is no way to specify the namespace for a table.
 I suggest implementing this with a '-namespace' option.
 {code}
 copy = STORE raw INTO 'hbase://SampleTableCopy'
     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
     'info:first_name info:last_name friends:* info:*', '-namespace SampleNamespace');
 {code}
 We can't put the namespace in the hbase path because it would break the URI 
 validation: 'hbase://SampleNamespace:SampleTableCopy'
 The patch is available. I will look into extending the unit tests to cover the 
 namespace option.
 Please review my changes and let me know if I can help with anything.
 Kind regards,
 Andi
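
To illustrate why a namespace cannot simply be embedded in the location string, here is a small Python sketch. It uses Python's urllib as a stand-in for URI validation (Pig runs on the JVM, so this is an analogy rather than the actual check), and the helper qualified_table_name is hypothetical. The namespace:table form it builds follows HBase's notation for qualified table names.

```python
from urllib.parse import urlparse


def port_is_valid(location):
    """Return True if the authority part of the location parses cleanly."""
    try:
        urlparse(location).port  # raises ValueError for a non-numeric "port"
        return True
    except ValueError:
        return False


def qualified_table_name(table, namespace=None):
    """Hypothetical helper: combine a separately supplied namespace with the table."""
    return f"{namespace}:{table}" if namespace else table


if __name__ == "__main__":
    print(port_is_valid("hbase://SampleTableCopy"))
    print(port_is_valid("hbase://SampleNamespace:SampleTableCopy"))
    print(qualified_table_name("SampleTableCopy", "SampleNamespace"))
```

The text after the colon is read as a port, which is why passing the namespace as a separate option and joining it to the table name afterwards avoids the problem.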





[jira] [Updated] (PIG-4107) Calcite for query optimization

2015-07-06 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated PIG-4107:
-
Description: Calcite (formerly called Optiq) is a query planning engine 
which is currently in Apache incubator. We'd like to explore the possibility to 
use Calcite to do query optimization, material view generation for Pig.  (was: 
Optiq is a query planning engine which is currently in Apache incubator. We'd 
like to explore the possibility to use Optiq to do query optimization, material 
view generation for Pig.)

 Calcite for query optimization
 --

 Key: PIG-4107
 URL: https://issues.apache.org/jira/browse/PIG-4107
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Daniel Dai

 Calcite (formerly called Optiq) is a query planning engine which is currently 
 in the Apache Incubator. We'd like to explore the possibility of using Calcite 
 for query optimization and materialized view generation in Pig.





[jira] [Updated] (PIG-4107) Calcite for query optimization

2015-07-06 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated PIG-4107:
-
Summary: Calcite for query optimization  (was: Optiq for query optimization)

 Calcite for query optimization
 --

 Key: PIG-4107
 URL: https://issues.apache.org/jira/browse/PIG-4107
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Daniel Dai

 Optiq is a query planning engine which is currently in the Apache Incubator. 
 We'd like to explore the possibility of using Optiq for query optimization and 
 materialized view generation in Pig.





[jira] [Commented] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage

2015-07-06 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615915#comment-14615915
 ] 

Mohit Sabharwal commented on PIG-4611:
--

Thanks for the explanation and addressing this issue, [~kellyzly]!!!

Let me know if I understand this correctly:

1) Spark Executor will serialize all objects referenced in supplied closures. 
Since UDFContext.getUDFContext() is not initialized (because Spark does not 
expose a setup() interface like MR), we always default defaultCaster to 
STRING_CASTER.

2) However, later on, in the *same* Executor thread, the record reader creation 
will correctly deserialize the UDFContext from JobConf 
(PigInputFormatSpark.createRecordReader -> PigInputFormat.createRecordReader -> MapRedUtil.setupUDFContext -> UDFContext.deserialize).

3) Next, in the same Executor thread, when HBaseStorage is initialized by the 
load function, it will find a correctly populated UDFContext.

This sounds reasonable to me. Since this is a core change, could you please add 
comments to HBaseStorage.java explaining why we are handling this as a special 
case for Spark?
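
The ordering in points 1 to 3 can be pictured with a toy model. This is only a sketch of the reasoning, written in Python rather than Pig's Java. ToyContext, ToyStorage, the "caster" key and the CONFIGURED_CASTER value are invented for illustration and do not correspond to the actual patch.

```python
class ToyContext:
    """Invented stand-in for a per-thread UDF context; not Pig's UDFContext."""

    def __init__(self):
        self.properties = {}

    def deserialize(self, job_conf):
        # Step 2: record reader setup fills the context from the job configuration.
        self.properties = dict(job_conf)


class ToyStorage:
    """Invented stand-in for a storage function that picks a caster."""

    DEFAULT_CASTER = "STRING_CASTER"

    def __init__(self, context):
        self.context = context
        # Step 1: nothing has populated the context yet, so this falls back to the default.
        self.caster = self._lookup_caster()

    def _lookup_caster(self):
        return self.context.properties.get("caster", self.DEFAULT_CASTER)

    def init_for_load(self):
        # Step 3: the context has been deserialized by now, so the lookup sees real settings.
        self.caster = self._lookup_caster()
        return self.caster


if __name__ == "__main__":
    ctx = ToyContext()
    storage = ToyStorage(ctx)
    print(storage.caster)
    ctx.deserialize({"caster": "CONFIGURED_CASTER"})
    print(storage.init_for_load())
```

The point of the model is that a value read before the context is populated is only a placeholder, and it has to be looked up again once deserialization has happened.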


I assume it is a typo, but the -Dexectype argument needs to be {{spark}}, not 
{{TestHBaseStorage}}, when running TestHBaseStorage:
{code}
ant test -Dhadoopversion=23 -Dtestcase=TestHBaseStorage -Dexectype=spark -DdebugPort=
{code}

 Fix remaining unit test failures about TestHBaseStorage
 -

 Key: PIG-4611
 URL: https://issues.apache.org/jira/browse/PIG-4611
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
 Fix For: spark-branch

 Attachments: PIG-4611.patch


 https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/ shows 
 the following unit test failures in TestHBaseStorage:
  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete  
  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_1
  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_2
  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection
  org.apache.pig.test.TestHBaseStorage.testCollectedGroup  
  org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans





[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

The ORC issue should be addressed separately in ORC/Hive; however, it would be 
good if Pig could handle this case for files that have already been generated.

Attaching patch from [~daijy].


 pig errors out on ORC empty file without schema
 ---

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without a schema (which, ideally, it is not 
 supposed to do), a Pig query reading the data fails with the following error: 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 





[jira] [Created] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)
Thejas M Nair created PIG-4624:
--

 Summary: pig errors out on ORC empty file without schema
 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai


If ORC produces an empty file without a schema (which, ideally, it is not 
supposed to do), a Pig query reading the data fails with the following error: 
org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 





[jira] [Commented] (PIG-4107) Calcite for query optimization

2015-07-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615859#comment-14615859
 ] 

Julian Hyde commented on PIG-4107:
--

I have started prototyping a Pig-like language called Piglet in CALCITE-785. 
Lessons learned there could be brought into Pig.



[jira] [Commented] (PIG-4622) Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate

2015-07-06 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615874#comment-14615874
 ] 

Mohit Sabharwal commented on PIG-4622:
--

Thanks, [~kellyzly].

+1 (non-binding)

 Skip TestCubeOperator.testIllustrate and 
 TestMultiQueryLocal.testMultiQueryWithIllustrate
 -

 Key: PIG-4622
 URL: https://issues.apache.org/jira/browse/PIG-4622
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
 Fix For: spark-branch

 Attachments: PIG-4622.patch


 The build at 
 https://builds.apache.org/job/Pig-spark/236/#showFailuresLink shows that the 
 following two unit tests fail:
 TestCubeOperator.testIllustrate and 
 TestMultiQueryLocal.testMultiQueryWithIllustrate
 This is because we currently don't support illustrate in Spark mode (see PIG-4621).
 Why do these two unit tests fail after PIG-4614_1.patch was merged to the branch?
 In PIG-4614_1.patch, we edited 
 [SparkExecutionEngine#instantiateScriptState|https://github.com/apache/pig/blob/a0bea12c3d5600a4c3137a8d05c054d10430b1ce/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkExecutionEngine.java#L37].
 When the following script, illustrate.pig, is run with illustrate:
 {code}
 a = load 'test/org/apache/pig/test/data/passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
 b = filter a by uid > 5;
 illustrate b;
 store b into './testMultiQueryWithIllustrate.out';
 {code}
 the exception is thrown at 
 [MRScriptState.get|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/mapreduce/MRScriptState.java#L67]: 
 java.lang.ClassCastException: org.apache.pig.tools.pigstats.spark.SparkScriptState 
 cannot be cast to org.apache.pig.tools.pigstats.mapreduce.MRScriptState.
 Stack trace:
 {code}
 java.lang.ClassCastException: org.apache.pig.tools.pigstats.spark.SparkScriptState cannot be cast to org.apache.pig.tools.pigstats.mapreduce.MRScriptState
 at org.apache.pig.tools.pigstats.mapreduce.MRScriptState.get(MRScriptState.java:67)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:512)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:327)
 at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:110)
 at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:259)
 at org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:223)
 at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:155)
 at org.apache.pig.PigServer.getExamples(PigServer.java:1305)
 at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:812)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:818)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:385)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
 at org.apache.pig.Main.run(Main.java:624)
 at org.apache.pig.Main.main(Main.java:170)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {code}





[jira] [Resolved] (PIG-4622) Skip TestCubeOperator.testIllustrate and TestMultiQueryLocal.testMultiQueryWithIllustrate

2015-07-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-4622.
--
Resolution: Fixed

Committed to Spark branch. Thanks, Liyun.



[jira] [Updated] (PIG-4624) pig errors out on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Fix Version/s: 0.15.1
   0.16.0

 pig errors out on ORC empty file without schema
 ---

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1


 If ORC produces an empty file without a schema (which, ideally, it is not 
 supposed to do), a Pig query reading the data fails with the following error: 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 





Re: pig and parquet-bundle*jar

2015-07-06 Thread Lorand Bendig


Hi Oleksiy,

Initially the idea was not to add another dependency to the Pig fat jar and 
instead to let the user ship the necessary Parquet bundle.
However, with PIG-3737 the dependent jars are now copied to the 
$PIG_HOME/lib directory.
I suspect you are right: the patch in PIG-3737 needs to be extended so that 
parquet-pig-bundle-*.jar ends up in the /lib directory as well.
On the other hand, it would also be great to bump the parquet-bundle version 
from 1.2.3 to 1.7.0.


@Daniel, what do you think?

Thanks,
Lorand

On 06/07/15 18:34, Олексій Саянкін wrote:

> Hi team!
>
> I have found strange issue using pig and parquet files. There is no
> parquet-bundle*jar in pig/lib folder so I have to manually add it to avoid
> this exception:
>
> pig script failed to validate:
> org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not
> resolve parquet.pig.ParquetLoader using imports: [, java.lang.,
> org.apache.pig.builtin., org.apache.pig.impl.builtin.]
>
> I have investigated build.xml files from pig-0.12 to pig-0.15 and found
> that parquet-bundle*jar is only compile time dependency. ANT does not
> copy parquet-bundle*jar
> to lib folder. Similar issue you can see here
> https://issues.apache.org/jira/browse/PIG-3445 (see last comment in the
> thread).
>
> So my question is: Was absence of parquet-bundle*jar file done on purpose
> or we have a bug here?
>
> Thanks.
> Oleksiy Sayankin.





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch

 Error on ORC empty file without schema
 --

 Key: PIG-4624
 URL: https://issues.apache.org/jira/browse/PIG-4624
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.16.0, 0.15.1

 Attachments: PIG-4624.1.patch


 If ORC produces an empty file without a schema (which, ideally, it is not 
 supposed to do), a Pig query reading the data fails with the following error: 
 org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema 





[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: (was: PIG-4624.1.patch)



[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Summary: Error on ORC empty file without schema  (was: pig errors out on 
ORC empty file without schema)



[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: (was: PIG-4624.1.patch)



[jira] [Updated] (PIG-4624) Error on ORC empty file without schema

2015-07-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-4624:
---
Attachment: PIG-4624.1.patch



[jira] [Commented] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage

2015-07-06 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616082#comment-14616082
 ] 

liyunzhang_intel commented on PIG-4611:
---

PIG-4611_2.patch is based on  PIG-4622: Skip TestCubeOperator.testIllustrate 
and TestMultiQueryLocal.testMultiQueryWithIllustrate (Liyun via Xuefu) 

 Fix remaining unit test failures about TestHBaseStorage
 -

 Key: PIG-4611
 URL: https://issues.apache.org/jira/browse/PIG-4611
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: liyunzhang_intel
Assignee: liyunzhang_intel
 Fix For: spark-branch

 Attachments: PIG-4611.patch, PIG-4611_2.patch


 https://builds.apache.org/job/Pig-spark/lastCompletedBuild/testReport/ shows 
 the following unit test failures in TestHBaseStorage:
  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_1_with_delete  
  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_1
  org.apache.pig.test.TestHBaseStorage.testLoadWithProjection_2
  org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection
  org.apache.pig.test.TestHBaseStorage.testCollectedGroup  
  org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans





[jira] [Updated] (PIG-4611) Fix remaining unit test failures about TestHBaseStorage

2015-07-06 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4611:
--
Attachment: PIG-4611_2.patch

[~mohitsabharwal]:
Your understanding is right.
Following your suggestion, I added comments in PIG-4611_2.patch explaining why 
I needed to change HBaseStorage#init and SelfSpillBag#memLimit.



Build failed in Jenkins: Pig-trunk-commit #2186

2015-07-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk-commit/2186/

--
[...truncated 2986 lines...]
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.impl.util.avro...
  [javadoc] Loading source files for package org.apache.pig.impl.util.hive...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.groovy...
  [javadoc] Loading source files for package org.apache.pig.scripting.jruby...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.scripting.streaming.python...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.counters...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats.mapreduce...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats.tez...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Loading source files for package org.apache.pig.validator...
  [javadoc] Constructing Javadoc information...
  [javadoc] 
/home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.98.12-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class):
 warning: Cannot find annotation method 'value()' in type 'SuppressWarnings': 
class file for edu.umd.cs.findbugs.annotations.SuppressWarnings not found
  [javadoc] 
/home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.98.12-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class):
 warning: Cannot find annotation method 'justification()' in type 
'SuppressWarnings'
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96:
 warning - Tag @see:illegal character: 123 in {@link 
EvalFunc#getSchemaType()}
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96:
 warning - Tag @see:illegal character: 64 in {@link 
EvalFunc#getSchemaType()}
  [javadoc] Standard Doclet version 1.7.0_65
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:295:
 warning - Tag @link: reference not found: FuncUtils
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/EvalFunc.java:96:
 warning - Tag @see: reference not found: {@link EvalFunc#getSchemaType()}
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/ExecType.java:90:
 warning - @return tag has no arguments.
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/ExecTypeProvider.java:61:
 warning - @return tag has no arguments.
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/JVMReuseManager.java:63:
 warning - @StaticDataCleanup is an unknown tag.
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/LoadCaster.java:134:
 warning - @param argument fieldSchema is not a parameter name.
  [javadoc] 
https://builds.apache.org/job/Pig-trunk-commit/ws/trunk/src/org/apache/pig/LoadCaster.java:143:
 warning - @param argument fieldSchema is not a parameter name.
  [javadoc] 

[jira] Subscription: PIG patch available

2015-07-06 Thread jira
Issue Subscription
Filter: PIG patch available (25 issues)

Subscriber: pigdaily

Key       Summary
PIG-4618  When use tez as the engine , set pig.user.cache.enabled=true do not take effect
https://issues.apache.org/jira/browse/PIG-4618
PIG-4598  Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4581  thread safe issue in NodeIdGenerator
https://issues.apache.org/jira/browse/PIG-4581
PIG-4539  New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4526  Make setting up the build environment easier
https://issues.apache.org/jira/browse/PIG-4526
PIG-4468  Pig's jackson version conflicts with that of hadoop 2.6.0
https://issues.apache.org/jira/browse/PIG-4468
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
https://issues.apache.org/jira/browse/PIG-4455
PIG-4417  Pig's register command should support automatic fetching of jars from repo.
https://issues.apache.org/jira/browse/PIG-4417
PIG-4373  Implement Optimize the use of DistributedCache(PIG-2672) and PIG-3861 in Tez
https://issues.apache.org/jira/browse/PIG-4373
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4111  Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Commented] (PIG-4515) org.apache.pig.builtin.Distinct throws ClassCastException

2015-07-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615168#comment-14615168
 ] 

Rohini Palaniswamy commented on PIG-4515:
-

bq. What is the status of this bug? It is still present in 0.15.0!
   I missed reviewing it because it was not marked Patch Available and did not 
have a Fix Version. You need to click Submit Patch after posting a patch. 
Could you do that and also set the version to 0.16? Also, as I mentioned earlier, 
this bug is not a blocker and is not required for 0.15, as there are 3 different 
ways of writing the script that folks normally use to achieve the result. 
builtin.Distinct was mainly used internally for combiner optimization, and I 
have not seen many people use it directly. They use the DISTINCT operator.

 org.apache.pig.builtin.Distinct throws ClassCastException
 -

 Key: PIG-4515
 URL: https://issues.apache.org/jira/browse/PIG-4515
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.14.0
 Environment: 2015-04-23 08:37:49,117 [main] INFO  org.apache.pig.Main 
 - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
Reporter: Mikko Kupsu
 Attachments: fix_singletuplebag_classcast_exception.patch, 
 fix_singletuplebag_classcast_exception_2.patch


 Running the script below causes a *ClassCastException*.
 {code}
 A = LOAD 'A' AS (a:int, b:int);
 B = GROUP A BY a;
 C = FOREACH B GENERATE Distinct(A);
 DUMP C;
 {code}
 Content of A:
 {code}
 1 1
 2 1
 3 1
 4 1
 5 2
 6 2
 7 2
 8 2
 9 2
 {code}
 {code}
 Caused by: java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag 
 cannot be cast to org.apache.pig.data.Tuple
   at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:86)
   at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:78)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:323)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:362)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:361)
 {code}
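
For illustration only, here is a minimal sketch of how the script above might be rewritten with the DISTINCT operator mentioned in the comment, using a nested FOREACH block instead of calling builtin.Distinct directly. The alias D is a name introduced here for the example, and this is an assumption about what such a rewrite could look like, not necessarily one of the specific variants the comment has in mind.

```pig
-- Same input and grouping as in the issue description.
A = LOAD 'A' AS (a:int, b:int);
B = GROUP A BY a;
-- Apply the DISTINCT operator to the grouped bag inside a nested block
-- rather than invoking the Distinct UDF on it.
C = FOREACH B {
    D = DISTINCT A;  -- D is a hypothetical alias for the de-duplicated bag
    GENERATE D;
};
DUMP C;
```

Because this form never calls org.apache.pig.builtin.Distinct from the script, it should not go through the Distinct$Initial.exec call shown in the stack trace.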



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)