[jira] [Commented] (PIG-4059) Pig on Spark

2017-05-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029733#comment-16029733
 ] 

Julien Le Dem commented on PIG-4059:


Congrats all! 
Thanks [~rohini]!

> Pig on Spark
> 
>
> Key: PIG-4059
> URL: https://issues.apache.org/jira/browse/PIG-4059
> Project: Pig
>  Issue Type: New Feature
>  Components: spark
>Reporter: Rohini Palaniswamy
>Assignee: Praveen Rachabattuni
>  Labels: spork
> Fix For: spark-branch, 0.17.0
>
> Attachments: Pig-on-Spark-Design-Doc.pdf, Pig-on-Spark-Scope.pdf
>
>
> Setting up your development environment:
> 0. Download a Spark release package (Pig on Spark currently supports only
> Spark 1.6).
> 1. Check out Pig Spark branch.
> 2. Build Pig by running "ant jar" and "ant -Dhadoopversion=23 jar" for 
> hadoop-2.x versions
> 3. Configure these environment variables:
> export HADOOP_USER_CLASSPATH_FIRST="true"
> The "local" and "yarn-client" modes are currently supported. Export the
> SPARK_MASTER environment variable accordingly, for example:
> export SPARK_MASTER=local or export SPARK_MASTER="yarn-client"
> 4. In local mode: ./pig -x spark_local xxx.pig
> In yarn-client mode: 
> export SPARK_HOME=xx; 
> export SPARK_JAR=hdfs://example.com:8020/ (the hdfs location where 
> you upload the spark-assembly*.jar)
> ./pig -x spark xxx.pig



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests

2017-02-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned PIG-2845:
--

Assignee: (was: Julien Le Dem)

> Configure hadoop.tmp.dir under build/tmp for MiniCluster tests
> --
>
> Key: PIG-2845
> URL: https://issues.apache.org/jira/browse/PIG-2845
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-2845_0.patch
>
>






[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365633#comment-15365633
 ] 

Julien Le Dem commented on PIG-3764:


[~rohini] This sounds good. It does not have to be totally inlined, since the 
JIT will inline method calls; you want to avoid virtual calls, though. My 
prototype is still out there [1]. One thing it did not take into account is 
nulls, but I think this can be branched out separately (evaluate ignoring the 
nulls, and then evaluate the is-null check).
Generating asm directly can be unwieldy. That's why I made Brennus [2], to 
factor out a lot of the logic (different operations per type, different stack 
frame size per type, all sorts of special cases); see the prototype [1].

1: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
2: https://github.com/julienledem/brennus
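
The null-handling idea in the comment above (evaluate ignoring the nulls, then evaluate the is-null part separately) could look roughly like the sketch below. It is plain Java for illustration only; the class and method names are hypothetical and are not taken from the prototype [1] or from Brennus [2].

```java
import java.util.Arrays;

// Hypothetical illustration; these names do not come from Pig or Brennus.
final class NullSeparatedAdd {

    // Pass 1: evaluate a + b on primitives and ignore nulls entirely.
    // Slots whose inputs were null simply hold an arbitrary value.
    static int[] addValues(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Pass 2: the result is null wherever either input is null.
    static boolean[] addIsNull(boolean[] aNull, boolean[] bNull) {
        boolean[] out = new boolean[aNull.length];
        for (int i = 0; i < aNull.length; i++) {
            out[i] = aNull[i] || bNull[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 0, 3};
        boolean[] aNull = {false, true, false};
        int[] b = {10, 20, 30};
        boolean[] bNull = {false, false, false};
        System.out.println(Arrays.toString(addValues(a, b)));
        System.out.println(Arrays.toString(addIsNull(aNull, bNull)));
    }
}
```

Keeping the value pass free of null checks leaves a tight loop over primitives, which is the kind of code the JIT tends to handle well.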

> Compile physical operators to bytecode
> --
>
> Key: PIG-3764
> URL: https://issues.apache.org/jira/browse/PIG-3764
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Julien Le Dem
>  Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling 
> expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4219) When parsing a schema, pig drops tuple inside of Bag if it contains only one field

2014-10-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-4219:
--

 Summary: When parsing a schema, pig drops tuple inside of Bag if 
it contains only one field
 Key: PIG-4219
 URL: https://issues.apache.org/jira/browse/PIG-4219
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem


Example
{code:java}
//We generate a schema object and call toString()
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
// Reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);
// But no longer matches the original structure
schema.toString();
// = {my_list: {array_element: (num1: int,num2: int)}}
{code}






[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081944#comment-14081944
 ] 

Julien Le Dem commented on PIG-3760:


[~rohini] I added to the description of PIG-4092

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
 Fix For: 0.14.0


 From the conversation on dev@pig:
 Partition pruning for ORC is not addressed in PIG-3558. We will need
 to do partition pruning for both ORC and Parquet in a new ticket.
 Currently there is no interface to deal with this kind of pushdown
 (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes
 the filter statement; for ORC/Parquet, the filter is a hint, and we need
 to apply the filter again in Pig even if it is pushed to the loader), so we will
 need to define a new interface for that. You are welcome to initiate
 the work. I know Aniket is also interested in doing that, so be sure
 to talk with him about this work.
 Thanks,
 Daniel
 On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
 andrew.mussel...@gmail.com wrote:
  I had a chat with a couple people last week about a feature request for
  Pig:  in a where or filter clause, when loading an ORC file, to skip
  directly to the right offset instead of scanning the whole file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java

 Predicate pushdown for Parquet
 --

 Key: PIG-4092
 URL: https://issues.apache.org/jira/browse/PIG-4092
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0


 See:
 https://github.com/apache/incubator-parquet-mr/pull/4
 and:
 https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java





[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java
[~alexlevenson] is the main author of this API

  was:
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java


 Predicate pushdown for Parquet
 --

 Key: PIG-4092
 URL: https://issues.apache.org/jira/browse/PIG-4092
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0


 See:
 https://github.com/apache/incubator-parquet-mr/pull/4
 and:
 https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java
 [~alexlevenson] is the main author of this API





[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081948#comment-14081948
 ] 

Julien Le Dem commented on PIG-3760:


FYI, in Parquet the filter is not a hint: it is applied to the metadata first 
and then to the records.



[jira] [Commented] (PIG-3913) Pig should use job's jobClient wherever possible (fixes local mode counters)

2014-04-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985005#comment-13985005
 ] 

Julien Le Dem commented on PIG-3913:


This looks good to me.
Please add javadoc for deprecated methods with information about the new way of 
doing the same thing.
+1
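
As an illustration of the javadoc request above, a deprecated method can point at its replacement with the @deprecated tag and a {@link}. The classes below are hypothetical stand-ins (a String plays the role of the client); they are not the actual Pig or Hadoop types touched by the patch.

```java
// Hypothetical stand-ins for illustration; these are not Pig's or Hadoop's classes.
final class SketchJob {
    private final String jobClient;

    SketchJob(String jobClient) {
        this.jobClient = jobClient;
    }

    // The per-job accessor that callers are expected to move to.
    String getJobClient() {
        return jobClient;
    }
}

final class SketchLauncher {
    private final String statsJobClient = "launcher-wide-client";

    /**
     * Returns the launcher-wide client used to poll counters.
     *
     * @deprecated May report incorrect counters in local mode. Use
     *             {@link SketchJob#getJobClient()} on the job in question instead.
     */
    @Deprecated
    String getStatsJobClient() {
        return statsJobClient;
    }
}
```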

 Pig should use job's jobClient wherever possible (fixes local mode counters)
 

 Key: PIG-3913
 URL: https://issues.apache.org/jira/browse/PIG-3913
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.13.0

 Attachments: PIG-3913-1.patch


 MapreduceLauncher initializes a statsJobClient to poll counter information of 
 jobs. This works fine in mapreduce mode but it reports incorrect information 
 in local (auto-local) mode. Pig code should try to use 
 org.apache.hadoop.mapred.jobcontrol.Job's getJobClient api to get handle to 
 jobClient wherever possible. statsJobClient (and wherever its references are 
 passed) should be deprecated.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606
 ] 

Julien Le Dem commented on PIG-3815:


Same comment as item 1 from Cheolsoo.
Otherwise, this looks good to me.

 Hadoop bug causes to pig to fail silently with jar cache
 

 Key: PIG-3815
 URL: https://issues.apache.org/jira/browse/PIG-3815
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.13.0

 Attachments: PIG-3815-1.patch, PIG-3815.patch


 Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the 
 distributed cache configuration. It uses ':' to separate the list of files to be 
 put on the classpath via the distributed cache. If fs.default.name has a port 
 number in it, the tokenization logic in Hadoop fails when retrieving the list of 
 cache filenames in the backend.
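
The failure mode described above can be reproduced in plain Java. The sketch below does not call Hadoop; it only joins two fully qualified paths with ':' and splits them again, which is an assumption about the shape of the tokenization rather than the actual Hadoop code. The host name and paths are made up.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only; this is not Hadoop's DistributedCache code.
final class ClasspathSeparatorDemo {

    // Joins entries with ':' the way a classpath-style property would.
    static String join(List<String> entries) {
        return String.join(":", entries);
    }

    // Naive tokenization on ':'; this is the step that breaks.
    static String[] split(String joined) {
        return joined.split(":");
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "hdfs://namenode.example.com:8020/tmp/a.jar",
                "hdfs://namenode.example.com:8020/tmp/b.jar");
        String[] tokens = split(join(jars));
        // Two entries went in, but the colons after the scheme and before
        // the port produce extra tokens on the way out.
        System.out.println(tokens.length + " tokens: " + Arrays.toString(tokens));
    }
}
```

With plain paths such as /tmp/a.jar the round trip is lossless; any colon inside an entry is enough to break it.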





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749
 ] 

Julien Le Dem commented on PIG-3815:


[~rohini] in the code you quoted, don't you think it is putting the port back 
in the following line?
{noformat}
URI uri = fs.makeQualified(file).toUri();
{noformat}

 Hadoop bug causes to pig to fail silently with jar cache
 

 Key: PIG-3815
 URL: https://issues.apache.org/jira/browse/PIG-3815
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.13.0

 Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch


 Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the 
 distributed cache configuration. It uses ':' to separate the list of files to be 
 put on the classpath via the distributed cache. If fs.default.name has a port 
 number in it, the tokenization logic in Hadoop fails when retrieving the list of 
 cache filenames in the backend.





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926350#comment-13926350
 ] 

Julien Le Dem commented on PIG-3801:


I would use properties.getProperty(MAPREDUCE_FRAMEWORK_NAME).equals(LOCAL) to 
decide if it's running locally, but otherwise this looks good to me.
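
A minimal sketch of the check suggested above, written constant-first so that an unset property does not throw a NullPointerException. The key "mapreduce.framework.name" and the value "local" are the usual Hadoop 2 settings; the constant and class names are assumptions made for this example and may not match the ones in Pig.

```java
import java.util.Properties;

// Illustration only; constant names are assumed, not copied from Pig.
final class LocalModeCheck {
    static final String MAPREDUCE_FRAMEWORK_NAME = "mapreduce.framework.name";
    static final String LOCAL = "local";

    // Calling equals on the constant keeps the check safe when the key is unset.
    static boolean isLocal(Properties properties) {
        return LOCAL.equals(properties.getProperty(MAPREDUCE_FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        System.out.println(isLocal(properties)); // key not set
        properties.setProperty(MAPREDUCE_FRAMEWORK_NAME, LOCAL);
        System.out.println(isLocal(properties)); // key set to "local"
    }
}
```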

 Auto local mode does not call storeSchema
 -

 Key: PIG-3801
 URL: https://issues.apache.org/jira/browse/PIG-3801
 Project: Pig
  Issue Type: Bug
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: PIG-3801.patch


 https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481
 Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We 
 also need to add this for auto-local jobs.
 To repro this problem, run-
   a = load '2.txt' as (a0:chararray, a1:int);
   store a into 'a' using PigStorage(',','-schema');
 This creates .pig_schema file in pig -x local mode, but does not create 
 .pig_schema file in auto-local mode.





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926351#comment-13926351
 ] 

Julien Le Dem commented on PIG-3801:


+1



[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923365#comment-13923365
 ] 

Julien Le Dem commented on PIG-3754:


LGTM too

 InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
 --

 Key: PIG-3754
 URL: https://issues.apache.org/jira/browse/PIG-3754
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Priority: Trivial
 Fix For: 0.13.0

 Attachments: PIG-3754-1.patch


 If you have more than one input, 
 InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
 one of the loaders returns -1 and is not file based (e.g. HBase). This causes 
 incorrect reducer estimation and problems in auto.local mode.
 If the size cannot be found for any one of the inputs, we should bail out 
 with a return value of -1.
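
The proposed behavior can be sketched as follows. This is not the code from the patch: the class and method names are hypothetical, and the per-input sizes are passed in as a plain array instead of being obtained from loaders.

```java
// Hypothetical sketch of the bail-out rule; not the code in PIG-3754-1.patch.
final class InputSizeSketch {

    // sizes[i] is the size in bytes of input i, or -1 when it is unknown.
    static long totalInputSize(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            if (size < 0) {
                // One unknown input makes the total unknown as well.
                return -1;
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInputSize(new long[] {100, 200}));
        System.out.println(totalInputSize(new long[] {100, -1, 200}));
    }
}
```

Adding the -1 into the sum instead of bailing out would give a total that looks valid but is wrong.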





[jira] [Commented] (PIG-3558) ORC support for Pig

2014-02-14 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902218#comment-13902218
 ] 

Julien Le Dem commented on PIG-3558:


Is hive-exec the fat jar that assembles the runtime dependencies of hive in one 
jar?
Could we depend on the individual hive modules that we need instead?


 ORC support for Pig
 ---

 Key: PIG-3558
 URL: https://issues.apache.org/jira/browse/PIG-3558
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0

 Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch


 Adding LoadFunc and StoreFunc for ORC.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (PIG-3764) Compile physical operators to bytecode

2014-02-12 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3764:
--

 Summary: Compile physical operators to bytecode
 Key: PIG-3764
 URL: https://issues.apache.org/jira/browse/PIG-3764
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Julien Le Dem


I started a prototype here:
https://github.com/julienledem/pig/compare/trunk...compile_physical_plan

The current physical plan is relatively inefficient at evaluating expressions.
In the context of a better execution engine (Tez, Spark, ...), compiling 
expressions to bytecode would be a significant speedup.





[jira] [Updated] (PIG-2599) Mavenize Pig

2014-02-12 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2599:
---

Labels: gsoc2014  (was: gsoc2013)

 Mavenize Pig
 

 Key: PIG-2599
 URL: https://issues.apache.org/jira/browse/PIG-2599
 Project: Pig
  Issue Type: New Feature
  Components: build
Reporter: Daniel Dai
  Labels: gsoc2014
 Fix For: 0.13.0

 Attachments: maven-pig.1.zip


 Switch Pig build system from ant to maven.
 This is a candidate project for Google summer of code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013





[jira] [Commented] (PIG-3741) Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage

2014-02-04 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891030#comment-13891030
 ] 

Julien Le Dem commented on PIG-3741:


Ideally each store would get its own config object, but that would be a major 
refactoring.
In the meantime, this looks like a good improvement to me.
+1
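
To illustrate the "own config object per store" idea from the comment above, here is a sketch that uses java.util.Properties as a stand-in for the job configuration. The helper name is hypothetical, and the property keys only serve as examples. Hadoop's Configuration has a copy constructor that could play the same role.

```java
import java.util.Properties;

// Illustration only; Properties stands in for the real configuration type.
final class PerStoreConfig {

    // Returns an independent copy, so that settings made for one store
    // do not leak into the shared configuration or into other stores.
    static Properties copyFor(Properties shared) {
        Properties copy = new Properties();
        copy.putAll(shared);
        return copy;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        shared.setProperty("mapred.job.name", "example");

        Properties forIntermediateStore = copyFor(shared);
        forIntermediateStore.setProperty("mapred.output.compress", "true");

        // The shared configuration is untouched by the per-store change.
        System.out.println(shared.getProperty("mapred.output.compress"));
    }
}
```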

 Utils.setTmpFileCompressionOnConf can cause side effect for 
 SequenceFileInterStorage
 

 Key: PIG-3741
 URL: https://issues.apache.org/jira/browse/PIG-3741
 Project: Pig
  Issue Type: Bug
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.12.1

 Attachments: PIG-3741.patch


 Currently, Utils.setTmpFileCompressionOnConf(pigContext, conf); is invoked 
 for every job. In case of Seqfile, this api sets mapreduce params on conf to 
 assist SequenceFileInterStorage. However, as a side effect, this might change 
 the behavior of other storers due to these mapred properties. This api should 
 only be called for jobs with intermediate storage.





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887203#comment-13887203
 ] 

Julien Le Dem commented on PIG-3347:


I thought that the field UIDs were used to track lineage across the plan.
[~aniket486] correct me if I'm wrong, but they are used to determine which fields 
are read, for projection push-down.
In the case of a self join (direct or indirect), we end up with duplicate IDs 
in the same relation because the same field is derived into two different fields.
Otherwise I'm as lost as [~knoguchi] regarding the actual mechanisms around the 
UID.
I tried to fix some of these in the past (PIG-3020), but it appears the fixes 
created more problems (PIG-3492).
[~daijy] maybe you can enlighten us?

 Store invocation brings side effect
 ---

 Key: PIG-3347
 URL: https://issues.apache.org/jira/browse/PIG-3347
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.11
 Environment: local mode
Reporter: Sergey
Assignee: Daniel Dai
Priority: Critical
 Fix For: 0.12.1

 Attachments: PIG-3347-1.patch


 The problem is that intermediate 'store' invocation changes the final store 
 output. Looks like it brings some kind of side effect. We did use 'local' 
 mode to run script
 here is the input data:
 1
 1
 Here is the script:
 {code}
 a = load 'test';
 a_group = group a by $0;
 b = foreach a_group {
   a_distinct = distinct a.$0;
   generate group, a_distinct;
 }
 --store b into 'b';
 c = filter b by SIZE(a_distinct) == 1;
 store c into 'out';
 {code}
 We expect output to be:
 1 1
 The output is empty file.
 Uncomment the {code}--store b into 'b';{code} line and see the difference.
 You would get the expected output.





[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785427#comment-13785427
 ] 

Julien Le Dem commented on PIG-3082:


This is intended.
The second behavior described above is really problematic.
If a UDF breaks because it returns a schema of more than one field it should be 
changed to return one field of type tuple.
Once fixed it works in all versions of Pig.
This is only removing an unsafe use of outputSchema in favor of the existing 
correct use.

 outputSchema of a UDF allows two usages when describing a Tuple schema
 --

 Key: PIG-3082
 URL: https://issues.apache.org/jira/browse/PIG-3082
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Jonathan Coveney
 Fix For: 0.12.0

 Attachments: PIG-3082-0.patch, PIG-3082-1.patch


 When defining an evalfunc that returns a Tuple there are two ways you can 
 implement outputSchema().
 - The right way: return a schema that contains one Field that contains the 
 type and schema of the return type of the UDF
 - The unreliable way: return a schema that contains more than one field and 
 it will be understood as a tuple schema even though there is no type (which 
 is in Field class) to specify that. This is particularly deceitful when the 
 output schema is derived from the input schema and the outputted Tuple 
 sometimes contains only one field. In such cases Pig understands the output 
 schema as a tuple only if there is more than one field, so sometimes it 
 works and sometimes it does not.
 We should at least issue a warning (backward compatibility) if not plain 
 throw an exception when the output schema contains more than one Field.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785564#comment-13785564
 ] 

Julien Le Dem commented on PIG-3445:


I added a parquet-pig-bundle and the shading of fastutil:
https://github.com/Parquet/parquet-mr/pull/186
We can make a new release to simplify

 Make Parquet format available out of the box in Pig
 ---

 Key: PIG-3445
 URL: https://issues.apache.org/jira/browse/PIG-3445
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
 Fix For: 0.12.0

 Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
 PIG-3445.patch


 We would add the Parquet jar in the Pig packages to make it available out of 
 the box to pig users.
 On top of that we could add the parquet.pig package to the list of packages 
 to search for UDFs. (Alternatively, the Parquet jar could contain classes 
 named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
 This way users can use Parquet simply by typing:
 A = LOAD 'foo' USING ParquetLoader();
 STORE A INTO 'bar' USING ParquetStorer();





[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785565#comment-13785565
 ] 

Julien Le Dem commented on PIG-3445:



parquet-format.version should be 1.0.0



[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785610#comment-13785610
 ] 

Julien Le Dem commented on PIG-3445:


We merged the PR for parquet-pig-bundle.
I'm making a release so that this can be merged into Pig 0.12.




[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785742#comment-13785742
 ] 

Julien Le Dem commented on PIG-3445:


I just released parquet-pig-bundle-1.2.3
this should show up in maven central overnight



[jira] [Commented] (PIG-3446) Umbrella jira for Pig on Tez

2013-09-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773569#comment-13773569
 ] 

Julien Le Dem commented on PIG-3446:


Here is the work that Achal did for Pig-on-Tez
https://github.com/achalsoni81/pigeon

 Umbrella jira for Pig on Tez
 

 Key: PIG-3446
 URL: https://issues.apache.org/jira/browse/PIG-3446
 Project: Pig
  Issue Type: New Feature
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


 This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
 More information can be found on the following wiki page:
 https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-09-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769985#comment-13769985
 ] 

Julien Le Dem commented on PIG-3367:


Looks good to me.
Is there a way you can factor out some of the content of buildAssertOp() ? It 
looks like some of this would be common with other methods.

 Add assert keyword (operator) in pig
 

 Key: PIG-3367
 URL: https://issues.apache.org/jira/browse/PIG-3367
 Project: Pig
  Issue Type: New Feature
  Components: parser
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Fix For: 0.12

 Attachments: PIG-3367.patch


 Assert operator can be used for data validation. With assert you can write 
 script as following-
 {code}
 a = load 'something' as (a0:int, a1:int);
 assert a by a0 > 0, 'a cant be negative for reasons';
 {code}
 This script will fail if assert is violated.



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975
 ] 

Julien Le Dem commented on PIG-3419:


+1
[~cheolsoo] LGTM!

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, mapreduce_execengine.patch, 
 stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
 updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
 updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
 updated-8-29-2013-exec-engine.patch


 In an effort to adapt Pig to work using Apache Tez 
 (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
 a cleaner ExecutionEngine abstraction than existed before. The changes are 
 not that major as Pig was already relatively abstracted out between the 
 frontend and backend. The changes in the attached commit are essentially the 
 barebones changes -- I tried to not change the structure of Pig's different 
 components too much. I think it will be interesting to see in the future how 
 we can refactor more areas of Pig to really honor this abstraction between 
 the frontend and backend. 
 Some of the changes were to reinstate an ExecutionEngine interface to tie 
 together the frontend and backend, to make Pig delegate to the EE when 
 necessary, and to create an MRExecutionEngine that implements this interface. 
 Other work included changing ExecType to cycle through the ExecutionEngines 
 on the classpath and select the appropriate one (this is done using Java 
 ServiceLoader, just as MapReduce does when choosing the framework to use 
 between local and distributed mode). I also tried to make ScriptState, 
 JobStats, and PigStats as abstract as possible in their current state. I 
 think some future work will be needed here to re-evaluate the usage of 
 ScriptState and the responsibilities of the different statistics classes. I 
 haven't touched the PPNL (PigProgressNotificationListener), but I think more 
 abstraction is needed here, perhaps in a separate patch. 
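
The ServiceLoader-based selection described above can be sketched roughly as follows. This is only an illustration: the Engine interface and the pick helper are hypothetical names, not Pig's actual ExecType API, and a real provider would additionally have to be listed in a META-INF/services file named after the interface.

```java
import java.util.Arrays;
import java.util.ServiceLoader;

public class EngineSelection {
    /** Hypothetical stand-in for an execution-engine descriptor; not Pig's real interface. */
    public interface Engine {
        String name();
    }

    /** Returns the first discovered engine whose name matches, ignoring case. */
    public static Engine pick(Iterable<Engine> discovered, String wanted) {
        for (Engine e : discovered) {
            if (e.name().equalsIgnoreCase(wanted)) {
                return e;
            }
        }
        throw new IllegalArgumentException("No execution engine named " + wanted);
    }

    public static void main(String[] args) {
        // On a real classpath the candidates come from ServiceLoader; with no
        // provider registered this loop simply finds nothing.
        for (Engine e : ServiceLoader.load(Engine.class)) {
            System.out.println("discovered: " + e.name());
        }

        // Hand-built candidates, to show the selection step in isolation.
        Engine local = () -> "local";
        Engine mr = () -> "mapreduce";
        System.out.println(pick(Arrays.asList(local, mr), "MAPREDUCE").name());
    }
}
```

Because pick accepts any Iterable, the result of ServiceLoader.load can be passed to it directly.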



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753912#comment-13753912
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo]: thanks a lot for looking into this.

Here are my thoughts:

1. let's change it back

2, 4, 5, 6 and 7 are either internal to Pig or necessary to add the execution 
engine abstraction.

3.
JobStats still exists, but the MR-specific part is split into MRJobStats, which 
extends JobStats.
The same goes for PigStatsUtil and ScriptState. Those classes are not 
disappearing, but the MR-specific part is abstracted out.
HExecutionEngine could be renamed back to what it was, but this is again what is 
becoming the new abstraction.
Unfortunately, tools like Ambrose and Lipstick depend on the MR-specific parts 
of Pig and look at the internals. This patch is a necessary change so that 
those tools can work independently of the execution engine in the future.
The changes to Ambrose and Lipstick should be minimal with this patch. 
They would suffer from some incompatibility, but there is no way 
around it when a tool looks inside the execution engine internals.
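
As a rough illustration of the JobStats / MRJobStats split in point 3 (the fields and the summarize helper below are made up for the example and do not mirror the real classes):

```java
public class StatsSplitSketch {
    /** Engine-neutral part that any tool can rely on. Hypothetical members. */
    public abstract static class JobStats {
        private final String jobId;

        protected JobStats(String jobId) {
            this.jobId = jobId;
        }

        public String getJobId() {
            return jobId;
        }

        public abstract long getRecordsWritten();
    }

    /** MapReduce-specific details live only in the subclass. */
    public static class MRJobStats extends JobStats {
        private final int numberMaps;
        private final int numberReduces;
        private final long recordsWritten;

        public MRJobStats(String jobId, int numberMaps, int numberReduces, long recordsWritten) {
            super(jobId);
            this.numberMaps = numberMaps;
            this.numberReduces = numberReduces;
            this.recordsWritten = recordsWritten;
        }

        public int getNumberMaps() {
            return numberMaps;
        }

        public int getNumberReduces() {
            return numberReduces;
        }

        @Override
        public long getRecordsWritten() {
            return recordsWritten;
        }
    }

    /** A tool written against JobStats keeps working whichever engine produced the stats. */
    public static String summarize(JobStats stats) {
        return stats.getJobId() + ": " + stats.getRecordsWritten() + " records";
    }

    public static void main(String[] args) {
        System.out.println(summarize(new MRJobStats("job_1", 4, 2, 100L)));
    }
}
```

A tool that needs the map and reduce counts would have to downcast to MRJobStats, which is the kind of engine-specific dependency the comment warns about.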

I think we should revert 1. and commit the patch.



 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, mapreduce_execengine.patch, 
 stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
 updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
 updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch




[jira] [Created] (PIG-3445) Make Parquet format available out of the box in Pig

2013-08-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3445:
--

 Summary: Make Parquet format available out of the box in Pig
 Key: PIG-3445
 URL: https://issues.apache.org/jira/browse/PIG-3445
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem


We would add the Parquet jar to the Pig packages to make it available out of 
the box to Pig users.
On top of that we could add the parquet.pig package to the list of packages to 
search for UDFs. (Alternatively, the Parquet jar could contain classes named 
org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749226#comment-13749226
 ] 

Julien Le Dem commented on PIG-3419:


The advantage of having the execution engine abstraction in trunk is that it 
allows running experimental Pig execution engine implementations like Tez or 
Spark on an official release of Pig without having to build from a specific 
branch.
The execution engine implementations themselves are fairly independent of Pig 
and do not need to be maintained in a Pig branch.
If the ExecutionEngine abstraction evolves over time, that can be done in trunk 
and can be merged independently of the Tez implementation itself.


 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, mapreduce_execengine.patch, 
 stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
 updated-8-22-2013-exec-engine.patch




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748183#comment-13748183
 ] 

Julien Le Dem commented on PIG-3419:


+1 LGTM
If test-commit passes I think we can commit to TRUNK

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, mapreduce_execengine.patch, 
 stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
 updated-8-22-2013-exec-engine.patch




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746999#comment-13746999
 ] 

Julien Le Dem commented on PIG-3419:


I have submitted my review. This looks great [~achalsoni81]!
[~cheolsoo] does it look good to you?
Once Achal has updated his patch I'm willing to commit.

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, finalpatch.patch, 
 mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747064#comment-13747064
 ] 

Julien Le Dem commented on PIG-3419:


The point is to be able to implement alternate execution engines without having 
to fork Pig.
I think it should go in trunk.

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, finalpatch.patch, 
 mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745398#comment-13745398
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo] 
1. Do we really throw Exception? If yes, then let's just declare that. If not, 
then let's declare FrontEndException, ExecException, IOException instead, i.e. 
let's remove the exceptions that are already covered by the highest exception 
level.
2. Agreed. I would expect the execution engine to handle the 
Properties internally and the signature of this method to be:
{noformat}
public void setProperty(String property, String value);
{noformat}
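
A minimal sketch of what "handling the Properties internally" could look like. Only the setProperty signature comes from the comment above; the interface shape, the ToyEngine class and the getProperty accessor are assumptions made for the example.

```java
import java.util.Properties;

public class PropertySketch {
    /** Hypothetical slice of the engine interface; only setProperty is taken from the review. */
    public interface ExecutionEngine {
        void setProperty(String property, String value);
    }

    /** Toy engine that owns its Properties instead of receiving a Properties object from the caller. */
    public static class ToyEngine implements ExecutionEngine {
        private final Properties props = new Properties();

        @Override
        public void setProperty(String property, String value) {
            props.setProperty(property, value);
        }

        /** Illustrative accessor, not part of the proposed signature. */
        public String getProperty(String property) {
            return props.getProperty(property);
        }
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        engine.setProperty("example.key", "example-value");
        System.out.println(engine.getProperty("example.key"));
    }
}
```

With this shape, callers never see how the engine stores its configuration, so each implementation is free to map the keys onto its own settings.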

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
 Attachments: execengine.patch, mapreduce_execengine.patch, 
 stats_scriptstate.patch, test_suite.patch




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738478#comment-13738478
 ] 

Julien Le Dem commented on PIG-3419:


Hi Achal,
For large patches, please create a review here: https://reviews.apache.org

 Pluggable Execution Engine 
 ---

 Key: PIG-3419
 URL: https://issues.apache.org/jira/browse/PIG-3419
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Priority: Minor
 Attachments: pluggable_execengine.patch




[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694832#comment-13694832
 ] 

Julien Le Dem commented on PIG-3367:


I was thinking we could make the syntax part of FOREACH.
{noformat}
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
{noformat}
That way it is easy to integrate asserts in the flow.

The advantage of having it part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also if the 
filter is not in the predecessors of a STORE, it won't be executed.

A UDF can stop the job by throwing an exception, although the task will be 
retried before failing completely.

For reference, the UDF based syntax:
{noformat}
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is 
negative.' );
{noformat}
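
The idea behind the UDF-based variant, a filter that never drops rows but throws when a condition is violated, can be sketched in plain Java. This deliberately does not use the Pig UDF API; the asserting helper is a hypothetical name, and the non-negative check is assumed to be the intended condition.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AssertFilterSketch {
    /** Wraps a condition so that a violating row raises an exception instead of being dropped. */
    public static <T> Predicate<T> asserting(Predicate<T> condition, String message) {
        return row -> {
            if (!condition.test(row)) {
                throw new IllegalStateException(message + " (row: " + row + ")");
            }
            return true; // passing rows are always kept: the "filter that does not filter"
        };
    }

    public static void main(String[] args) {
        Predicate<Integer> nonNegative = id -> id >= 0;
        List<Integer> memberIds = Arrays.asList(3, 0, 17);
        List<Integer> kept = memberIds.stream()
                .filter(asserting(nonNegative, "Some member ID is negative."))
                .collect(Collectors.toList());
        System.out.println(kept);
    }
}
```

As with the Pig case, the check only runs if something downstream actually consumes the filtered data.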

Yes, adding new keywords is inconvenient when the keyword is already used as a 
relation or column name.
When a field collides with a keyword it is sometimes difficult to rename it.
I think we should:
 - try to avoid new keywords if possible
 - provide a mechanism to escape field names to facilitate fixing conflicts 
when they happen (using quotes or a similar mechanism)

 Add assert keyword (operator) in pig
 

 Key: PIG-3367
 URL: https://issues.apache.org/jira/browse/PIG-3367
 Project: Pig
  Issue Type: New Feature
  Components: parser
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi

 The assert operator can be used for data validation. With assert you can write 
 a script as follows:
 {code}
 a = load 'something' as (a0:int, a1:int);
 assert a by a0 > 0, 'a cant be negative for reasons';
 {code}
 This script will fail if the assertion is violated.



[jira] [Commented] (PIG-2828) DataType.compare null

2013-06-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672896#comment-13672896
 ] 

Julien Le Dem commented on PIG-2828:


Sounds good to me.

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, PIG-2828.patch, test.patch


 When using TOP, if the DataBag contains a null value to compare, it 
 generates the following exception:
 Caused by: java.lang.NullPointerException
   at org.apache.pig.data.DataType.compare(DataType.java:427)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:97)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:1)
   at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at org.apache.pig.builtin.TOP.updateTop(TOP.java:141)
   at org.apache.pig.builtin.TOP.exec(TOP.java:116)
 code: (TOP.java, starting at line 91)
 Object field1 = o1.get(fieldNum);
 Object field2 = o2.get(fieldNum);
 if (!typeFound) {
     datatype = DataType.findType(field1);
     typeFound = true;
 }
 return DataType.compare(field1, field2, datatype, datatype);
 The reason is that if typeFound is true, datatype is not null, 
 and field1 is null, the script fails.
 So we need to check whether field1 is null.
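
One possible shape for such a null check, as a standalone sketch. It is not the attached patch: the compareNullSafe helper is hypothetical, and ordering nulls before non-null values is an assumption chosen for the example.

```java
public class NullSafeCompareSketch {
    /** Orders null before any non-null value; otherwise defers to the natural ordering. */
    public static <T extends Comparable<T>> int compareNullSafe(T a, T b) {
        if (a == null && b == null) {
            return 0;
        }
        if (a == null) {
            return -1;
        }
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        Integer missing = null;
        System.out.println(compareNullSafe(missing, 7)); // negative: null sorts first
        System.out.println(compareNullSafe(7, missing)); // positive
        System.out.println(compareNullSafe(3, 7));       // negative: natural ordering
    }
}
```

Handling null on both sides keeps the comparison consistent, which a comparator used inside a PriorityQueue needs.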



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-17 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_3.patch

PIG-3307_3.patch addresses [~cheolsoo]'s comments

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, 
 PIG-3307_3.patch


 The physical operators are sometimes overly complex. I'm trying to clean up 
 some unnecessary code.
 In particular, there is an array of getNext(*T* v) methods where the value v 
 does not seem to have any importance and is just used to pick the correct 
 method.
 I have started a refactoring for a more readable getNext*T*().
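
The before/after shape of this refactoring, reduced to a toy example. The class names and return values are invented for illustration; the real operators have different return types and many more variants.

```java
public class GetNextSketch {
    /** Before: the argument is ignored and only steers overload resolution. */
    public static class OldOperator {
        public Integer getNext(Integer ignored) {
            return 42;
        }

        public String getNext(String ignored) {
            return "forty-two";
        }
    }

    /** After: the type is part of the method name, so no dummy argument is needed. */
    public static class NewOperator {
        public Integer getNextInteger() {
            return 42;
        }

        public String getNextString() {
            return "forty-two";
        }
    }

    public static void main(String[] args) {
        // Old style: callers pass a typed null just to select the overload.
        Integer dummy = null;
        System.out.println(new OldOperator().getNext(dummy));

        // New style: the call site says what it wants.
        System.out.println(new NewOperator().getNextInteger());
    }
}
```

Both call sites return the same value; the difference is purely in how readable the call is.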



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661202#comment-13661202
 ] 

Julien Le Dem commented on PIG-3307:


committed to TRUNK
Committed revision 1484037

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, 
 PIG-3307_3.patch




[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-16 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660033#comment-13660033
 ] 

Julien Le Dem commented on PIG-3307:


https://reviews.apache.org/r/11203/diff/#index_header
thanks [~cheolsoo] and [~daijy]!

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch




[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654887#comment-13654887
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] Also, most likely it won't make any difference performance-wise.

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch




[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-07 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651101#comment-13651101
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] what's the recommended approach? If you have a setup for perf 
testing, that would be helpful.


 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649907#comment-13649907
 ] 

Julien Le Dem commented on PIG-3311:


Committed to TRUNK

 add pig-withouthadoop-h2 to mvn-jar
 ---

 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3311.patch


 mvn-jar currently creates pig-version.jar and pig-version-h2.jar
 I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
 that are needed to run pig from the command line.
 This will allow a dual-version package.



[jira] [Resolved] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3311.


   Resolution: Fixed
Fix Version/s: 0.12

 add pig-withouthadoop-h2 to mvn-jar
 ---

 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.12

 Attachments: PIG-3311.patch


 mvn-jar currently creates pig-version.jar and pig-version-h2.jar
 I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
 that are needed to run pig from the command line.
 This will allow a dual-version package.



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650021#comment-13650021
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] This is removing parameters that were not used. I have not tested 
performance but I think it could only improve performance.
(see latest patch PIG-3307_2.patch)

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_2.patch

PIG-3307_2.patch removes the unused parameter in getNext(\*)


 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Created] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3311:
--

 Summary: add pig-withouthadoop-h2 to mvn-jar
 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3311.patch

mvn-jar currently creates pig-version.jar and pig-version-h2.jar
I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
that are needed to run pig from the command line.
This will allow a dual-version package.



[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3311:
---

Attachment: PIG-3311.patch

PIG-3311.patch adds -withouthadoop to the mvn-jar target

 add pig-withouthadoop-h2 to mvn-jar
 ---

 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3311.patch


 mvn-jar currently creates pig-version.jar and pig-version-h2.jar
 I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
 that are needed to run pig from the command line.
 This will allow a dual-version package.



[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3311:
---

Patch Info: Patch Available

 add pig-withouthadoop-h2 to mvn-jar
 ---

 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3311.patch


 mvn-jar currently creates pig-version.jar and pig-version-h2.jar
 I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
 that are needed to run pig from the command line.
 This will allow a dual-version package.



[jira] [Resolved] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3303.


   Resolution: Fixed
Fix Version/s: 0.12

Merged in trunk

 add hadoop h2 artifact to publications in ivy.xml
 -

 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.12

 Attachments: PIG-3303.patch






[jira] [Created] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3307:
--

 Summary: Refactor physical operators to remove methods parameters 
that are always null
 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch

The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just use to pick the correct method.
I have started a refactoring for a more readable getNext*T*().




[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_0.patch

PIG-3307_0.patch contains the initial refactoring

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just use to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Description: 
The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just used to pick the correct method.
I have started a refactoring for a more readable getNext*T*().


  was:
The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just use to pick the correct method.
I have started a refactoring for a more readable getNext*T*().



 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_1.patch

PIG-3307_1.patch introduces some more refactoring

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647174#comment-13647174
 ] 

Julien Le Dem commented on PIG-3307:


It looks like we can get rid of the parameter that is only used for method 
dispatch.
I will replace all calls to getNext(Tuple t) with getNextTuple() in 
PhysicalOperator.
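
To illustrate the change, here is a minimal sketch in plain Java. The classes OldStyleOperator and NewStyleOperator are hypothetical stand-ins, not Pig's PhysicalOperator, and the return types are simplified for the example. It only shows how a dummy argument used for overload selection can give way to a method name that carries the type:

```java
// Hypothetical, simplified stand-ins; not Pig's actual PhysicalOperator API.
import java.util.Arrays;
import java.util.List;

final class GetNextSketch {

    /** Before: the argument value is ignored and only selects the overload. */
    static class OldStyleOperator {
        String getNext(String ignored) { return "chararray"; }
        Integer getNext(Integer ignored) { return 42; }
        List<Object> getNext(List<Object> ignored) { return Arrays.<Object>asList("a", 1); }
    }

    /** After: the type is part of the method name, so no dummy argument is needed. */
    static class NewStyleOperator {
        String getNextString() { return "chararray"; }
        Integer getNextInteger() { return 42; }
        List<Object> getNextTuple() { return Arrays.<Object>asList("a", 1); }
    }

    public static void main(String[] args) {
        OldStyleOperator oldOp = new OldStyleOperator();
        // The cast on null exists only to tell the compiler which overload to call.
        Integer before = oldOp.getNext((Integer) null);

        NewStyleOperator newOp = new NewStyleOperator();
        Integer after = newOp.getNextInteger();

        System.out.println(before.equals(after));
    }
}
```

The call sites become shorter and no longer pass a value whose only purpose is to steer overload resolution.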

 Refactor physical operators to remove methods parameters that are always null
 -

 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch, PIG-3307_1.patch


 The physical operators are sometimes overly complex. I'm trying to cleanup 
 some unnecessary code.
 in particular there is an array of getNext(*T* v) where the value v does not 
 seem to have any importance and is just used to pick the correct method.
 I have started a refactoring for a more readable getNext*T*().



[jira] [Created] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3303:
--

 Summary: add hadoop h2 artifact to publications in ivy.xml
 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem






[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Attachment: PIG-3303.patch

 add hadoop h2 artifact to publications in ivy.xml
 -

 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
 Attachments: PIG-3303.patch






[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Patch Info: Patch Available

 add hadoop h2 artifact to publications in ivy.xml
 -

 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
 Attachments: PIG-3303.patch






[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-07 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596273#comment-13596273
 ] 

Julien Le Dem commented on PIG-3214:


Who let the trolls out?

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, 
 newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, 
 pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, 
 pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, 
 pig_logo_new.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Issue Comment Deleted] (PIG-3214) New/improved mascot

2013-03-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3214:
---

Comment: was deleted

(was: Who let the trolls out?)

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, 
 newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, 
 pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, 
 pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, 
 pig_logo_new.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Updated] (PIG-3214) New/improved mascot

2013-03-05 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3214:
---

Attachment: pig_6.JPG

I like the idea of number 4 too.
Here is my little contribution (#6) to it, just to illustrate the idea.
(I do agree with Alan that we should only change if we have something much 
better)

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
 newlogo5.png, pig_6.JPG


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593747#comment-13593747
 ] 

Julien Le Dem commented on PIG-3194:


Same as Jon: we can just use the methods present in Commons Codec 1.3, and we 
don't need to be URL-safe.
Let's not repackage commons.codec or duplicate part of it just for this. 
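
A rough sketch of that approach: keep to byte[]-in, byte[]-out Base64 calls and do the String conversion in the caller's own code. To stay runnable without extra jars, the sketch uses java.util.Base64 as a stand-in; with Commons Codec 1.3 the corresponding calls would be the static Base64.encodeBase64(byte[]) and Base64.decodeBase64(byte[]). The class and method names below are made up for the example and are not taken from ObjectSerializer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper. java.util.Base64 stands in for the byte[]-based
// Commons Codec 1.3 methods so that the sketch compiles on its own.
final class Base64CompatSketch {

    /** Encodes with a byte[] -> byte[] call and builds the String by hand. */
    static String encode(byte[] data) {
        byte[] encoded = Base64.getEncoder().encode(data);
        // Base64 output is plain ASCII, so this conversion is lossless.
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    /** Converts the String to bytes first, then uses a byte[] -> byte[] decode. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String encoded = encode("pig".getBytes(StandardCharsets.UTF_8));
        String roundTrip = new String(decode(encoded), StandardCharsets.UTF_8);
        System.out.println(encoded + " -> " + roundTrip);
    }
}
```

Since only the standard alphabet is used here, the output is not URL-safe, which matches the point that URL safety is not needed for this use.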

 Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
 ---

 Key: PIG-3194
 URL: https://issues.apache.org/jira/browse/PIG-3194
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Kai Londenberg

 The changes to ObjectSerializer.java in the following commit
 http://svn.apache.org/viewvc?view=revision&revision=1403934 break 
 compatibility with Hadoop 0.20.2 Clusters.
 The reason is, that the code uses methods from Apache Commons Codec 1.4 - 
 which are not available in Apache Commons Codec 1.3 which is shipping with 
 Hadoop 0.20.2.
 The offending methods are Base64.decodeBase64(String) and 
 Base64.encodeBase64URLSafeString(byte[])
 If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 
 0.20.2 Clusters.



[jira] [Commented] (PIG-3235) Enable DEBUG log messages in unit tests by default

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593990#comment-13593990
 ] 

Julien Le Dem commented on PIG-3235:


What would be useful is to have a log4j.properties for tests in a known 
location that is automatically picked up by the tests and can be easily 
modified on a case-by-case basis.

 Enable DEBUG log messages in unit tests by default
 --

 Key: PIG-3235
 URL: https://issues.apache.org/jira/browse/PIG-3235
 Project: Pig
  Issue Type: Improvement
  Components: tools
Reporter: Cheolsoo Park
Priority: Minor

 Currently, debug level messages are not logged for unit tests. It is helpful 
 to enable them to debug unit tests.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594152#comment-13594152
 ] 

Julien Le Dem commented on PIG-3214:


and PI can be the front legs
;)

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
 newlogo5.png, pig_6.JPG


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564901#comment-13564901
 ] 

Julien Le Dem commented on PIG-3140:


+1

 Document PigProgressNotificationListener configs
 

 Key: PIG-3140
 URL: https://issues.apache.org/jira/browse/PIG-3140
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.11

 Attachments: PIG-3140_1.patch


 Add docs to describe what PPNL is and how to configure it.



[jira] [Commented] (PIG-3139) Document reducer estimation

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564902#comment-13564902
 ] 

Julien Le Dem commented on PIG-3139:


+1

 Document reducer estimation
 ---

 Key: PIG-3139
 URL: https://issues.apache.org/jira/browse/PIG-3139
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.11

 Attachments: PIG-3139_1.patch


 Add docs to describe how default reducer estimation algo works and how to 
 override it.



[jira] [Updated] (PIG-3105) Fix TestJobSubmission unit test failure.

2013-01-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3105:
---

Fix Version/s: (was: 0.11)
   0.12

I'm moving this to the next release as it does not seem to be a blocker for Pig 
0.11.


 Fix TestJobSubmission unit test failure.
 

 Key: PIG-3105
 URL: https://issues.apache.org/jira/browse/PIG-3105
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.12

 Attachments: PIG-3105.patch


 Currently with Hadoop 1.0, the TestJobSubmission unit test fails. This is due 
 to HBASE-7423. This is a work around to that issue.



[jira] [Commented] (PIG-2846) Can we skip hcat related e2e when hcat is not installed?

2013-01-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560184#comment-13560184
 ] 

Julien Le Dem commented on PIG-2846:


Hey [~cheolsoo], should we detach this from Pig 0.11?

 Can we skip hcat related e2e when hcat is not installed?
 

 Key: PIG-2846
 URL: https://issues.apache.org/jira/browse/PIG-2846
 Project: Pig
  Issue Type: Sub-task
Reporter: Koji Noguchi
Priority: Trivial
 Attachments: pig-2846-trunk-v1.txt


 Trying pig e2e for the first time, I see couple of the tests 
 (HCatDDL_1,HCatDDL_2 and Jython_Command_1) failing with 
 bq. java.io.IOException: Cannot run program /usr/local/hcat/bin/hcat:
 bq. java.io.IOException: error=2, No such file or directory
 Is it ok to change the test_harness to skip these tests when hcat does not 
 exist?



[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing

2013-01-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3005:
---

Issue Type: Bug  (was: Sub-task)
Parent: (was: PIG-2972)

 TestLargeFile#testOrderBy is failing
 

 Key: PIG-3005
 URL: https://issues.apache.org/jira/browse/PIG-3005
 Project: Pig
  Issue Type: Bug
 Environment: Mac OSX 10.6.8
Reporter: Jonathan Coveney
 Fix For: 0.12


 When run locally, at least, this test is failing for me.
 Has anyone else noticed this failing?



[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-01-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556686#comment-13556686
 ] 

Julien Le Dem commented on PIG-3082:


Thanks for fixing this, Jon!
I find the error message a little confusing:
{noformat}
throw new FrontendException("Given UDF returns an improper Schema. Should only return Tuple, Bag, or a single item. Returns: " + udfSchema);
{noformat}
It should contain something along the lines of "... outputSchema should return 
a Schema containing a single Field".
Thanks

 outputSchema of a UDF allows two usages when describing a Tuple schema
 --

 Key: PIG-3082
 URL: https://issues.apache.org/jira/browse/PIG-3082
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3082-0.patch


 When defining an evalfunc that returns a Tuple there are two ways you can 
 implement outputSchema().
 - The right way: return a schema that contains one Field that contains the 
 type and schema of the return type of the UDF
 - The unreliable way: return a schema that contains more than one field and 
 it will be understood as a tuple schema even though there is no type (which 
 is in Field class) to specify that. This is particularly deceitful when the 
 output schema is derived from the input schema and the outputted Tuple 
 sometimes contain only one field. In such cases Pig understands the output 
 schema as a tuple only if there is more than one field. And sometimes it 
 works, sometimes it does not.
 We should at least issue a warning (backward compatibility) if not plain 
 throw an exception when the output schema contains more than one Field.



[jira] [Commented] (PIG-3098) Add another test for the self join case

2013-01-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556692#comment-13556692
 ] 

Julien Le Dem commented on PIG-3098:


One minor comment regarding asserts:
{noformat}
  assertEquals(tuples.size(), out.size());
  for (Tuple t : out) {
    assertTrue(tuples.remove(t));
  }
  assertTrue(tuples.isEmpty());
{noformat}
If this fails, it is not going to give much information.
Please add a message as the first parameter with some info:
{noformat}
  assertEquals("tuple count for " + out, tuples.size(), out.size());
  for (Tuple t : out) {
    assertTrue("existence of " + t, tuples.remove(t));
  }
  assertTrue("all tuples consumed in " + tuples, tuples.isEmpty());
{noformat}
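
The same idea can be tried outside of JUnit with a small, self-contained sketch. Everything here is hypothetical: RowAssertSketch and assertSameRows are illustrative names, and rows are plain strings standing in for Pig tuples. The point is only that a failure message naming the offending row makes a broken test much easier to diagnose:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; rows are plain strings standing in for Pig tuples.
final class RowAssertSketch {

    /** Fails with a descriptive message unless both lists hold the same rows, in any order. */
    static void assertSameRows(List<String> expected, List<String> actual) {
        // Work on a copy so the caller's expected list is left untouched.
        List<String> remaining = new ArrayList<>(expected);
        if (remaining.size() != actual.size()) {
            throw new AssertionError("tuple count for " + actual
                    + ": expected " + remaining.size() + " but was " + actual.size());
        }
        for (String row : actual) {
            // remove(Object) returns false when the row was not expected (or not expected again).
            if (!remaining.remove(row)) {
                throw new AssertionError("unexpected tuple " + row
                        + ", still expecting " + remaining);
            }
        }
    }
}
```

With equal sizes and every actual row matched against one expected row, nothing can be left over, so the sketch needs no separate emptiness check.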



 Add another test for the self join case
 ---

 Key: PIG-3098
 URL: https://issues.apache.org/jira/browse/PIG-3098
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3098-0.patch


 This adds a test to TestJoin that doesn't just make sure that self joins work 
 semantically in the parser, but also that it pulls the right data through. 
 Thought it'd be easier to just make a new JIRA than to reopen PIG-3020.



[jira] [Created] (PIG-3103) make mockito a test dependency (instead of compile)

2012-12-21 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3103:
--

 Summary: make mockito a test dependency (instead of compile)
 Key: PIG-3103
 URL: https://issues.apache.org/jira/browse/PIG-3103
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem






[jira] [Updated] (PIG-3076) make TestScalarAliases more reliable

2012-12-19 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3076:
---

Attachment: PIG-3076_1.patch

PIG-3076_1.patch addresses comments

 make TestScalarAliases more reliable
 

 Key: PIG-3076
 URL: https://issues.apache.org/jira/browse/PIG-3076
 Project: Pig
  Issue Type: Test
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.11, 0.12

 Attachments: PIG-3076_1.patch, PIG-3076.patch


 Currently, this test writes to the root directory, so its output is not 
 deleted by ant clean.
 It also deletes its output at the end instead of the beginning.
 The consequence is that if the test fails once, it will keep failing until 
 the directory is manually cleaned up (not good for CI).
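
A minimal sketch of the pattern described above, assuming the output is moved somewhere under build/ (a guess at a location that ant clean removes; the actual patch may choose a different path). TestOutputDir and prepare are hypothetical names. The directory is wiped at the start of the run, so leftovers from a failed run cannot break the next one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper; the class name and directory layout are assumptions, not taken from the patch. */
final class TestOutputDir {
    private TestOutputDir() {}

    /** Deletes anything left by a previous run, then recreates the empty directory. */
    static Path prepare(Path dir) {
        try {
            if (Files.exists(dir)) {
                try (Stream<Path> walk = Files.walk(dir)) {
                    // Reverse order visits children before their parent directories.
                    walk.sorted(Comparator.reverseOrder()).forEach(TestOutputDir::delete);
                }
            }
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void delete(Path p) {
        try {
            Files.delete(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location under build/, which "ant clean" is expected to remove.
        Path out = prepare(Paths.get("build", "test", "TestScalarAliases"));
        System.out.println("clean output directory: " + out.toAbsolutePath());
    }
}
```

In a JUnit test, prepare would be called from the setup method rather than from the teardown.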



[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535220#comment-13535220
 ] 

Julien Le Dem commented on PIG-2959:


hey Daniel, are you going to commit this?

 Add a pig.cmd for Pig to run under Windows
 --

 Key: PIG-2959
 URL: https://issues.apache.org/jira/browse/PIG-2959
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: pig.cmd






[jira] [Commented] (PIG-2957) TetsScriptUDF fail due to volume prefix in jar

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535249#comment-13535249
 ] 

Julien Le Dem commented on PIG-2957:


Could you call the method something more explicit than cleanupPath? Something 
like getPathForJar maybe?
Also add comments to explain what exactly this is doing:
{noformat}
if (path.charAt(1) == ':') {
    newPath = path.charAt(0) + path.substring(2);
}
{noformat}
It would be useful to describe what it is changing in the path and why.
In particular, the drive letter becomes a root dir in the jar (C:/foo becomes 
C/foo). If that's what we want then it should be clearer.
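
To make the request concrete, here is one way the renamed and commented method could look. This is only a sketch: the class JarPaths is a hypothetical holder, getPathForJar is just the name suggested above, and the length guard is an addition that is not in the patch.

```java
/** Hypothetical holder class; only the transformation itself comes from the reviewed snippet. */
final class JarPaths {
    private JarPaths() {}

    /**
     * Turns a local path into a name usable inside a jar.
     * A Windows drive prefix such as "C:" is not valid there, so the colon is
     * dropped and the drive letter becomes a top-level directory:
     * "C:/foo" becomes "C/foo". Paths without a drive prefix are returned unchanged.
     */
    static String getPathForJar(String path) {
        // Same check as the reviewed snippet, plus a length guard for very short inputs.
        if (path.length() > 1 && path.charAt(1) == ':') {
            return path.charAt(0) + path.substring(2);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(getPathForJar("C:/foo"));
        System.out.println(getPathForJar("/usr/lib/udf.py"));
    }
}
```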

 TetsScriptUDF fail due to volume prefix in jar
 --

 Key: PIG-2957
 URL: https://issues.apache.org/jira/browse/PIG-2957
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: PIG-2957-1.patch, PIG-2957-2_0.10.patch, PIG-2957-2.patch


 testPythonAbsolutePath fail. Stack is:
 java.io.IOException: Mkdirs failed to create 
 C:\tmp\hadoop-Administrator\mapred\local\1_0\taskTracker\Administrator\jobcache\job_20120725074728013_0011\jars\C:\Users\Administrator\pig-monarch
 at org.apache.hadoop.util.RunJar.unJar(RunJar.java:47)
 at 
 org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:277)
 at 
 org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:377)
 at 
 org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:367)
 at 
 org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:214)
 at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1237)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1107)
 at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1212)
 at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1127)
 at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2417)
 at java.lang.Thread.run(Thread.java:662)
 The reason is we pack the volume prefix into the job.jar.
 jar tvf C:\Users\ADMINI~1\AppData\Local\Temp\Job6350
 669482684441868.jar|grep testPythonAbsolutePath
 98 Wed Jul 25 11:12:58 PDT 2012 C:\Users\Administrator\pig-monarch\testPytho
 nAbsolutePath.py



[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535335#comment-13535335
 ] 

Julien Le Dem commented on PIG-2956:


Daniel? any update on this?

 Invalid cache specification for some streaming statement
 

 Key: PIG-2956
 URL: https://issues.apache.org/jira/browse/PIG-2956
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch


 Another category of failure in e2e tests, such as ComputeSpec_1, 
 ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
 RaceConditions_4, RaceConditions_7, RaceConditions_8.
 Here is stack:
 ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
  ERROR 2017: Internal error creating job configuration.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
 at org.apache.pig.PigServer.execute(PigServer.java:1293)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
 at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
 at org.apache.pig.Main.run(Main.java:561)
 at org.apache.pig.Main.main(Main.java:111)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: 
 Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-2955) Fix bunch of Pig e2e tests on Windows

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535339#comment-13535339
 ] 

Julien Le Dem commented on PIG-2955:


Daniel, do you want to check that in?

  Fix bunch of Pig e2e tests on Windows 
 ---

 Key: PIG-2955
 URL: https://issues.apache.org/jira/browse/PIG-2955
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11, 0.10.1, 0.12

 Attachments: PIG-2955-1.patch, PIG-2955-2_0.10.patch, PIG-2955-2.patch


 Fix the following test aborts and failures:
 ComputeSpec_1
 ComputeSpec_2
 Unicode_cmdline_1
 Warning_1
 Warning_4
 Checkin_2
 UdfDistributedCache_1
 Jython_Checkin_2
 Jython_Diagnostics_4
 Jython_Diagnostics_5
 Jython_Diagnostics_6
 Jython_Error_3
 Jython_Error_4
 Jython_Error_5
 Jython_Error_6
 Jython_Error_7
 Grunt_6
 Grunt_8
 Grunt_13
 Grunt_14



[jira] [Commented] (PIG-2954) TestParamSubPreproc still depends on bash to run

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535340#comment-13535340
 ] 

Julien Le Dem commented on PIG-2954:


is this still on target for pig-0.11?

  TestParamSubPreproc still depends on bash to run 
 

 Key: PIG-2954
 URL: https://issues.apache.org/jira/browse/PIG-2954
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: PIG-2954-1.patch, PIG-2954-2.patch


 If bash does not exist in the path, there are 3 test failures.



[jira] [Updated] (PIG-2927) SHIP and use JRuby gems in JRuby UDFs

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2927:
---

Fix Version/s: (was: 0.11)
   0.12

This will go in the next release as we are stabilizing the 0.11 branch

 SHIP and use JRuby gems in JRuby UDFs
 -

 Key: PIG-2927
 URL: https://issues.apache.org/jira/browse/PIG-2927
 Project: Pig
  Issue Type: New Feature
  Components: parser
Affects Versions: 0.11
 Environment: JRuby UDFs
Reporter: Russell Jurney
Assignee: Jonathan Coveney
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, 
 PIG-2927-3.patch, PIG-2927-4.patch


 It would be great to use JRuby gems in JRuby UDFs without installing them on 
 all machines on the cluster. Some way to SHIP them automatically with the job 
 would be great.



[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2614:
---

Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

moving this to next release so that we can converge on pig 0.11

 AvroStorage crashes on LOADING a single bad error
 -

 Key: PIG-2614
 URL: https://issues.apache.org/jira/browse/PIG-2614
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.10.0, 0.11
Reporter: Russell Jurney
Assignee: Jonathan Coveney
  Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
 pig, sadism
 Fix For: 0.12

 Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, 
 test_avro_files.tar.gz


 AvroStorage dies when a single bad record exists, such as one with missing 
 fields.  This is very bad on 'big data,' where bad records are inevitable.  
 See discussion at 
 http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
  for more theory.



[jira] [Resolved] (PIG-2583) Add Grunt command to list the statements in cache

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2583.


Resolution: Fixed

 Add Grunt command to list the statements in cache
 -

 Key: PIG-2583
 URL: https://issues.apache.org/jira/browse/PIG-2583
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Daniel Dai
Assignee: Allan Avendaño
Priority: Minor
  Labels: newbie
 Fix For: 0.11

 Attachments: gruntHistory1.patch, gruntHistory2.patch, 
 gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch


 It is convenient to list statements in cache:
 grunt> a = load '1.txt';
 grunt> b = foreach a generate $0, $1;
 grunt> list
 a = load '1.txt';
 b = foreach a generate $0, $1;



[jira] [Commented] (PIG-2583) Add Grunt command to list the statements in cache

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535355#comment-13535355
 ] 

Julien Le Dem commented on PIG-2583:


[~xalan] I'm closing this ticket as it has been committed.
Please open a new ticket to further improve your contribution.
Thanks again


 Add Grunt command to list the statements in cache
 -

 Key: PIG-2583
 URL: https://issues.apache.org/jira/browse/PIG-2583
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0
Reporter: Daniel Dai
Assignee: Allan Avendaño
Priority: Minor
  Labels: newbie
 Fix For: 0.11

 Attachments: gruntHistory1.patch, gruntHistory2.patch, 
 gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch


 It is convenient to list statements in cache:
 grunt> a = load '1.txt';
 grunt> b = foreach a generate $0, $1;
 grunt> list
 a = load '1.txt';
 b = foreach a generate $0, $1;



[jira] [Resolved] (PIG-2493) UNION causes casting issues

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2493.


Resolution: Fixed

I'm closing this issue as it has been committed and we are stabilizing a 
release.
[~arov] please open a new JIRA if you still see problems

 UNION causes casting issues
 ---

 Key: PIG-2493
 URL: https://issues.apache.org/jira/browse/PIG-2493
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1, 0.10.0
Reporter: Anitha Raju
Assignee: Vivek Padmanabhan
 Fix For: 0.9.3, 0.11, 0.10.1

 Attachments: PIG-2493_2.patch, PIG-2493-3.patch, PIG-2493.patch


 Hi,
 For the below script,
 {code}
 A = load '/user/anithar/ip' as (a);
 B = load '/user/anithar/ip1' as (a);
 C = union  A , B ;
 D = foreach C generate (chararray)a;
 dump D;
 {code}
 it gives casting error at runtime
 {code}
 org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
 bytearray from the UDF. Cannot determine how to convert the bytearray to 
 string.
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:660)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 It looks like in POCast.java funcSpec is never assigned (it stays null when 
 there is a UNION involved), causing caster to be null and thus the 
 exception.
 The same works in 0.8 without any issue.
 Regards,
 Anitha



[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534401#comment-13534401
 ] 

Julien Le Dem commented on PIG-3020:


looks good to me
+1

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
 PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch


 The following validates OK with pig 0.9 and fails with the following error in 
 0.11 (and I suspect 0.10)
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
 uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
 IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
 group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
 group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
 src_id;
 store per_user_set_sizes into  'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
 explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at 
 org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at 
 org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
 Error processing rule LoadTypeCastInserter
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
 Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at 
 org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}



[jira] [Assigned] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned PIG-3020:
--

Assignee: Julien Le Dem

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020.patch





[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Patch Info: Patch Available

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020.patch





[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Description: 
The following validates OK with pig 0.9 and fails with the following error in 
0.11 (and I suspect 0.10)

pig -c debug2.pig

Script: debug2.pig
{noformat}
A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
uids_with_flock:bag{});
edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
IsEmpty(uids_with_flock);
edges_both = FOREACH edges_both GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
both_counts = GROUP edges_both BY src_id;
both_counts = FOREACH both_counts GENERATE
group AS src_id, SIZE(edges_both) AS size_both;

edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
edges_bq = FOREACH edges_bq GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
bq_counts = GROUP edges_bq BY src_id;
bq_counts = FOREACH bq_counts GENERATE
group AS src_id, SIZE(edges_bq) AS size_bq;

per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
store per_user_set_sizes into  'foo';
{noformat}

Error:
{noformat}
ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
explain alias null
at org.apache.pig.PigServer.explain(PigServer.java:999)
at 
org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
at 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
at org.apache.pig.Main.run(Main.java:600)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
Error processing rule LoadTypeCastInserter
at 
org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
at org.apache.pig.PigServer.explain(PigServer.java:984)
... 10 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
at 
org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
at 
org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
at 
org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
at 
org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at 
org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
at 
org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
... 13 more
{noformat}

  was:
The following vali=dates OK with pig 0.9 and fails with the following error in 
0.11 (and I suspect 0.10)

pig -c debug2.pig

Script: debug2.pig
{noformat}
A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
uids_with_flock:bag{});
edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
IsEmpty(uids_with_flock);
edges_both = FOREACH edges_both GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
both_counts = GROUP edges_both BY src_id;
both_counts = FOREACH both_counts GENERATE
group AS src_id, SIZE(edges_both) AS size_both;

edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
edges_bq = FOREACH edges_bq GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
bq_counts = GROUP edges_bq BY src_id;
bq_counts = FOREACH bq_counts GENERATE
group AS src_id, SIZE(edges_bq) AS size_bq;

per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
store per_user_set_sizes into  'foo';
{noformat}

Error:
{noformat}
ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
explain alias null
at 

[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Attachment: PIG-3020_branch-0.11_1.patch

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3020_branch-0.11_1.patch, PIG-3020.patch


 The following validates OK with pig 0.9 and fails with the following error in 
 0.11 (and I suspect 0.10)
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
 uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
 IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
 group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
 group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
 src_id;
 store per_user_set_sizes into  'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
 explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at 
 org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at 
 org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
 Error processing rule LoadTypeCastInserter
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
 Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at 
 org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}



[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-12 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255
 ] 

Julien Le Dem commented on PIG-3020:


[~dvryaboy] I just noticed it was logging a warning with a NullPointerException 
when running tests from eclipse. I just fixed the log line to something 
clearer. It is not related but I feel it is small enough to be done here.
[~jcoveney] I also added a unit test with a pig script that was failing before 
and works now to validate my change.

 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
 Attachments: PIG-3020.patch


 The following validates OK with pig 0.9 and fails with the following error 
 in 0.11 (and I suspect 0.10)
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
 uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
 IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
 group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
 group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
 src_id;
 store per_user_set_sizes into  'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
 explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at 
 org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at 
 org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
 Error processing rule LoadTypeCastInserter
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
 Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at 
 org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}



[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Attachment: PIG-3020.patch

PIG-3020.patch fixes the issue


 Duplicate uid in schema error when joining two relations derived from the 
 same load statement
 ---

 Key: PIG-3020
 URL: https://issues.apache.org/jira/browse/PIG-3020
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Julien Le Dem
 Attachments: PIG-3020.patch


 The following validates OK with pig 0.9 and fails with the following error 
 in 0.11 (and I suspect 0.10)
 pig -c debug2.pig
 Script: debug2.pig
 {noformat}
 A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
 uids_with_flock:bag{});
 edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
 IsEmpty(uids_with_flock);
 edges_both = FOREACH edges_both GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 both_counts = GROUP edges_both BY src_id;
 both_counts = FOREACH both_counts GENERATE
 group AS src_id, SIZE(edges_both) AS size_both;
 edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
 edges_bq = FOREACH edges_bq GENERATE
 group.uid AS src_id,
 group.dst_id AS dst_id;
 bq_counts = GROUP edges_bq BY src_id;
 bq_counts = FOREACH bq_counts GENERATE
 group AS src_id, SIZE(edges_bq) AS size_bq;
 per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
 src_id;
 store per_user_set_sizes into  'foo';
 {noformat}
 Error:
 {noformat}
 ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
 explain alias null
   at org.apache.pig.PigServer.explain(PigServer.java:999)
   at 
 org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
   at 
 org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
   at org.apache.pig.Main.run(Main.java:600)
   at org.apache.pig.Main.main(Main.java:154)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
 Error processing rule LoadTypeCastInserter
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
   at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
   at org.apache.pig.PigServer.explain(PigServer.java:984)
   ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
 Logical plan invalid state: duplicate uid in schema : 
 bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
   at 
 org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
   at 
 org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
   ... 13 more
 {noformat}



[jira] [Created] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3084:
--

 Summary: Improve exceptions messages in POPackage
 Key: PIG-3084
 URL: https://issues.apache.org/jira/browse/PIG-3084
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem






[jira] [Resolved] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3084.


   Resolution: Fixed
Fix Version/s: 0.12

 Improve exceptions messages in POPackage
 

 Key: PIG-3084
 URL: https://issues.apache.org/jira/browse/PIG-3084
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.12

 Attachments: PIG-3084_1.patch, PIG-3084.patch






[jira] [Commented] (PIG-2599) Mavenize Pig

2012-12-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525797#comment-13525797
 ] 

Julien Le Dem commented on PIG-2599:


Hey, that sounds like a great start.
Usually a review is done by attaching a patch to the JIRA and optionally 
posting a review to https://reviews.apache.org/dashboard/.
If you need help you can ping the pig-dev mailing list or comment on the JIRA.
A shell script sounds good to me.
Is it intended to be a one-time move that is then checked in to replace the 
current layout?
Why do you need jdo to be installed in your local maven repo? Isn't maven going 
to do it?
Could you provide a short description of each folder? Not all of them are clear 
to me. 
Do you deal with hadoop 20 vs 23?
I think zebra has had issues for a while. I'm not sure what the status of this 
is right now. Maybe Olga knows.
Fixing checkstyle and findbugs later sounds OK to me. It should be relatively 
easy to do.
What about the shim layer?

Anyways, thanks for looking into this.

 Mavenize Pig
 

 Key: PIG-2599
 URL: https://issues.apache.org/jira/browse/PIG-2599
 Project: Pig
  Issue Type: New Feature
  Components: build
Reporter: Daniel Dai
  Labels: gsoc2012
 Attachments: maven-pig.1.zip


 Switch Pig build system from ant to maven.
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012



[jira] [Updated] (PIG-2599) Mavenize Pig

2012-12-06 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2599:
---

  Description: 
Switch Pig build system from ant to maven.



  was:
Switch Pig build system from ant to maven.

This is a candidate project for Google summer of code 2012. More information 
about the program can be found at 
https://cwiki.apache.org/confluence/display/PIG/GSoc2012

Fix Version/s: 0.12
   Labels:   (was: gsoc2012)

 Mavenize Pig
 

 Key: PIG-2599
 URL: https://issues.apache.org/jira/browse/PIG-2599
 Project: Pig
  Issue Type: New Feature
  Components: build
Reporter: Daniel Dai
 Fix For: 0.12

 Attachments: maven-pig.1.zip


 Switch Pig build system from ant to maven.



[jira] [Commented] (PIG-2815) class loader management in PigContext

2012-12-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525809#comment-13525809
 ] 

Julien Le Dem commented on PIG-2815:


0.11/trunk sounds good to me


 class loader management in PigContext
 -

 Key: PIG-2815
 URL: https://issues.apache.org/jira/browse/PIG-2815
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.11

 Attachments: PIG-2815-branch-0.9.patch, PIG-2815-branch-0.9.patch, 
 PIG-2815.patch, PIG-2815.patch


 The way {{PigContext.classloader}} and resolveClassName() are managed can 
 lead to strange class loading issues, especially when not all {{register}} 
 statements are at the top (example in the first comment).
 Two factors contribute to this, sometimes only one of them and sometimes 
 both together:
  # a new classloader (CL) is created after registering each jar.
 ** but the new CL's parent is the root CL rather than the previous CL, 
 effectively throwing the previous CL away.
  # resolveClassName() caches classes based on just the name
 ** A class is not defined by name alone. Classes loaded by two different 
 unrelated CLs are different objects even if both extract the class from the 
 same physical jar file.
 ** because of (1), the cached class is not necessarily the same as the class 
 that would be loaded based on the 'current' CL
 Having different class objects for the same class has many subtle side 
 effects, e.g. there would be two instances of static variables. 
 I think both should be fixed, though fixing one of them might be good 
 enough in many cases. I will add a patch.
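
 The two points above can be illustrated with a minimal Java sketch. It is 
 not the attached patch and not Pig code: the class LoaderAwareClassCache and 
 its methods chain(), resolve() and size() are hypothetical names made up for 
 this illustration. It shows (1) chaining each new class loader to the 
 previous one, and (2) keying a class cache on the loader as well as the name.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this class is not part of Pig.
public class LoaderAwareClassCache {

    // Point (1): parent the new loader on the previous one, so that jars
    // registered earlier stay visible instead of being thrown away.
    public static ClassLoader chain(ClassLoader previous, URL... jars) {
        return new URLClassLoader(jars, previous);
    }

    // Point (2): cache key made of the loader used for the lookup plus the name.
    private static final class Key {
        final ClassLoader loader;
        final String name;

        Key(ClassLoader loader, String name) {
            this.loader = loader;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            // Loaders are compared by identity, names by value.
            return loader == k.loader && name.equals(k.name);
        }

        @Override
        public int hashCode() {
            return 31 * System.identityHashCode(loader) + name.hashCode();
        }
    }

    private final Map<Key, Class<?>> cache = new HashMap<>();

    // Resolves a class through the given loader and caches it per (loader, name).
    public Class<?> resolve(ClassLoader loader, String name) {
        return cache.computeIfAbsent(new Key(loader, name), k -> {
            try {
                return Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("cannot resolve " + name, e);
            }
        });
    }

    public int size() {
        return cache.size();
    }
}
```

 With a name-only key, a lookup made through a later loader would silently 
 get the class cached for an earlier one; with the composite key the two 
 lookups stay separate entries.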




[jira] [Commented] (PIG-2204) Allow passing arguments to custom Partitioners

2012-12-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525819#comment-13525819
 ] 

Julien Le Dem commented on PIG-2204:


maybe just updating the doc to mention this?

 Allow passing arguments to custom Partitioners
 --

 Key: PIG-2204
 URL: https://issues.apache.org/jira/browse/PIG-2204
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy

 Currently, this works:
 {code}
 y = group x by $0 partition by MyPartitioner PARALLEL 2;
 {code}
 However, passing an argument to the partitioner constructor does not work, 
 and dies with a misleading error:
 {code}
 y = group x by $0 partition by MyPartitioner(0) PARALLEL 2;
 2011-08-03 22:53:23,074 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered  ( (  at line 1, column 91.
 Was expecting one of:
 parallel ...
 ; ...
 . ...
 $ ...
 {code}



[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-12-04 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2812:
---

Fix Version/s: (was: 0.11)

I'm detaching this from pig-0.11 as it is not ready yet

 Spill InternalCachedBag into only 1 file
 

 Key: PIG-2812
 URL: https://issues.apache.org/jira/browse/PIG-2812
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Haitao Yao
Assignee: Haitao Yao
 Attachments: aa.jpg, spill.patch


 I encountered a reducer's OOM because of java.io.DeleteOnExitHook. I found 
 out that the InternalCachedBag creates a separate tmp file for each spill, 
 and the tmp files are deleted on exit, so the file delete hook caused the 
 OOM. 
 Why not just hold the tmp file handle and spill into only one tmp file?
 Too many tmp files may block the tasktracker start process, if the tmp files 
 are not cleaned on time and the tasktracker restarts at this specific time.



[jira] [Created] (PIG-3076) make TestScalarAliases more reliable

2012-12-04 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3076:
--

 Summary: make TestScalarAliases more reliable
 Key: PIG-3076
 URL: https://issues.apache.org/jira/browse/PIG-3076
 Project: Pig
  Issue Type: Test
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.11, 0.12


Currently, this test writes in the root directory, so its output is not deleted 
by ant clean.
Also, it deletes its output at the end instead of the beginning.
The consequence is that if the test fails once, it will keep failing until 
the directory is manually cleaned up (not good for CI).
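
A minimal Java sketch of the "clean at the beginning" idea, for illustration 
only. TestOutputDir and its methods are hypothetical names, not the actual 
change to TestScalarAliases. A test would call resetDir() on a directory 
under build/test in its setup, so stale output from a failed run is removed 
before the next run rather than relying on cleanup at the end.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper; not part of the Pig test suite.
public class TestOutputDir {

    // Deletes the directory tree if it exists, then recreates it empty.
    public static Path resetDir(Path dir) {
        try {
            if (Files.exists(dir)) {
                try (Stream<Path> walk = Files.walk(dir)) {
                    // Reverse order so children are deleted before their parents.
                    walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.delete(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                }
            }
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small file into the directory, standing in for test output.
    public static Path writeOutput(Path dir, String name, String content) {
        try {
            return Files.write(dir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the direct children of the directory.
    public static long countEntries(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the cleanup runs first, a run that crashes halfway cannot poison the 
next one, and keeping the directory under build/test lets ant clean remove it.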



[jira] [Created] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2012-12-04 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3077:
--

 Summary: TestMultiQueryLocal should not write in /tmp
 Key: PIG-3077
 URL: https://issues.apache.org/jira/browse/PIG-3077
 Project: Pig
  Issue Type: Test
Reporter: Julien Le Dem


Temporary files from tests should be under build/test so that they are cleaned 
by ant clean.
Currently two test suites running on the same machine step on each other and 
create flaky test results.



[jira] [Reopened] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-11-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reopened PIG-3014:



I see a failing test:
org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString

java.lang.NullPointerException
at org.apache.pig.builtin.CurrentTime.exec(CurrentTime.java:41)
at 
org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString(TestBuiltin.java:450)


 CurrentTime() UDF has undesirable characteristics
 -

 Key: PIG-3014
 URL: https://issues.apache.org/jira/browse/PIG-3014
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch


 As part of the explanation of the new DateTime datatype I noticed that we had 
 added a CurrentTime() UDF. The issue with this UDF is that it returns the 
 current time _of every exec invocation_, which can lead to confusing results. 
 In PIG-1431 I proposed a way such that every instance of the same NOW() will 
 return the same time, which I think is better. Would enjoy thoughts.
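
 The difference between the two behaviours can be sketched in plain Java. 
 This is an illustration only: FixedNow is a hypothetical class, it does not 
 use the Pig UDF API, and it is not the patch attached to this issue. It only 
 covers a single JVM; making every task of a job agree on one value would 
 additionally require shipping the captured time from the front end to the 
 tasks, which is not shown.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; not the CurrentTime UDF and not Pig API.
public class FixedNow {
    private final LongSupplier clock;
    private Long captured; // null until the first call to fixed()

    public FixedNow(LongSupplier clock) {
        this.clock = clock;
    }

    // Behaviour described in the issue: the clock is read on every invocation,
    // so two rows processed at different moments see different values.
    public long perCall() {
        return clock.getAsLong();
    }

    // Alternative: read the clock once and return that value from then on.
    public synchronized long fixed() {
        if (captured == null) {
            captured = clock.getAsLong();
        }
        return captured;
    }
}
```

 In production the clock would be System::currentTimeMillis; taking it as a 
 parameter makes the behaviour easy to check with a fake clock.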


