[jira] [Commented] (PIG-4059) Pig on Spark

2017-05-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029733#comment-16029733
 ] 

Julien Le Dem commented on PIG-4059:


Congrats all! 
Thanks [~rohini]!

> Pig on Spark
> 
>
> Key: PIG-4059
> URL: https://issues.apache.org/jira/browse/PIG-4059
> Project: Pig
>  Issue Type: New Feature
>  Components: spark
>Reporter: Rohini Palaniswamy
>Assignee: Praveen Rachabattuni
>  Labels: spork
> Fix For: spark-branch, 0.17.0
>
> Attachments: Pig-on-Spark-Design-Doc.pdf, Pig-on-Spark-Scope.pdf
>
>
> Setting up your development environment:
> 0. Download a Spark release package (Pig on Spark currently supports only Spark 
> 1.6).
> 1. Check out Pig Spark branch.
> 2. Build Pig by running "ant jar" and "ant -Dhadoopversion=23 jar" for 
> hadoop-2.x versions
> 3. Configure these environmental variables:
> export HADOOP_USER_CLASSPATH_FIRST="true"
> "local" and "yarn-client" modes are currently supported; export the environment 
> variable SPARK_MASTER accordingly, for example:
> export SPARK_MASTER=local or export SPARK_MASTER="yarn-client"
> 4. In local mode: ./pig -x spark_local xxx.pig
> In yarn-client mode: 
> export SPARK_HOME=xx; 
> export SPARK_JAR=hdfs://example.com:8020/ (the hdfs location where 
> you upload the spark-assembly*.jar)
> ./pig -x spark xxx.pig



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests

2017-02-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned PIG-2845:
--

Assignee: (was: Julien Le Dem)

> Configure hadoop.tmp.dir under build/tmp for MiniCluster tests
> --
>
> Key: PIG-2845
> URL: https://issues.apache.org/jira/browse/PIG-2845
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-2845_0.patch
>
>






[jira] [Assigned] (PIG-2914) Logs from MiniCluster are too verbose in tests

2017-02-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned PIG-2914:
--

Assignee: (was: Julien Le Dem)

> Logs from MiniCluster are too verbose in tests
> --
>
> Key: PIG-2914
> URL: https://issues.apache.org/jira/browse/PIG-2914
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
> Attachments: PIG-2914.patch
>
>






[jira] [Commented] (PIG-3764) Compile physical operators to bytecode

2016-07-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365633#comment-15365633
 ] 

Julien Le Dem commented on PIG-3764:


[~rohini] This sounds good. It does not have to be totally inlined, since the 
JIT will inline method calls; you do want to avoid virtual calls though. My 
prototype is still out there [1]. One thing it did not take into account is 
nulls, but I think this can be branched out separately (evaluate ignoring the 
nulls, and then evaluate the is-null).
Generating asm directly can be unwieldy. That's why I made Brennus [2], to 
factor out a lot of the logic (different operations per type, different stack 
frame size per type, all sorts of special cases); see the prototype [1].

1: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
2: https://github.com/julienledem/brennus
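
For illustration only, here is a minimal sketch of the null handling suggested in the comment above: the value is evaluated on primitives as if nulls did not exist, and nullness is evaluated separately. The class and method names are hypothetical and are not taken from the prototype or from Brennus; static methods are used so that no virtual dispatch is involved.

```java
/** Hypothetical sketch: evaluate "a + b" with the null check split out. */
final class NullSeparatedEvalSketch {

    /** Value pass: computes the sum on primitives, ignoring nullness entirely. */
    static int addValue(int a, int b) {
        return a + b;
    }

    /** Null pass: the sum is null as soon as either operand is null. */
    static boolean addIsNull(boolean aIsNull, boolean bIsNull) {
        return aIsNull || bIsNull;
    }

    /** Combines both passes and boxes the result only at the boundary. */
    static Integer add(int a, boolean aIsNull, int b, boolean bIsNull) {
        return addIsNull(aIsNull, bIsNull) ? null : Integer.valueOf(addValue(a, b));
    }

    public static void main(String[] args) {
        System.out.println(add(1, false, 2, false));
        System.out.println(add(1, true, 2, false));
    }
}
```

Generated bytecode would presumably inline both passes; the sketch only shows how the two concerns can be kept apart.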

> Compile physical operators to bytecode
> --
>
> Key: PIG-3764
> URL: https://issues.apache.org/jira/browse/PIG-3764
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Julien Le Dem
>  Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling 
> expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014





[jira] [Created] (PIG-4219) When parsing a schema, pig drops tuple inside of Bag if it contains only one field

2014-10-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-4219:
--

 Summary: When parsing a schema, pig drops tuple inside of Bag if 
it contains only one field
 Key: PIG-4219
 URL: https://issues.apache.org/jira/browse/PIG-4219
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem


Example
{code:java}
//We generate a schema object and call toString()
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
// Reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);
// But no longer matches the original structure
schema.toString();
// => {my_list: {array_element: (num1: int,num2: int)}}
{code}






[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081948#comment-14081948
 ] 

Julien Le Dem commented on PIG-3760:


FYI, in Parquet the filter is not a hint: it is applied to the records as well, 
after the metadata.
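
As a rough illustration of what "not a hint" means here, the following sketch applies an equality filter in two stages: first against per-block min/max metadata, then against each record in the surviving blocks. The `Block` type and the method names are hypothetical stand-ins, not the Parquet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a filter applied to metadata first, then to records. */
final class TwoStageFilterSketch {

    /** Stand-in for a block of rows carrying min/max statistics. */
    static final class Block {
        final int min;
        final int max;
        final int[] values;

        Block(int... values) {
            int lo = Integer.MAX_VALUE;
            int hi = Integer.MIN_VALUE;
            for (int v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
            this.values = values;
        }
    }

    /** Returns only the values equal to target; the caller need not filter again. */
    static List<Integer> filterEquals(List<Block> blocks, int target) {
        List<Integer> kept = new ArrayList<>();
        for (Block block : blocks) {
            // Stage 1: skip whole blocks whose statistics exclude the target.
            if (target < block.min || target > block.max) {
                continue;
            }
            // Stage 2: apply the same predicate to every record in the block.
            for (int v : block.values) {
                if (v == target) {
                    kept.add(v);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(new Block(1, 2, 3), new Block(5, 7, 9));
        System.out.println(filterEquals(blocks, 7));
        System.out.println(filterEquals(blocks, 6));
    }
}
```

With a hint-only filter, the second call would return the whole block 5, 7, 9 and Pig would have to filter again; here the record-level stage already removes those rows.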

> Predicate pushdown for columnar file formats
> 
>
> Key: PIG-3760
> URL: https://issues.apache.org/jira/browse/PIG-3760
> Project: Pig
>  Issue Type: New Feature
>Reporter: Andrew Musselman
> Fix For: 0.14.0
>
>
> From the conversation on dev@pig:
> "Partition pruning for ORC is not addressed in PIG-3558. We will need
> to do partition pruning for both ORC and Parquet in a new ticket.
> Currently there is no interface to deal with this kind of pushdown
> (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes
> the filter statement; for ORC/Parquet, the filter is a hint, and we need
> to apply the filter again in Pig even if it is pushed to the loader), so we will
> need to define a new interface for that. You are welcome to initiate
> the work. I know Aniket is also interested in doing that, so be sure
> to talk with him about this work.
> Thanks,
> Daniel
> On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
>  wrote:
> > I had a chat with a couple people last week about a feature request for
> > Pig:  in a "where" or "filter" clause, when loading an ORC file, to skip
> > directly to the right offset instead of scanning the whole file."





[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java
[~alexlevenson] is the main author of this API

  was:
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java


> Predicate pushdown for Parquet
> --
>
> Key: PIG-4092
> URL: https://issues.apache.org/jira/browse/PIG-4092
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> See:
> https://github.com/apache/incubator-parquet-mr/pull/4
> and:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java
> [~alexlevenson] is the main author of this API





[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java

> Predicate pushdown for Parquet
> --
>
> Key: PIG-4092
> URL: https://issues.apache.org/jira/browse/PIG-4092
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> See:
> https://github.com/apache/incubator-parquet-mr/pull/4
> and:
> https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java





[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081944#comment-14081944
 ] 

Julien Le Dem commented on PIG-3760:


[~rohini] I added to the description of PIG-4092

> Predicate pushdown for columnar file formats
> 
>
> Key: PIG-3760
> URL: https://issues.apache.org/jira/browse/PIG-3760
> Project: Pig
>  Issue Type: New Feature
>Reporter: Andrew Musselman
> Fix For: 0.14.0
>
>
> From the conversation on dev@pig:
> "Partition pruning for ORC is not addressed in PIG-3558. We will need
> to do partition pruning for both ORC and Parquet in a new ticket.
> Currently there is no interface to deal with this kind of pushdown
> (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes
> the filter statement; for ORC/Parquet, the filter is a hint, and we need
> to apply the filter again in Pig even if it is pushed to the loader), so we will
> need to define a new interface for that. You are welcome to initiate
> the work. I know Aniket is also interested in doing that, so be sure
> to talk with him about this work.
> Thanks,
> Daniel
> On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
>  wrote:
> > I had a chat with a couple people last week about a feature request for
> > Pig:  in a "where" or "filter" clause, when loading an ORC file, to skip
> > directly to the right offset instead of scanning the whole file."





[jira] [Commented] (PIG-3913) Pig should use job's jobClient wherever possible (fixes local mode counters)

2014-04-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985005#comment-13985005
 ] 

Julien Le Dem commented on PIG-3913:


This looks good to me.
Please add javadoc for deprecated methods with information about the new way of 
doing the same thing.
+1
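
To make the javadoc request concrete, here is a minimal sketch of a deprecated method whose javadoc points at its replacement. The class, the method names, and the stubbed return value are hypothetical and are not taken from the patch.

```java
/** Hypothetical launcher used only to illustrate the javadoc convention. */
final class DeprecationJavadocSketch {

    /**
     * Reads a counter through a launcher-wide client.
     *
     * @deprecated Use {@link #counterFromJob(String)} instead, which reads the
     *             counter through the job's own client.
     */
    @Deprecated
    static long counterFromStatsClient(String jobId) {
        // Delegate to the replacement so both paths stay consistent.
        return counterFromJob(jobId);
    }

    /** Replacement: resolves the counter from the job itself (stubbed here). */
    static long counterFromJob(String jobId) {
        return jobId.length(); // placeholder value for the sketch
    }

    public static void main(String[] args) {
        System.out.println(counterFromJob("job_1"));
    }
}
```

The `@deprecated` javadoc tag carries the migration advice, while the `@Deprecated` annotation makes the compiler warn at call sites.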

> Pig should use job's jobClient wherever possible (fixes local mode counters)
> 
>
> Key: PIG-3913
> URL: https://issues.apache.org/jira/browse/PIG-3913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3913-1.patch
>
>
> MapreduceLauncher initializes a statsJobClient to poll counter information of 
> jobs. This works fine in mapreduce mode but it reports incorrect information 
> in local (auto-local) mode. Pig code should try to use 
> org.apache.hadoop.mapred.jobcontrol.Job's getJobClient api to get handle to 
> jobClient wherever possible. statsJobClient (and wherever its references are 
> passed) should be deprecated.





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749
 ] 

Julien Le Dem commented on PIG-3815:


[~rohini] in the code you quoted, don't you think it is putting the port back 
in the following line?
{noformat}
URI uri = fs.makeQualified(file).toUri();
{noformat}

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the 
> distributed cache configuration. It uses ':' to separate the list of files to be 
> put on the classpath via the distributed cache. If fs.default.name has a port number in 
> it, the tokenization logic in Hadoop fails when retrieving the list of 
> cache filenames in the backend.
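
The failure mode described above can be reproduced without Hadoop. The sketch below is a generic illustration, not the Hadoop code: it joins entries with ':' and splits them again, using a hypothetical host name and paths.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: ':'-separated lists break once an entry contains a port. */
final class ColonSeparatorSketch {

    /** Joins entries with ':' the way a classpath-style property would. */
    static String join(List<String> entries) {
        return String.join(":", entries);
    }

    /** Naive tokenization on ':' with no awareness of URI syntax. */
    static List<String> split(String joined) {
        return Arrays.asList(joined.split(":"));
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("/cache/a.jar", "/cache/b.jar");
        List<String> qualified = Arrays.asList("hdfs://example.com:8020/cache/a.jar");
        System.out.println(split(join(plain)));     // two entries, as expected
        System.out.println(split(join(qualified))); // one entry comes back as three tokens
    }
}
```

A fully qualified URI contains ':' after the scheme and before the port, so a single entry no longer survives the round trip.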





[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

2014-03-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606
 ] 

Julien Le Dem commented on PIG-3815:


same comment as 1. from Cheolsoo
otherwise, this looks good to me.

> Hadoop bug causes to pig to fail silently with jar cache
> 
>
> Key: PIG-3815
> URL: https://issues.apache.org/jira/browse/PIG-3815
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which puts jars in the 
> distributed cache configuration. It uses ':' to separate the list of files to be 
> put on the classpath via the distributed cache. If fs.default.name has a port number in 
> it, the tokenization logic in Hadoop fails when retrieving the list of 
> cache filenames in the backend.





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926350#comment-13926350
 ] 

Julien Le Dem commented on PIG-3801:


I would use properties.getProperty(MAPREDUCE_FRAMEWORK_NAME).equals(LOCAL) to 
decide if it's running locally, but otherwise this looks good to me.
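
A minimal sketch of that check follows, using java.util.Properties. The constant values are assumptions (the comment only names the constants), and the operands are flipped to `LOCAL.equals(...)` so that an unset property does not raise a NullPointerException; that flip is a suggestion of this sketch, not part of the comment.

```java
import java.util.Properties;

/** Hypothetical sketch of the local-mode check suggested above. */
final class LocalModeCheckSketch {

    // Assumed values: the comment names the constants but not their contents.
    static final String MAPREDUCE_FRAMEWORK_NAME = "mapreduce.framework.name";
    static final String LOCAL = "local";

    /** Returns true only when the framework name is explicitly set to local. */
    static boolean isLocal(Properties properties) {
        // Constant-first equals tolerates a missing key (getProperty returns null).
        return LOCAL.equals(properties.getProperty(MAPREDUCE_FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        System.out.println(isLocal(properties));
        properties.setProperty(MAPREDUCE_FRAMEWORK_NAME, LOCAL);
        System.out.println(isLocal(properties));
    }
}
```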

> Auto local mode does not call storeSchema
> -
>
> Key: PIG-3801
> URL: https://issues.apache.org/jira/browse/PIG-3801
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3801.patch
>
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481
> Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We 
> also need to add this for auto-local jobs.
> To repro this problem, run-
> >  a = load '2.txt' as (a0:chararray, a1:int);
> >  store a into 'a' using PigStorage(',','-schema');
> This creates .pig_schema file in pig -x local mode, but does not create 
> .pig_schema file in auto-local mode.





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926351#comment-13926351
 ] 

Julien Le Dem commented on PIG-3801:


+1

> Auto local mode does not call storeSchema
> -
>
> Key: PIG-3801
> URL: https://issues.apache.org/jira/browse/PIG-3801
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: PIG-3801.patch
>
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481
> Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We 
> also need to add this for auto-local jobs.
> To repro this problem, run-
> >  a = load '2.txt' as (a0:chararray, a1:int);
> >  store a into 'a' using PigStorage(',','-schema');
> This creates .pig_schema file in pig -x local mode, but does not create 
> .pig_schema file in auto-local mode.





[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

2014-03-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923365#comment-13923365
 ] 

Julien Le Dem commented on PIG-3754:


LGTM too

> InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
> --
>
> Key: PIG-3754
> URL: https://issues.apache.org/jira/browse/PIG-3754
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: PIG-3754-1.patch
>
>
> If you have more than one input, 
> InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if 
> one of the loaders returns -1 and is not file based (e.g. HBase). This causes 
> incorrect reducer estimation and problems in auto.local mode.
> If the size is not found for any one of the inputs, we should bail out 
> with a return value of -1.
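
The bail-out rule described above can be sketched as follows. This is an illustration with hypothetical names, not the code from the patch: a negative size means "unknown", and one unknown input makes the whole total unknown.

```java
/** Hypothetical sketch of summing input sizes with an "unknown" sentinel. */
final class InputSizeSketch {

    /** Returns the total size, or -1 as soon as any input size is unknown. */
    static long totalInputSize(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            if (size < 0) {
                // An unknown input makes any partial sum misleading, so give up.
                return -1;
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInputSize(new long[] {10, 20}));
        System.out.println(totalInputSize(new long[] {10, -1}));
    }
}
```

Returning -1 instead of a partial sum lets the caller tell "small input" apart from "size not available".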





[jira] [Commented] (PIG-3558) ORC support for Pig

2014-02-14 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902218#comment-13902218
 ] 

Julien Le Dem commented on PIG-3558:


Is hive-exec the fat jar that assembles the runtime dependencies of hive in one 
jar?
Could we depend on the individual hive modules that we need instead?


> ORC support for Pig
> ---
>
> Key: PIG-3558
> URL: https://issues.apache.org/jira/browse/PIG-3558
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.13.0
>
> Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch
>
>
> Adding LoadFunc and StoreFunc for ORC.





[jira] [Updated] (PIG-2599) Mavenize Pig

2014-02-12 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2599:
---

Labels: gsoc2014  (was: gsoc2013)

> Mavenize Pig
> 
>
> Key: PIG-2599
> URL: https://issues.apache.org/jira/browse/PIG-2599
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Reporter: Daniel Dai
>  Labels: gsoc2014
> Fix For: 0.13.0
>
> Attachments: maven-pig.1.zip
>
>
> Switch Pig build system from ant to maven.
> This is a candidate project for Google summer of code 2013. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013





[jira] [Created] (PIG-3764) Compile physical operators to bytecode

2014-02-12 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3764:
--

 Summary: Compile physical operators to bytecode
 Key: PIG-3764
 URL: https://issues.apache.org/jira/browse/PIG-3764
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Julien Le Dem


I started a prototype here:
https://github.com/julienledem/pig/compare/trunk...compile_physical_plan

The current physical plan is relatively inefficient at evaluating expressions.
In the context of a better execution engine (Tez, Spark, ...), compiling 
expressions to bytecode would be a significant speedup.





[jira] [Commented] (PIG-3741) Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage

2014-02-04 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891030#comment-13891030
 ] 

Julien Le Dem commented on PIG-3741:


Ideally each store would get its own config object, but that would be a major 
refactoring.
In the meantime, this looks like a good improvement to me.
+1
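
As a rough sketch of the "own config object per store" idea, the following copies the job-level settings before applying store-specific ones, so that other stores never see them. It uses java.util.Properties as a stand-in for the Hadoop configuration, and the property key is hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch: give a store a private copy of the job configuration. */
final class PerStoreConfSketch {

    // Hypothetical key, used only to show a store-specific setting.
    static final String COMPRESS_KEY = "sketch.output.compress";

    /** Returns a copy of jobConf with compression enabled for one store only. */
    static Properties confForIntermediateStore(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf); // settings are copied, not shared
        copy.setProperty(COMPRESS_KEY, "true");
        return copy;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty("job.name", "example");
        Properties storeConf = confForIntermediateStore(jobConf);
        System.out.println(storeConf.getProperty(COMPRESS_KEY));
        System.out.println(jobConf.getProperty(COMPRESS_KEY));
    }
}
```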

> Utils.setTmpFileCompressionOnConf can cause side effect for 
> SequenceFileInterStorage
> 
>
> Key: PIG-3741
> URL: https://issues.apache.org/jira/browse/PIG-3741
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12.1
>
> Attachments: PIG-3741.patch
>
>
> Currently, Utils.setTmpFileCompressionOnConf(pigContext, conf); is invoked 
> for every job. In case of Seqfile, this api sets mapreduce params on conf to 
> assist SequenceFileInterStorage. However, as a side effect, this might change 
> the behavior of other storers due to these mapred properties. This api should 
> only be called for jobs with intermediate storage.





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887203#comment-13887203
 ] 

Julien Le Dem commented on PIG-3347:


I thought that the field UIDs were used to track lineage across the plan.
[~aniket486] correct me if I'm wrong, but they are used to determine which fields 
are read for projection push down.
In the case of self join (directly or indirectly) we end up with duplicate ids 
in the same relation because the same field is derived to 2 different fields.
Otherwise I'm as lost as [~knoguchi] regarding the actual mechanisms around the 
UID.
I tried to fix some of these in the past (PIG-3020) but it appears they created 
more problems (PIG-3492)
[~daijy] maybe you can enlighten us?

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final store 
> output. It looks like it brings some kind of side effect. We used 'local' 
> mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect the output to be:
> 1 1
> The actual output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785742#comment-13785742
 ] 

Julien Le Dem commented on PIG-3445:


I just released parquet-pig-bundle-1.2.3.
It should show up in Maven Central overnight.

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (Alternatively, the Parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();





[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785610#comment-13785610
 ] 

Julien Le Dem commented on PIG-3445:


We merged the PR for parquet-pig-bundle.
I'm making a release so that this can be merged into Pig 0.12.


> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (Alternatively, the Parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();





[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785565#comment-13785565
 ] 

Julien Le Dem commented on PIG-3445:



parquet-format.version should be 1.0.0

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (Alternatively, the Parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();





[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785564#comment-13785564
 ] 

Julien Le Dem commented on PIG-3445:


I added a parquet-pig-bundle and the shading of fastutil:
https://github.com/Parquet/parquet-mr/pull/186
We can make a new release to simplify

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (Alternatively, the Parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();





[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785427#comment-13785427
 ] 

Julien Le Dem commented on PIG-3082:


This is intended.
The second behavior described above is really problematic.
If a UDF breaks because it returns a schema of more than one field it should be 
changed to return one field of type tuple.
Once fixed it works in all versions of Pig.
This is only removing an unsafe use of outputSchema in favor of the existing 
correct use.

> outputSchema of a UDF allows two usages when describing a Tuple schema
> --
>
> Key: PIG-3082
> URL: https://issues.apache.org/jira/browse/PIG-3082
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Jonathan Coveney
> Fix For: 0.12.0
>
> Attachments: PIG-3082-0.patch, PIG-3082-1.patch
>
>
> When defining an evalfunc that returns a Tuple there are two ways you can 
> implement outputSchema().
> - The right way: return a schema that contains one Field that contains the 
> type and schema of the return type of the UDF
> - The unreliable way: return a schema that contains more than one field and 
> it will be understood as a tuple schema even though there is no type (which 
> is in Field class) to specify that. This is particularly deceitful when the 
> output schema is derived from the input schema and the outputted Tuple 
> sometimes contains only one field. In such cases Pig understands the output 
> schema as a tuple only if there is more than one field, so sometimes it 
> works and sometimes it does not.
> We should at least issue a warning (backward compatibility) if not plain 
> throw an exception when the output schema contains more than one Field.
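
The proposed check can be sketched as below. It models a schema only as a list of top-level field names, which is a deliberate simplification: the class, the method, and the message text are hypothetical and do not use Pig's Schema classes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of warning or failing on a multi-field output schema. */
final class OutputSchemaCheckSketch {

    /**
     * Returns null when the schema has at most one top-level field. Otherwise
     * returns a warning message, or throws when strict is true.
     */
    static String check(List<String> topLevelFields, boolean strict) {
        if (topLevelFields.size() <= 1) {
            return null;
        }
        String message = "outputSchema has " + topLevelFields.size()
                + " top-level fields; wrap them in a single field of type tuple";
        if (strict) {
            throw new IllegalArgumentException(message);
        }
        return message;
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList("result"), true));
        System.out.println(check(Arrays.asList("num1", "num2"), false));
    }
}
```

The `strict` flag mirrors the two options in the description: a warning for backward compatibility, or a plain exception.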





[jira] [Commented] (PIG-3446) Umbrella jira for Pig on Tez

2013-09-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773569#comment-13773569
 ] 

Julien Le Dem commented on PIG-3446:


Here is the work that Achal did for Pig-on-Tez
https://github.com/achalsoni81/pigeon

> Umbrella jira for Pig on Tez
> 
>
> Key: PIG-3446
> URL: https://issues.apache.org/jira/browse/PIG-3446
> Project: Pig
>  Issue Type: New Feature
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
>
> This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
> More information can be found on the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-09-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769985#comment-13769985
 ] 

Julien Le Dem commented on PIG-3367:


Looks good to me.
Is there a way you can factor out some of the content of buildAssertOp()? It 
looks like some of this would be common with other methods.

> Add assert keyword (operator) in pig
> 
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
>  Issue Type: New Feature
>  Components: parser
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3367.patch
>
>
> The assert operator can be used for data validation. With assert you can write 
> a script as follows:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if assert is violated.



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975
 ] 

Julien Le Dem commented on PIG-3419:


+1
[~cheolsoo] LGTM!

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, 
> updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the frontend and backend, to make Pig delegate 
> to the EE when necessary, and to create an MRExecutionEngine that implements 
> this interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly as MapReduce does when choosing the 
> framework to use between local and distributed mode). I also tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think in the future some work will need to be done here to perhaps 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed here, perhaps in a separate patch. 
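
The ServiceLoader-based selection described in the issue can be sketched roughly as follows. This is only an illustration, not Pig's actual code: the `Engine` interface, its `accepts` and `name` methods, and the `EngineSelector` class are hypothetical names. Only `java.util.ServiceLoader` is a real JDK API. Since `ServiceLoader` is an `Iterable`, the selection logic is written against `Iterable` so it can be exercised without any provider files on the classpath.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical stand-in for an execution engine abstraction (not Pig's real interface).
interface Engine {
    // Returns true if this engine handles the requested execution type, e.g. "local".
    boolean accepts(String execType);

    String name();
}

final class EngineSelector {
    private EngineSelector() {}

    // Walks the candidates in order and returns the first one that accepts execType.
    static Optional<Engine> select(Iterable<Engine> candidates, String execType) {
        for (Engine e : candidates) {
            if (e.accepts(execType)) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }

    // Classpath entry point: ServiceLoader discovers the implementations that are
    // listed in a META-INF/services file named after the Engine interface.
    static Optional<Engine> selectFromClasspath(String execType) {
        return select(ServiceLoader.load(Engine.class), execType);
    }
}
```

With this shape, a new engine only has to ship a jar containing an implementation and its service file; nothing in the selecting code changes.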



[jira] [Created] (PIG-3445) Make Parquet format available out of the box in Pig

2013-08-29 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3445:
--

 Summary: Make Parquet format available out of the box in Pig
 Key: PIG-3445
 URL: https://issues.apache.org/jira/browse/PIG-3445
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem


We would add the Parquet jar to the Pig packages to make it available out of 
the box to Pig users.
On top of that we could add the parquet.pig package to the list of packages to 
search for UDFs. (Alternatively, the Parquet jar could contain classes named 
org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();
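
To illustrate what "adding a package to the list of packages to search" means, here is a small sketch of resolving a short class name against an ordered list of package prefixes. It is not Pig's resolver: `ShortNameResolver` and `resolve` are hypothetical names, and JDK packages stand in for `parquet.pig` so the example runs on its own.

```java
import java.util.List;
import java.util.Optional;

final class ShortNameResolver {
    private ShortNameResolver() {}

    // Tries each package prefix in order and returns the first class that loads.
    static Optional<Class<?>> resolve(String shortName, List<String> packagePrefixes) {
        for (String prefix : packagePrefixes) {
            try {
                return Optional.of(Class.forName(prefix + shortName));
            } catch (ClassNotFoundException ignored) {
                // Not in this package; fall through to the next prefix.
            }
        }
        return Optional.empty();
    }
}
```

Under this scheme, appending a prefix such as `parquet.pig.` to the list would be enough for a bare `ParquetLoader` to resolve, provided the jar is on the classpath.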



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753912#comment-13753912
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo]: thanks a lot for looking into this.

Here are my thoughts:

1. let's change it back

2., 4., 5., 6. and 7. are either internal to Pig or necessary to add the execution 
engine abstraction.

3.
JobStats still exists, but the MR-specific part is split into MRJobStats, which 
extends JobStats.
The same goes for PigStatsUtil and ScriptState. Those classes are not disappearing, 
but the MR-specific part is abstracted out.
HExecutionEngine could be renamed back to what it was, but this is again what is 
becoming the new abstraction.
Unfortunately, tools like Ambrose and Lipstick depend on the MR-specific parts 
of Pig and look at the internals. This patch is a necessary change so that 
those tools can work independently of the execution engine in the future.
The changes to Ambrose and Lipstick should be minimal with this patch. 
They would suffer from some incompatibility, but again there is no way 
around it when a tool looks inside the execution engine internals.
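
The split described in point 3 can be pictured with a simplified sketch. The class names JobStats and MRJobStats come from the patch, but the fields and methods below are hypothetical; the real Pig classes carry far more state.

```java
// Engine-neutral base class: what a monitoring tool could rely on for any engine.
abstract class JobStats {
    private final String jobId;

    protected JobStats(String jobId) {
        this.jobId = jobId;
    }

    String getJobId() {
        return jobId;
    }

    // Summary that each execution engine fills in its own way.
    abstract String describe();
}

// MapReduce-specific details live only in the subclass.
final class MRJobStats extends JobStats {
    private final int numberMaps;
    private final int numberReduces;

    MRJobStats(String jobId, int numberMaps, int numberReduces) {
        super(jobId);
        this.numberMaps = numberMaps;
        this.numberReduces = numberReduces;
    }

    int getNumberMaps() {
        return numberMaps;
    }

    int getNumberReduces() {
        return numberReduces;
    }

    @Override
    String describe() {
        return getJobId() + ": " + numberMaps + " maps, " + numberReduces + " reduces";
    }
}
```

A tool that only touches the base type keeps working with another engine; a tool that needs map and reduce counts has to reference the MR subclass explicitly, which is the incompatibility mentioned above.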

I think we should revert 1. and commit the patch.



> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, 
> updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749226#comment-13749226
 ] 

Julien Le Dem commented on PIG-3419:


The advantage of having the execution engine abstraction in trunk is that it allows 
running experimental Pig execution engine implementations like Tez or Spark on 
an official release of Pig without having to build from a specific branch.
The execution engine implementations themselves are fairly independent of Pig 
and do not need to be maintained in a Pig branch.
If the ExecutionEngine abstraction evolves over time, that can be done in trunk 
and can be merged independently of the Tez implementation itself.


> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748183#comment-13748183
 ] 

Julien Le Dem commented on PIG-3419:


+1 LGTM
If test-commit passes I think we can commit to TRUNK

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_failures.txt, test_suite.patch, 
> updated-8-22-2013-exec-engine.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747064#comment-13747064
 ] 

Julien Le Dem commented on PIG-3419:


The point is to be able to implement alternate execution engines without having 
to fork Pig.
I think it should go in trunk.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746999#comment-13746999
 ] 

Julien Le Dem commented on PIG-3419:


I have submitted my review. This looks great [~achalsoni81]!
[~cheolsoo] does it look good to you?
Once Achal has updated his patch I'm willing to commit.

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-20 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745398#comment-13745398
 ] 

Julien Le Dem commented on PIG-3419:


[~cheolsoo] 
1. Do we really throw Exception? If yes, then let's just throw that. If not, 
then let's instead have FrontEndException, ExecException, IOException, i.e. 
let's remove the exceptions that are already covered by the most general 
exception declared.
2. Agreed. I would expect the execution engine to handle the 
Properties internally and the signature of this method to be:
{noformat}
public void setProperty(String property, String value);
{noformat}
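
A minimal sketch of what "handle the Properties internally" could look like. Only the `setProperty(String, String)` signature comes from the comment above; the `ConfigurableEngine` interface, the `SimpleEngine` class and the `getProperty` accessor are hypothetical names added for illustration.

```java
import java.util.Properties;

// Hypothetical interface exposing only the proposed setter.
interface ConfigurableEngine {
    void setProperty(String property, String value);
}

// Hypothetical implementation that owns its Properties instead of handing them out.
final class SimpleEngine implements ConfigurableEngine {
    private final Properties properties = new Properties();

    @Override
    public void setProperty(String property, String value) {
        properties.setProperty(property, value);
    }

    // Read access for illustration; returns null when the property is unset.
    String getProperty(String property) {
        return properties.getProperty(property);
    }
}
```

Callers never see the `Properties` object itself, so the engine stays free to validate or translate keys before storing them.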

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Assignee: Achal Soni
>Priority: Minor
> Attachments: execengine.patch, mapreduce_execengine.patch, 
> stats_scriptstate.patch, test_suite.patch
>
>



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-13 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738478#comment-13738478
 ] 

Julien Le Dem commented on PIG-3419:


Hi Achal,
For large patches, please create a review here: https://reviews.apache.org

> Pluggable Execution Engine 
> ---
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.12
>Reporter: Achal Soni
>Priority: Minor
> Attachments: pluggable_execengine.patch
>
>



[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig

2013-06-27 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694832#comment-13694832
 ] 

Julien Le Dem commented on PIG-3367:


I was thinking we could make the syntax part of FOREACH.
{noformat}
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
{noformat}
That way it is easy to integrate asserts in the flow.

The advantage of having it part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also if the 
filter is not in the predecessors of a STORE, it won't be executed.

A UDF can stop the job by throwing an exception, although the task will be 
retried before failing completely.

For reference, the UDF based syntax:
{noformat}
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is 
negative.' );
{noformat}

Yes, adding new keywords is inconvenient when the keyword is already used as a 
relation or column name.
When a field collides with a keyword it is sometimes difficult to rename it.
I think we should:
 - try to avoid new keywords if possible
 - provide a mechanism to escape field names to facilitate fixing conflicts 
when they happen (using quotes or a similar mechanism)
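
For the UDF-based variant mentioned above, the core behaviour can be sketched in plain Java. This is a hypothetical stand-in rather than a Pig EvalFunc: `AssertSketch` and `assertThat` are made-up names, and the choice of `IllegalStateException` is an assumption.

```java
final class AssertSketch {
    private AssertSketch() {}

    // Returns true when the condition flag is non-zero, so a surrounding filter
    // keeps the row; throws with the user-supplied message otherwise.
    static boolean assertThat(int condition, String message) {
        if (condition == 0) {
            throw new IllegalStateException("Assertion violated: " + message);
        }
        return true;
    }
}
```

This makes the "filter that does not filter" point concrete: the function either keeps every row or aborts, so its only useful effect is the exception.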

> Add assert keyword (operator) in pig
> 
>
> Key: PIG-3367
> URL: https://issues.apache.org/jira/browse/PIG-3367
> Project: Pig
>  Issue Type: New Feature
>  Components: parser
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
>



[jira] [Commented] (PIG-2828) DataType.compare null

2013-06-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672896#comment-13672896
 ] 

Julien Le Dem commented on PIG-2828:


Sounds good to me.

> DataType.compare null
> -
>
> Key: PIG-2828
> URL: https://issues.apache.org/jira/browse/PIG-2828
> Project: Pig
>  Issue Type: Bug
>Reporter: Haitao Yao
> Attachments: DataType.patch, PIG-2828.patch, test.patch
>
>
> When using TOP, if the DataBag contains a null value to compare, it 
> generates the following exception:
> Caused by: java.lang.NullPointerException
>   at org.apache.pig.data.DataType.compare(DataType.java:427)
>   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:97)
>   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:1)
>   at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
>   at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
>   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
>   at java.util.PriorityQueue.add(PriorityQueue.java:306)
>   at org.apache.pig.builtin.TOP.updateTop(TOP.java:141)
>   at org.apache.pig.builtin.TOP.exec(TOP.java:116)
> code: (TOP.java, starts with line 91)
> Object field1 = o1.get(fieldNum);
> Object field2 = o2.get(fieldNum);
> if (!typeFound) {
>     datatype = DataType.findType(field1);
>     typeFound = true;
> }
> return DataType.compare(field1, field2, datatype, datatype);
> The reason is that when typeFound is true, datatype is not null, and 
> field1 is null, the script fails.
> So we need to check whether field1 is null.
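
One way to picture the null check requested above is a null-safe comparison wrapper. This is a sketch, not the attached patch: `NullSafeCompare` is a hypothetical name, and ordering null before every non-null value is an assumption made here for illustration.

```java
import java.util.Comparator;

final class NullSafeCompare {
    private NullSafeCompare() {}

    // Orders null before any non-null value; two nulls compare as equal.
    // Non-null values are delegated to the supplied comparator.
    static <T> int compare(T a, T b, Comparator<? super T> delegate) {
        if (a == null && b == null) {
            return 0;
        }
        if (a == null) {
            return -1;
        }
        if (b == null) {
            return 1;
        }
        return delegate.compare(a, b);
    }
}
```

On Java 8 and later, the JDK's `Comparator.nullsFirst(delegate)` gives the same ordering without a hand-written helper.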



[jira] [Resolved] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-17 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3307.


   Resolution: Fixed
Fix Version/s: 0.12
 Hadoop Flags: Reviewed

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, 
> PIG-3307_3.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is an array of getNext(*T* v) methods where the value v does 
> not seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().
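
The shape of this refactoring can be illustrated with a toy operator. `ConstantOperator` is a hypothetical class written for this sketch; Pig's real physical operators are considerably more involved and do not return bare values like this.

```java
// Hypothetical, heavily simplified operator holding a single value.
final class ConstantOperator {
    private final Object value;

    ConstantOperator(Object value) {
        this.value = value;
    }

    // Old style: the argument is ignored and only selects the overload.
    Integer getNext(Integer unused) {
        return (Integer) value;
    }

    String getNext(String unused) {
        return (String) value;
    }

    // New style: the type is part of the method name, so no dummy argument is needed.
    Integer getNextInteger() {
        return (Integer) value;
    }

    String getNextString() {
        return (String) value;
    }
}
```

With the old style, a plain `getNext(null)` does not even compile because the overload is ambiguous, so callers have to pass a typed dummy such as `(Integer) null`. The renamed methods remove that ceremony.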



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661202#comment-13661202
 ] 

Julien Le Dem commented on PIG-3307:


committed to TRUNK
Committed revision 1484037

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, 
> PIG-3307_3.patch
>
>



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-17 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_3.patch

PIG-3307_3.patch addresses [~cheolsoo]'s comments

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, 
> PIG-3307_3.patch
>
>



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-16 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660033#comment-13660033
 ] 

Julien Le Dem commented on PIG-3307:


https://reviews.apache.org/r/11203/diff/#index_header
Thanks [~cheolsoo] and [~daijy]!

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-14 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657908#comment-13657908
 ] 

Julien Le Dem commented on PIG-3307:


It would be nice if someone could take a look before too many changes get in.

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654887#comment-13654887
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] Also, most likely it won't make any difference performance-wise.

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-07 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651101#comment-13651101
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] What's the recommended approach? If you have a setup for running 
performance tests, that would be helpful.


> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650021#comment-13650021
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] This removes parameters that were not used. I have not tested 
performance, but I think the change could only improve it.
(see latest patch PIG-3307_2.patch)

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Resolved] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3311.


   Resolution: Fixed
Fix Version/s: 0.12

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649907#comment-13649907
 ] 

Julien Le Dem commented on PIG-3311:


Committed to TRUNK

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3311:
---

Patch Info: Patch Available

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Created] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3311:
--

 Summary: add pig-withouthadoop-h2 to mvn-jar
 Key: PIG-3311
 URL: https://issues.apache.org/jira/browse/PIG-3311
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3311.patch

mvn-jar currently creates pig-version.jar and pig-version-h2.jar
I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
that are needed to run pig from the command line.
This will allow a dual-version package.



[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3311:
---

Attachment: PIG-3311.patch

PIG-3311.patch adds -withouthadoop to the mvn-jar target

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-03 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_2.patch

PIG-3307_2.patch removes the unused parameter in getNext(\*)


> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647174#comment-13647174
 ] 

Julien Le Dem commented on PIG-3307:


It looks like we can get rid of the parameter that is only used for method 
dispatch.
I will replace all calls to getNext(Tuple t) with getNextTuple() in 
PhysicalOperator.
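
As a rough illustration of the change (a minimal sketch, not the actual patch: Result, OldStyleOperator, NewStyleOperator and GetNextSketch are hypothetical, simplified stand-ins for Pig's classes), the overload-by-dummy-argument pattern and its replacement might look like this:

```java
// Hypothetical, simplified stand-ins; these are NOT Pig's real classes.
final class Result {
    final Object value;

    Result(Object value) {
        this.value = value;
    }
}

// Before: the argument is never read. It exists only so the compiler
// can pick an overload, and callers pass a null cast to the wanted type.
abstract class OldStyleOperator {
    abstract Result getNext(Integer unused);

    abstract Result getNext(String unused);
}

// After: the type is part of the method name, so no dummy argument is needed.
abstract class NewStyleOperator {
    abstract Result getNextInteger();

    abstract Result getNextString();
}

final class GetNextSketch {
    static OldStyleOperator oldConstant() {
        return new OldStyleOperator() {
            @Override
            Result getNext(Integer unused) {
                return new Result(42);
            }

            @Override
            Result getNext(String unused) {
                return new Result("forty-two");
            }
        };
    }

    static NewStyleOperator newConstant() {
        return new NewStyleOperator() {
            @Override
            Result getNextInteger() {
                return new Result(42);
            }

            @Override
            Result getNextString() {
                return new Result("forty-two");
            }
        };
    }

    public static void main(String[] args) {
        // Old call site: the cast on null is required to resolve the overload.
        System.out.println(oldConstant().getNext((String) null).value);
        // New call site: same result, nothing to cast.
        System.out.println(newConstant().getNextString().value);
    }
}
```

Both call sites return the same value; the second simply no longer needs the null argument or the cast.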

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_1.patch

PIG-3307_1.patch introduces some more refactoring

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Description: 
The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just used to pick the correct method.
I have started a refactoring for a more readable getNext*T*().


  was:
The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just use to pick the correct method.
I have started a refactoring for a more readable getNext*T*().



> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3307:
---

Attachment: PIG-3307_0.patch

PIG-3307_0.patch contains the initial refactoring

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to cleanup 
> some unnecessary code.
> in particular there is an array of getNext(*T* v) where the value v does not 
> seem to have any importance and is just use to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Created] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3307:
--

 Summary: Refactor physical operators to remove methods parameters 
that are always null
 Key: PIG-3307
 URL: https://issues.apache.org/jira/browse/PIG-3307
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-3307_0.patch

The physical operators are sometimes overly complex. I'm trying to cleanup some 
unnecessary code.
in particular there is an array of getNext(*T* v) where the value v does not 
seem to have any importance and is just use to pick the correct method.
I have started a refactoring for a more readable getNext*T*().




[jira] [Resolved] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-05-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3303.


   Resolution: Fixed
Fix Version/s: 0.12

Merged in trunk

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3303.patch
>
>




[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Patch Info: Patch Available

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-3303.patch
>
>




[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3303:
---

Attachment: PIG-3303.patch

> add hadoop h2 artifact to publications in ivy.xml
> -
>
> Key: PIG-3303
> URL: https://issues.apache.org/jira/browse/PIG-3303
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
> Attachments: PIG-3303.patch
>
>




[jira] [Created] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml

2013-04-30 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3303:
--

 Summary: add hadoop h2 artifact to publications in ivy.xml
 Key: PIG-3303
 URL: https://issues.apache.org/jira/browse/PIG-3303
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem






[jira] [Issue Comment Deleted] (PIG-3214) New/improved mascot

2013-03-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3214:
---

Comment: was deleted

(was: Who let the trolls out?)

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
> Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, 
> newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, 
> pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, 
> pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, 
> pig_logo_new.png
>
>
> Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-07 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596273#comment-13596273
 ] 

Julien Le Dem commented on PIG-3214:


Who let the trolls out?

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
> Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, 
> newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, 
> pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, 
> pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, 
> pig_logo_new.png
>
>
> Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594152#comment-13594152
 ] 

Julien Le Dem commented on PIG-3214:


and PI can be the front legs
;)

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
> Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
> newlogo5.png, pig_6.JPG
>
>
> Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3235) Enable DEBUG log messages in unit tests by default

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593990#comment-13593990
 ] 

Julien Le Dem commented on PIG-3235:


What would be useful is to have a log4j.properties for tests in a known 
location that is automatically picked up in tests and can be easily modified on 
a case-by-case basis.

> Enable DEBUG log messages in unit tests by default
> --
>
> Key: PIG-3235
> URL: https://issues.apache.org/jira/browse/PIG-3235
> Project: Pig
>  Issue Type: Improvement
>  Components: tools
>Reporter: Cheolsoo Park
>Priority: Minor
>
> Currently, debug level messages are not logged for unit tests. It is helpful 
> to enable them to debug unit tests.



[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2

2013-03-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593747#comment-13593747
 ] 

Julien Le Dem commented on PIG-3194:


Same as Jon: we can just use the methods present in Commons Codec 1.3, and we 
don't need to be URL safe.
Let's not repackage commons.codec or duplicate part of it just for this.
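
To make the idea concrete, here is a minimal sketch of a round trip that only needs byte[]-to-byte[] encoding with the standard (not URL-safe) alphabet. It is an assumption-laden illustration, not the ObjectSerializer code: it uses the JDK's java.util.Base64 as a stand-in so that it runs without extra jars, and Base64RoundTripSketch is a hypothetical name. With Commons Codec 1.3 the corresponding calls would be the static Base64.encodeBase64(byte[]) and Base64.decodeBase64(byte[]).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; java.util.Base64 stands in for Commons Codec here.
final class Base64RoundTripSketch {

    // Java-serialize the object, then Base64-encode the bytes (standard alphabet).
    static String serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // byte[] in, byte[] out: the only shape of call this approach relies on.
        byte[] encoded = Base64.getEncoder().encode(bytes.toByteArray());
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    // Reverse of serialize: Base64-decode the bytes, then Java-deserialize.
    static Object deserialize(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded.getBytes(StandardCharsets.US_ASCII));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize", e);
        }
    }

    public static void main(String[] args) {
        String encoded = serialize("hello pig");
        System.out.println(deserialize(encoded));
    }
}
```

Converting between String and byte[] explicitly is what avoids the String-based convenience methods that only exist in Codec 1.4.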

> Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
> ---
>
> Key: PIG-3194
> URL: https://issues.apache.org/jira/browse/PIG-3194
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Kai Londenberg
>
> The changes to ObjectSerializer.java in the following commit
> http://svn.apache.org/viewvc?view=revision&revision=1403934 break 
> compatibility with Hadoop 0.20.2 Clusters.
> The reason is, that the code uses methods from Apache Commons Codec 1.4 - 
> which are not available in Apache Commons Codec 1.3 which is shipping with 
> Hadoop 0.20.2.
> The offending methods are Base64.decodeBase64(String) and 
> Base64.encodeBase64URLSafeString(byte[])
> If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 
> 0.20.2 Clusters.



[jira] [Updated] (PIG-3214) New/improved mascot

2013-03-05 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3214:
---

Attachment: pig_6.JPG

I like the idea of number 4 too.
Here is my little contribution (#6) to it, just to illustrate the idea.
(I do agree with Alan that we should only change if we have something much 
better.)

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
> Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
> newlogo5.png, pig_6.JPG
>
>
> Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564901#comment-13564901
 ] 

Julien Le Dem commented on PIG-3140:


+1

> Document PigProgressNotificationListener configs
> 
>
> Key: PIG-3140
> URL: https://issues.apache.org/jira/browse/PIG-3140
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3140_1.patch
>
>
> Add docs to describe what PPNL is and how to configure it.



[jira] [Commented] (PIG-3139) Document reducer estimation

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564902#comment-13564902
 ] 

Julien Le Dem commented on PIG-3139:


+1

> Document reducer estimation
> ---
>
> Key: PIG-3139
> URL: https://issues.apache.org/jira/browse/PIG-3139
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3139_1.patch
>
>
> Add docs to describe how default reducer estimation algo works and how to 
> override it.



[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing

2013-01-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3005:
---

Issue Type: Bug  (was: Sub-task)
Parent: (was: PIG-2972)

> TestLargeFile#testOrderBy is failing
> 
>
> Key: PIG-3005
> URL: https://issues.apache.org/jira/browse/PIG-3005
> Project: Pig
>  Issue Type: Bug
> Environment: Mac OSX 10.6.8
>Reporter: Jonathan Coveney
> Fix For: 0.12
>
>
> When run locally, at least, this test is failing for me.
> Has anyone else noticed this failing?



[jira] [Commented] (PIG-2846) Can we skip hcat related e2e when hcat is not installed?

2013-01-22 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560184#comment-13560184
 ] 

Julien Le Dem commented on PIG-2846:


Hey [~cheolsoo], should we detach this from Pig 0.11?

> Can we skip hcat related e2e when hcat is not installed?
> 
>
> Key: PIG-2846
> URL: https://issues.apache.org/jira/browse/PIG-2846
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Koji Noguchi
>Priority: Trivial
> Attachments: pig-2846-trunk-v1.txt
>
>
> Trying pig e2e for the first time, I see couple of the tests 
> (HCatDDL_1,HCatDDL_2 and Jython_Command_1) failing with 
> bq. java.io.IOException: Cannot run program /usr/local/hcat/bin/hcat:
> bq. java.io.IOException: error=2, No such file or directory
> Is it ok to change the test_harness to skip these tests when hcat does not 
> exist?



[jira] [Updated] (PIG-3105) Fix TestJobSubmission unit test failure.

2013-01-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3105:
---

Fix Version/s: (was: 0.11)
   0.12

I'm moving this to the next release as it does not seem to be a blocker for Pig 
0.11


> Fix TestJobSubmission unit test failure.
> 
>
> Key: PIG-3105
> URL: https://issues.apache.org/jira/browse/PIG-3105
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.10.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 0.12
>
> Attachments: PIG-3105.patch
>
>
> Currently with Hadoop 1.0, the TestJobSubmission unit test fails. This is due 
> to HBASE-7423. This is a workaround for that issue.



[jira] [Commented] (PIG-3098) Add another test for the self join case

2013-01-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556692#comment-13556692
 ] 

Julien Le Dem commented on PIG-3098:


One minor comment regarding asserts:
{noformat}
  assertEquals(tuples.size(), out.size());
  for (Tuple t : out) {
    assertTrue(tuples.remove(t));
  }
  assertTrue(tuples.isEmpty());
{noformat}
If this fails, it is not going to give much information.
Please add a message as the first parameter with some info:
{noformat}
  assertEquals("tuple count for " + out, tuples.size(), out.size());
  for (Tuple t : out) {
    assertTrue("existence of " + t, tuples.remove(t));
  }
  assertTrue("all tuples consumed in " + tuples, tuples.isEmpty());
{noformat}



> Add another test for the self join case
> ---
>
> Key: PIG-3098
> URL: https://issues.apache.org/jira/browse/PIG-3098
> Project: Pig
>  Issue Type: Bug
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3098-0.patch
>
>
> This adds a test to TestJoin that doesn't just make sure that self joins work 
> semantically in the parser, but also that it pulls the right data through. 
> Thought it'd be easier to just make a new JIRA than to reopen PIG-3020.



[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-01-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556686#comment-13556686
 ] 

Julien Le Dem commented on PIG-3082:


Thanks for fixing, Jon!
I find the error message a little confusing:
{noformat}
 throw new FrontendException("Given UDF returns an improper Schema. Should only "
     + "return Tuple, Bag, or a single item. Returns: " + udfSchema);
{noformat}
It should contain something along the lines of "... outputSchema should return 
a Schema containing a single Field ...".
Otherwise, it looks good to me.
Thanks

> outputSchema of a UDF allows two usages when describing a Tuple schema
> --
>
> Key: PIG-3082
> URL: https://issues.apache.org/jira/browse/PIG-3082
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3082-0.patch
>
>
> When defining an evalfunc that returns a Tuple there are two ways you can 
> implement outputSchema().
> - The right way: return a schema that contains one Field that contains the 
> type and schema of the return type of the UDF
> - The unreliable way: return a schema that contains more than one field and 
> it will be understood as a tuple schema even though there is no type (which 
> is in Field class) to specify that. This is particularly deceptive when the 
> output schema is derived from the input schema and the output Tuple 
> sometimes contains only one field. In such cases Pig understands the output 
> schema as a tuple only if there is more than one field, so sometimes it 
> works and sometimes it does not.
> We should at least issue a warning (backward compatibility) if not plain 
> throw an exception when the output schema contains more than one Field.



[jira] [Created] (PIG-3103) make mockito a test dependency (instead of compile)

2012-12-21 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3103:
--

 Summary: make mockito a test dependency (instead of compile)
 Key: PIG-3103
 URL: https://issues.apache.org/jira/browse/PIG-3103
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem






[jira] [Resolved] (PIG-3076) make TestScalarAliases more reliable

2012-12-20 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3076.


  Resolution: Fixed
Release Note: committed to trunk and branch-0.11

> make TestScalarAliases more reliable
> 
>
> Key: PIG-3076
> URL: https://issues.apache.org/jira/browse/PIG-3076
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11, 0.12
>
> Attachments: PIG-3076_1.patch, PIG-3076.patch
>
>
> Currently, this test writes to the root directory, so its output is not 
> deleted by ant clean.
> It also deletes its output at the end instead of at the beginning.
> The consequence is that if the test fails once, it will keep failing until 
> the directory is manually cleaned up (not good for CI).
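
The approach the description points at (write under a directory that ant clean removes, and delete leftovers at the beginning of the test instead of the end) could look roughly like the sketch below. This is only an illustration and not the code in PIG-3076.patch: the class name TestOutputDirSketch, the method prepareOutputDir, and the idea of passing the build directory in as a parameter are all assumptions.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper for illustration; not taken from the attached patches. */
final class TestOutputDirSketch {
    private TestOutputDirSketch() {}

    /**
     * Returns an empty output directory for the given test under buildDir.
     * Anything left behind by a previous (possibly failed) run is deleted first,
     * so one failure cannot make every later run fail.
     */
    static File prepareOutputDir(File buildDir, String testName) throws IOException {
        File out = new File(buildDir, testName);
        deleteRecursively(out);
        if (!out.mkdirs()) {
            throw new IOException("could not create " + out);
        }
        return out;
    }

    /** Deletes a file or a directory tree; does nothing if it does not exist. */
    private static void deleteRecursively(File f) throws IOException {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (f.exists() && !f.delete()) {
            throw new IOException("could not delete " + f);
        }
    }
}
```

A test would call prepareOutputDir from its setup method and write only below the returned directory; no cleanup at the end is needed for the next run to start from a clean state.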



[jira] [Updated] (PIG-3076) make TestScalarAliases more reliable

2012-12-19 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3076:
---

Attachment: PIG-3076_1.patch

PIG-3076_1.patch addresses comments

> make TestScalarAliases more reliable
> 
>
> Key: PIG-3076
> URL: https://issues.apache.org/jira/browse/PIG-3076
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11, 0.12
>
> Attachments: PIG-3076_1.patch, PIG-3076.patch
>
>
> Currently, this test writes to the root directory, so its output is not 
> deleted by ant clean.
> It also deletes its output at the end instead of at the beginning.
> The consequence is that if the test fails once, it will keep failing until 
> the directory is manually cleaned up (not good for CI).



[jira] [Resolved] (PIG-2493) UNION causes casting issues

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2493.


Resolution: Fixed

I'm closing this issue as it has been committed and we are stabilizing a 
release.
[~arov] please open a new JIRA if you still see problems

> UNION causes casting issues
> ---
>
> Key: PIG-2493
> URL: https://issues.apache.org/jira/browse/PIG-2493
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Anitha Raju
>Assignee: Vivek Padmanabhan
> Fix For: 0.9.3, 0.11, 0.10.1
>
> Attachments: PIG-2493_2.patch, PIG-2493-3.patch, PIG-2493.patch
>
>
> Hi,
> For the below script,
> {code}
> A = load '/user/anithar/ip' as (a);
> B = load '/user/anithar/ip1' as (a);
> C = union  A , B ;
> D = foreach C generate (chararray)a;
> dump D;
> {code}
> it gives a casting error at runtime:
> {code}
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:660)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> It looks like in POCast.java "funcSpec" is never assigned a value (it stays 
> null when there is a UNION involved), causing "caster" to be null and thus 
> the exception.
> The same works in 0.8 without any issue.
> Regards,
> Anitha



[jira] [Resolved] (PIG-2583) Add Grunt command to list the statements in cache

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2583.


Resolution: Fixed

> Add Grunt command to list the statements in cache
> -
>
> Key: PIG-2583
> URL: https://issues.apache.org/jira/browse/PIG-2583
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Daniel Dai
>Assignee: Allan Avendaño
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11
>
> Attachments: gruntHistory1.patch, gruntHistory2.patch, 
> gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch
>
>
> It is convenient to list statements in cache:
> grunt> a = load '1.txt'; 
> grunt> b = foreach a generate $0, $1;
> grunt> list
> a = load '1.txt';
> b = foreach a generate $0, $1;



[jira] [Commented] (PIG-2583) Add Grunt command to list the statements in cache

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535355#comment-13535355
 ] 

Julien Le Dem commented on PIG-2583:


[~xalan] I'm closing this ticket as it has been committed.
Please open a new ticket to further improve your contribution.
Thanks again


> Add Grunt command to list the statements in cache
> -
>
> Key: PIG-2583
> URL: https://issues.apache.org/jira/browse/PIG-2583
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Daniel Dai
>Assignee: Allan Avendaño
>Priority: Minor
>  Labels: newbie
> Fix For: 0.11
>
> Attachments: gruntHistory1.patch, gruntHistory2.patch, 
> gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch
>
>
> It is convenient to list statements in cache:
> grunt> a = load '1.txt'; 
> grunt> b = foreach a generate $0, $1;
> grunt> list
> a = load '1.txt';
> b = foreach a generate $0, $1;



[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2614:
---

Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

moving this to next release so that we can converge on pig 0.11

> AvroStorage crashes on LOADING a single bad error
> -
>
> Key: PIG-2614
> URL: https://issues.apache.org/jira/browse/PIG-2614
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.0, 0.11
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>  Labels: avro, avrostorage, bad, book, cutting, doug, for, my, 
> pig, sadism
> Fix For: 0.12
>
> Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, 
> test_avro_files.tar.gz
>
>
> AvroStorage dies when a single bad record exists, such as one with missing 
> fields.  This is very bad on 'big data,' where bad records are inevitable.  
> See discussion at 
> http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss
>  for more theory.



[jira] [Updated] (PIG-2927) SHIP and use JRuby gems in JRuby UDFs

2012-12-18 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2927:
---

Fix Version/s: (was: 0.11)
   0.12

This will go in the next release as we are stabilizing the 0.11 branch

> SHIP and use JRuby gems in JRuby UDFs
> -
>
> Key: PIG-2927
> URL: https://issues.apache.org/jira/browse/PIG-2927
> Project: Pig
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 0.11
> Environment: JRuby UDFs
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, 
> PIG-2927-3.patch, PIG-2927-4.patch
>
>
> It would be great to use JRuby gems in JRuby UDFs without installing them on 
> all machines on the cluster. Some way to SHIP them automatically with the job 
> would be great.



[jira] [Commented] (PIG-2955) Fix bunch of Pig e2e tests on Windows

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535339#comment-13535339
 ] 

Julien Le Dem commented on PIG-2955:


Daniel, do you want to check that in?

>  Fix bunch of Pig e2e tests on Windows 
> ---
>
> Key: PIG-2955
> URL: https://issues.apache.org/jira/browse/PIG-2955
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11, 0.10.1, 0.12
>
> Attachments: PIG-2955-1.patch, PIG-2955-2_0.10.patch, PIG-2955-2.patch
>
>
> Fix the following test aborts and failures:
> ComputeSpec_1
> ComputeSpec_2
> Unicode_cmdline_1
> Warning_1
> Warning_4
> Checkin_2
> UdfDistributedCache_1
> Jython_Checkin_2
> Jython_Diagnostics_4
> Jython_Diagnostics_5
> Jython_Diagnostics_6
> Jython_Error_3
> Jython_Error_4
> Jython_Error_5
> Jython_Error_6
> Jython_Error_7
> Grunt_6
> Grunt_8
> Grunt_13
> Grunt_14



[jira] [Commented] (PIG-2954) TestParamSubPreproc still depends on "bash" to run

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535340#comment-13535340
 ] 

Julien Le Dem commented on PIG-2954:


is this still on target for pig-0.11?

>  TestParamSubPreproc still depends on "bash" to run 
> 
>
> Key: PIG-2954
> URL: https://issues.apache.org/jira/browse/PIG-2954
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: PIG-2954-1.patch, PIG-2954-2.patch
>
>
> If bash does not exist in the path, there are 3 test failures.



[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535335#comment-13535335
 ] 

Julien Le Dem commented on PIG-2956:


Daniel, any update on this?

> Invalid cache specification for some streaming statement
> 
>
> Key: PIG-2956
> URL: https://issues.apache.org/jira/browse/PIG-2956
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch
>
>
> Another category of failure in e2e tests, such as ComputeSpec_1, 
> ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
> RaceConditions_4, RaceConditions_7, RaceConditions_8.
> Here is the stack:
> ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
>     at org.apache.pig.PigServer.execute(PigServer.java:1293)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:561)
>     at org.apache.pig.Main.main(Main.java:111)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-2957) TetsScriptUDF fail due to volume prefix in jar

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535249#comment-13535249
 ] 

Julien Le Dem commented on PIG-2957:


Could you call the method something more explicit than "cleanupPath"? Something 
like "getPathForJar", maybe?
Also, please add comments to explain what exactly this is doing:
{noformat}
if (path.charAt(1)==':') {
    newPath = path.charAt(0) + path.substring(2);
}
{noformat}
It would be useful to describe what it is changing in the path and why.
In particular, the drive letter becomes a root dir in the jar (C:/foo becomes 
C/foo). If that's what we want, then it should be made clearer.
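
To illustrate the kind of naming and commenting meant here, a minimal sketch follows. It is not the code from the patch: the class name JarPathSketch is hypothetical, getPathForJar is only the name suggested above, and the extra length and letter checks are assumptions added so that short or non-Windows paths pass through unchanged.

```java
/** Hypothetical helper for illustration; not the code from the attached patch. */
final class JarPathSketch {
    private JarPathSketch() {}

    /**
     * Converts a path that may start with a Windows volume prefix into a path
     * that can be used as an entry name inside a jar.
     *
     * The ':' after the drive letter is dropped, so the drive letter becomes a
     * top-level directory in the jar: "C:/foo" becomes "C/foo". Paths without a
     * volume prefix are returned unchanged.
     */
    static String getPathForJar(String path) {
        // A volume prefix is a single letter followed by ':' (for example "C:").
        boolean hasVolumePrefix = path.length() >= 2
                && Character.isLetter(path.charAt(0))
                && path.charAt(1) == ':';
        if (hasVolumePrefix) {
            // Keep the drive letter, skip the ':' and keep the rest as is.
            return path.charAt(0) + path.substring(2);
        }
        return path;
    }
}
```

With this sketch, getPathForJar("C:/foo") returns "C/foo" and getPathForJar("/tmp/foo") returns "/tmp/foo".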

> TetsScriptUDF fail due to volume prefix in jar
> --
>
> Key: PIG-2957
> URL: https://issues.apache.org/jira/browse/PIG-2957
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: PIG-2957-1.patch, PIG-2957-2_0.10.patch, PIG-2957-2.patch
>
>
> testPythonAbsolutePath fails. The stack is:
> java.io.IOException: Mkdirs failed to create C:\tmp\hadoop-Administrator\mapred\local\1_0\taskTracker\Administrator\jobcache\job_20120725074728013_0011\jars\C:\Users\Administrator\pig-monarch
>     at org.apache.hadoop.util.RunJar.unJar(RunJar.java:47)
>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:277)
>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:377)
>     at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:367)
>     at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:214)
>     at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1237)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1107)
>     at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1212)
>     at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1127)
>     at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2417)
>     at java.lang.Thread.run(Thread.java:662)
> The reason is that we pack the volume prefix into the job.jar:
> jar tvf C:\Users\ADMINI~1\AppData\Local\Temp\Job6350669482684441868.jar|grep testPythonAbsolutePath
> 98 Wed Jul 25 11:12:58 PDT 2012 C:\Users\Administrator\pig-monarch\testPythonAbsolutePath.py



[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows

2012-12-18 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535220#comment-13535220
 ] 

Julien Le Dem commented on PIG-2959:


hey Daniel, are you going to commit this?

> Add a pig.cmd for Pig to run under Windows
> --
>
> Key: PIG-2959
> URL: https://issues.apache.org/jira/browse/PIG-2959
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: pig.cmd
>
>




[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-17 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534401#comment-13534401
 ] 

Julien Le Dem commented on PIG-3020:


looks good to me
+1

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, 
> PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error in 
> 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
>     group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
>     group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}



[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Attachment: PIG-3020_branch-0.11_1.patch

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3020_branch-0.11_1.patch, PIG-3020.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error in 
> 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
>     group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
>     group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Description: 
The following validates OK with pig 0.9 and fails with the following error in 
0.11 (and I suspect 0.10)

pig -c debug2.pig

Script: debug2.pig
{noformat}
A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
uids_with_flock:bag{});
edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
IsEmpty(uids_with_flock);
edges_both = FOREACH edges_both GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
both_counts = GROUP edges_both BY src_id;
both_counts = FOREACH both_counts GENERATE
group AS src_id, SIZE(edges_both) AS size_both;

edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
edges_bq = FOREACH edges_bq GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
bq_counts = GROUP edges_bq BY src_id;
bq_counts = FOREACH bq_counts GENERATE
group AS src_id, SIZE(edges_bq) AS size_bq;

per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
store per_user_set_sizes into  'foo';
{noformat}

Error:
{noformat}
ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
explain alias null
at org.apache.pig.PigServer.explain(PigServer.java:999)
at 
org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
at 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
at org.apache.pig.Main.run(Main.java:600)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
Error processing rule LoadTypeCastInserter
at 
org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
at org.apache.pig.PigServer.explain(PigServer.java:984)
... 10 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
at 
org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
at 
org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
at 
org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
at 
org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at 
org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
at 
org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
... 13 more
{noformat}

  was:
The following vali=dates OK with pig 0.9 and fails with the following error in 
0.11 (and I suspect 0.10)

pig -c debug2.pig

Script: debug2.pig
{noformat}
A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
uids_with_flock:bag{});
edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
IsEmpty(uids_with_flock);
edges_both = FOREACH edges_both GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
both_counts = GROUP edges_both BY src_id;
both_counts = FOREACH both_counts GENERATE
group AS src_id, SIZE(edges_both) AS size_both;

edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
edges_bq = FOREACH edges_bq GENERATE
group.uid AS src_id,
group.dst_id AS dst_id;
bq_counts = GROUP edges_bq BY src_id;
bq_counts = FOREACH bq_counts GENERATE
group AS src_id, SIZE(edges_bq) AS size_bq;

per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
store per_user_set_sizes into  'foo';
{noformat}

Error:
{noformat}
ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
explain alias null
at org.apache.pig.PigSer

[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Patch Info: Patch Available

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3020.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error 
> in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
> uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
> IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
> group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
> group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
> src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}



[jira] [Assigned] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned PIG-3020:
--

Assignee: Julien Le Dem

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3020.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error 
> in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
> uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
> IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
> group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
> group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
> src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}



[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-12 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255
 ] 

Julien Le Dem commented on PIG-3020:


[~dvryaboy] I just noticed it was logging a warning with a NullPointerException 
when running tests from Eclipse. I just changed the log line to something 
clearer. It is not related, but I feel it is small enough to be done here.
[~jcoveney] I also added a unit test with a Pig script that was failing before 
and works now, to validate my change.

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Attachments: PIG-3020.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error 
> in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
> uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
> IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
> group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
> group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
> src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}



[jira] [Resolved] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3084.


   Resolution: Fixed
Fix Version/s: 0.12

> Improve exceptions messages in POPackage
> 
>
> Key: PIG-3084
> URL: https://issues.apache.org/jira/browse/PIG-3084
> Project: Pig
> Issue Type: Bug
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3084_1.patch, PIG-3084.patch
>
>




[jira] [Updated] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3084:
---

Attachment: PIG-3084_1.patch

PIG-3084_1.patch: same patch with whitespace adjusted

> Improve exceptions messages in POPackage
> 
>
> Key: PIG-3084
> URL: https://issues.apache.org/jira/browse/PIG-3084
> Project: Pig
> Issue Type: Bug
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3084_1.patch, PIG-3084.patch
>
>




[jira] [Updated] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3084:
---

Attachment: PIG-3084.patch

Better exception messages in PIG-3084.patch

> Improve exceptions messages in POPackage
> 
>
> Key: PIG-3084
> URL: https://issues.apache.org/jira/browse/PIG-3084
> Project: Pig
> Issue Type: Bug
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3084.patch
>
>




[jira] [Created] (PIG-3084) Improve exceptions messages in POPackage

2012-12-07 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3084:
--

 Summary: Improve exceptions messages in POPackage
 Key: PIG-3084
 URL: https://issues.apache.org/jira/browse/PIG-3084
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem






[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-07 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-3020:
---

Attachment: PIG-3020.patch

PIG-3020.patch fixes the issue


> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Attachments: PIG-3020.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error 
> in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , 
> uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT 
> IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
> group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
> group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY 
> src_id;
> store per_user_set_sizes into  'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule LoadTypeCastInserter
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: 
> Logical plan invalid state: duplicate uid in schema : 
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at 
> org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}



[jira] [Created] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2012-12-06 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3082:
--

 Summary: outputSchema of a UDF allows two usages when describing a 
Tuple schema
 Key: PIG-3082
 URL: https://issues.apache.org/jira/browse/PIG-3082
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem


When defining an EvalFunc that returns a Tuple, there are two ways you can 
implement outputSchema().
- The right way: return a schema that contains one Field that contains the type 
and schema of the return type of the UDF.
- The unreliable way: return a schema that contains more than one field and it 
will be understood as a tuple schema even though there is no type (which is in 
the Field class) to specify that. This is particularly deceptive when the output 
schema is derived from the input schema and the output Tuple sometimes 
contains only one field. In such cases Pig understands the output schema as a 
tuple only if there is more than one field, so it sometimes works and sometimes 
does not.

We should at least issue a warning (for backward compatibility), if not throw 
an exception outright, when the output schema contains more than one Field.
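
The ambiguity described above can be illustrated with a small toy model. This is only a sketch in Python, not Pig's actual Schema or FieldSchema classes: the `Field` dataclass, `interpret_output_schema`, the two `derived_schema_*` helpers and the type names are all hypothetical, and they only mimic the behaviour this report describes (a multi-field schema is read as a tuple, a single-field schema is not).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field: alias, type name, nested schema."""
    alias: str
    type_name: str
    schema: Optional[List["Field"]] = None


def interpret_output_schema(fields: List[Field]) -> Field:
    """Toy version of the behaviour described in the report (an assumption, not Pig code).

    One field: taken as the UDF's return type as-is.
    More than one field: the list is implicitly wrapped in a tuple.
    """
    if len(fields) == 1:
        return fields[0]
    return Field(alias="result", type_name="tuple", schema=list(fields))


def derived_schema_unreliable(input_fields: List[Field]) -> List[Field]:
    # "Unreliable way": return the inner fields and rely on implicit tuple wrapping.
    return list(input_fields)


def derived_schema_explicit(input_fields: List[Field]) -> List[Field]:
    # "Right way": always return exactly one field that carries the tuple type.
    return [Field(alias="result", type_name="tuple", schema=list(input_fields))]


if __name__ == "__main__":
    two = [Field("a", "int"), Field("b", "chararray")]
    one = [Field("a", "int")]
    for label, build in (("unreliable", derived_schema_unreliable),
                         ("explicit", derived_schema_explicit)):
        for inp in (two, one):
            seen = interpret_output_schema(build(inp))
            print(label, len(inp), "input field(s) ->", seen.type_name)
```

In this model the explicit variant is read as a tuple for any number of input fields, while the unreliable variant is read as a tuple only when the input has more than one field. That is the "sometimes it works, sometimes it does not" case.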



[jira] [Commented] (PIG-2204) Allow passing arguments to custom Partitioners

2012-12-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525819#comment-13525819
 ] 

Julien Le Dem commented on PIG-2204:


Maybe just update the doc to mention this?

> Allow passing arguments to custom Partitioners
> --
>
> Key: PIG-2204
> URL: https://issues.apache.org/jira/browse/PIG-2204
> Project: Pig
> Issue Type: Improvement
> Reporter: Dmitriy V. Ryaboy
>
> Currently, this works:
> {code}
> y = group x by $0 partition by MyPartitioner PARALLEL 2;
> {code}
> However, passing an argument to the partitioner constructor does not work, 
> and dies with a misleading error:
> {code}
> y = group x by $0 partition by MyPartitioner(0) PARALLEL 2;
> 2011-08-03 22:53:23,074 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Encountered " "(" "( "" at line 1, column 91.
> Was expecting one of:
> "parallel" ...
> ";" ...
> "." ...
> "$" ...
> {code}


