[jira] [Commented] (PIG-4059) Pig on Spark
[ https://issues.apache.org/jira/browse/PIG-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029733#comment-16029733 ]

Julien Le Dem commented on PIG-4059:

Congrats all! Thanks [~rohini]!

> Pig on Spark
>
> Key: PIG-4059
> URL: https://issues.apache.org/jira/browse/PIG-4059
> Project: Pig
> Issue Type: New Feature
> Components: spark
> Reporter: Rohini Palaniswamy
> Assignee: Praveen Rachabattuni
> Labels: spork
> Fix For: spark-branch, 0.17.0
> Attachments: Pig-on-Spark-Design-Doc.pdf, Pig-on-Spark-Scope.pdf
>
> Setting up your development environment:
> 0. Download a Spark release package (currently Pig on Spark only supports Spark 1.6).
> 1. Check out the Pig Spark branch.
> 2. Build Pig by running "ant jar", and "ant -Dhadoopversion=23 jar" for hadoop-2.x versions.
> 3. Configure these environment variables:
>    export HADOOP_USER_CLASSPATH_FIRST="true"
>    "local" and "yarn-client" modes are currently supported; select one by exporting the system variable SPARK_MASTER:
>    export SPARK_MASTER=local
>    or
>    export SPARK_MASTER="yarn-client"
> 4. In local mode:
>    ./pig -x spark_local xxx.pig
>    In yarn-client mode:
>    export SPARK_HOME=xx;
>    export SPARK_JAR=hdfs://example.com:8020/ (the HDFS location where you uploaded the spark-assembly*.jar)
>    ./pig -x spark xxx.pig

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests
[ https://issues.apache.org/jira/browse/PIG-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PIG-2845: -- Assignee: (was: Julien Le Dem) > Configure hadoop.tmp.dir under build/tmp for MiniCluster tests > -- > > Key: PIG-2845 > URL: https://issues.apache.org/jira/browse/PIG-2845 > Project: Pig > Issue Type: Bug >Reporter: Julien Le Dem > Attachments: PIG-2845_0.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365633#comment-15365633 ]

Julien Le Dem commented on PIG-3764:

[~rohini] This sounds good. It does not have to be totally inlined, since the JIT will inline method calls; you want to avoid virtual calls though.
My prototype is still out there [1]. One thing it did not take into account is nulls, but I think this can be branched out separately (evaluate ignoring the nulls, and then evaluate the "is null").
Generating asm directly can be unwieldy. That's why I made Brennus [2], to factor out a lot of the logic (different operations per type, different stack frame size per type, all sorts of special cases); see the prototype [1].

[1] https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
[2] https://github.com/julienledem/brennus

> Compile physical operators to bytecode
>
> Key: PIG-3764
> URL: https://issues.apache.org/jira/browse/PIG-3764
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Julien Le Dem
> Labels: GSOC2014
>
> I started a prototype here:
> https://github.com/julienledem/pig/compare/trunk...compile_physical_plan
> The current physical plan is relatively inefficient at evaluating expressions.
> In the context of a better execution engine (Tez, Spark, ...), compiling expressions to bytecode would be a significant speedup.
> This is a candidate project for Google summer of code 2014. More information about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2014

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
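One possible reading of "evaluate ignoring the nulls and then evaluate the is null" is sketched below for a single `a + b` expression. This is only an illustration under stated assumptions: the class `NullSeparatedAdd` and its methods are hypothetical, and it is neither the prototype's code nor Brennus output. The value is computed on primitives with no null checks, the null flag is computed separately, and the two are combined only at the end.

```java
/** Hypothetical sketch: evaluate a + b with the null logic kept separate from the arithmetic. */
final class NullSeparatedAdd {

    /** Value pass: plain primitive arithmetic, with no null checks and no boxing. */
    static long value(long a, long b) {
        return a + b;
    }

    /** Null pass: the sum is null whenever either operand is null. */
    static boolean isNull(boolean aIsNull, boolean bIsNull) {
        return aIsNull || bIsNull;
    }

    /** Combine both passes into a nullable result only at the boundary. */
    static Long evaluate(long a, boolean aIsNull, long b, boolean bIsNull) {
        return isNull(aIsNull, bIsNull) ? null : Long.valueOf(value(a, b));
    }

    public static void main(String[] args) {
        System.out.println(evaluate(2L, false, 3L, false));
        System.out.println(evaluate(2L, true, 3L, false));
    }
}
```

The static, primitive-typed methods are small and non-virtual, which is the kind of call the comment expects the JIT to inline.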
[jira] [Created] (PIG-4219) When parsing a schema, pig drops tuple inside of Bag if it contains only one field
Julien Le Dem created PIG-4219:

Summary: When parsing a schema, pig drops tuple inside of Bag if it contains only one field
Key: PIG-4219
URL: https://issues.apache.org/jira/browse/PIG-4219
Project: Pig
Issue Type: Bug
Reporter: Julien Le Dem

Example
{code:java}
// We generate a schema object and call toString()
String schemaStr = "my_list: {array: (array_element: (num1: int,num2: int))}";
// Reparsed using org.apache.pig.impl.util.Utils
Schema schema = Utils.getSchemaFromString(schemaStr);
// But no longer matches the original structure
schema.toString();
// = {my_list: {array_element: (num1: int,num2: int)}}
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081944#comment-14081944 ]

Julien Le Dem commented on PIG-3760:

[~rohini] I added to the description of PIG-4092

Predicate pushdown for columnar file formats

Key: PIG-3760
URL: https://issues.apache.org/jira/browse/PIG-3760
Project: Pig
Issue Type: New Feature
Reporter: Andrew Musselman
Fix For: 0.14.0

From the conversation on dev@pig:

Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work.
Thanks,
Daniel

On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:
I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet
[ https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-4092: --- Description: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java Predicate pushdown for Parquet -- Key: PIG-4092 URL: https://issues.apache.org/jira/browse/PIG-4092 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet
[ https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-4092: --- Description: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java [~alexlevenson] is the main author of this API was: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java Predicate pushdown for Parquet -- Key: PIG-4092 URL: https://issues.apache.org/jira/browse/PIG-4092 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java [~alexlevenson] is the main author of this API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081948#comment-14081948 ]

Julien Le Dem commented on PIG-3760:

FYI in Parquet the filter is not a hint and it will be applied to records after the metadata

Predicate pushdown for columnar file formats

Key: PIG-3760
URL: https://issues.apache.org/jira/browse/PIG-3760
Project: Pig
Issue Type: New Feature
Reporter: Andrew Musselman
Fix For: 0.14.0

From the conversation on dev@pig:

Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work.
Thanks,
Daniel

On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:
I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3913) Pig should use job's jobClient wherever possible (fixes local mode counters)
[ https://issues.apache.org/jira/browse/PIG-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985005#comment-13985005 ] Julien Le Dem commented on PIG-3913: This looks good to me. Please add javadoc for deprecated methods with information about the new way of doing the same thing. +1 Pig should use job's jobClient wherever possible (fixes local mode counters) Key: PIG-3913 URL: https://issues.apache.org/jira/browse/PIG-3913 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: Aniket Mokashi Assignee: Aniket Mokashi Fix For: 0.13.0 Attachments: PIG-3913-1.patch MapreduceLauncher initializes a statsJobClient to poll counter information of jobs. This works fine in mapreduce mode but it reports incorrect information in local (auto-local) mode. Pig code should try to use org.apache.hadoop.mapred.jobcontrol.Job's getJobClient api to get handle to jobClient wherever possible. statsJobClient (and wherever its references are passed) should be deprecated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939606#comment-13939606 ]

Julien Le Dem commented on PIG-3815:

Same comment as 1. from Cheolsoo; otherwise, this looks good to me.

Hadoop bug causes to pig to fail silently with jar cache

Key: PIG-3815
URL: https://issues.apache.org/jira/browse/PIG-3815
Project: Pig
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Fix For: 0.13.0
Attachments: PIG-3815-1.patch, PIG-3815.patch

Pig uses the DistributedCache.addFileToClassPath API, which puts jars on the distributed cache configuration. This uses ":" to separate the list of files to be put on the classpath via the distributed cache. If fs.default.name has a port number in it, the tokenization logic in Hadoop fails when retrieving the list of cache filenames in the backend.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache
[ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939749#comment-13939749 ]

Julien Le Dem commented on PIG-3815:

[~rohini] in the code you quoted, don't you think it is putting the port back in the following line?
{noformat}
URI uri = fs.makeQualified(file).toUri();
{noformat}

Hadoop bug causes to pig to fail silently with jar cache

Key: PIG-3815
URL: https://issues.apache.org/jira/browse/PIG-3815
Project: Pig
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Fix For: 0.13.0
Attachments: PIG-3815-1.patch, PIG-3815-2.patch, PIG-3815.patch

Pig uses the DistributedCache.addFileToClassPath API, which puts jars on the distributed cache configuration. This uses ":" to separate the list of files to be put on the classpath via the distributed cache. If fs.default.name has a port number in it, the tokenization logic in Hadoop fails when retrieving the list of cache filenames in the backend.

-- This message was sent by Atlassian JIRA (v6.2#6252)
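The failure mode in the issue description can be illustrated with a small, self-contained sketch. It is not Hadoop's code: the class `ClasspathSeparatorDemo`, its methods, and the host name `namenode.example.com` are made up for illustration, and `java.util.StringTokenizer` stands in for whatever tokenization the backend performs. The sketch joins file names with ":" and splits them again, once with plain paths and once with fully qualified URIs that carry a port.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical illustration of a ':'-separated file list breaking on URIs that contain a port. */
final class ClasspathSeparatorDemo {
    /** The separator named in the issue description. */
    static final String SEPARATOR = ":";

    /** Join file names into one configuration value. */
    static String join(List<String> files) {
        return String.join(SEPARATOR, files);
    }

    /** Split the configuration value back into file names. */
    static List<String> split(String joined) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(joined, SEPARATOR);
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("/lib/a.jar", "/lib/b.jar");
        List<String> qualified = Arrays.asList(
                "hdfs://namenode.example.com:8020/lib/a.jar",
                "hdfs://namenode.example.com:8020/lib/b.jar");
        // Plain paths survive the round trip; qualified URIs are cut at every ':'.
        System.out.println(split(join(plain)));
        System.out.println(split(join(qualified)));
    }
}
```

With the qualified URIs, the scheme and the port each introduce an extra ":" per entry, so the round trip no longer returns the original file names.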
[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema
[ https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926350#comment-13926350 ] Julien Le Dem commented on PIG-3801: I would use properties.getProperty(MAPREDUCE_FRAMEWORK_NAME).equals(LOCAL) to decide if it's running locally, but otherwise this looks good to me. Auto local mode does not call storeSchema - Key: PIG-3801 URL: https://issues.apache.org/jira/browse/PIG-3801 Project: Pig Issue Type: Bug Reporter: Aniket Mokashi Assignee: Aniket Mokashi Attachments: PIG-3801.patch https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481 Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We also need to add this for auto-local jobs. To repro this problem, run- a = load '2.txt' as (a0:chararray, a1:int); store a into 'a' using PigStorage(',','-schema'); This creates .pig_schema file in pig -x local mode, but does not create .pig_schema file in auto-local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
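The suggested check can be sketched with plain `java.util.Properties`. The class `LocalModeCheck` is hypothetical, and the constant values are assumptions (the Hadoop 2 key `mapreduce.framework.name` with value `local`); the patch may define them differently. The sketch puts the constant first in the comparison, so that an unset property yields `false` rather than a `NullPointerException`.

```java
import java.util.Properties;

/** Hypothetical sketch of deciding local mode from the job properties. */
final class LocalModeCheck {
    // Assumed values: Hadoop 2 selects the local job runner with this key/value pair.
    static final String MAPREDUCE_FRAMEWORK_NAME = "mapreduce.framework.name";
    static final String LOCAL = "local";

    /** Returns true only when the framework name is explicitly set to "local". */
    static boolean isLocal(Properties properties) {
        // Constant-first equals: getProperty returns null when the key is absent.
        return LOCAL.equals(properties.getProperty(MAPREDUCE_FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        System.out.println(isLocal(properties));
        properties.setProperty(MAPREDUCE_FRAMEWORK_NAME, LOCAL);
        System.out.println(isLocal(properties));
    }
}
```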
[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema
[ https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926351#comment-13926351 ] Julien Le Dem commented on PIG-3801: +1 Auto local mode does not call storeSchema - Key: PIG-3801 URL: https://issues.apache.org/jira/browse/PIG-3801 Project: Pig Issue Type: Bug Reporter: Aniket Mokashi Assignee: Aniket Mokashi Attachments: PIG-3801.patch https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481 Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We also need to add this for auto-local jobs. To repro this problem, run- a = load '2.txt' as (a0:chararray, a1:int); store a into 'a' using PigStorage(',','-schema'); This creates .pig_schema file in pig -x local mode, but does not create .pig_schema file in auto-local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3754) InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size
[ https://issues.apache.org/jira/browse/PIG-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923365#comment-13923365 ]

Julien Le Dem commented on PIG-3754:

LGTM too

InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size

Key: PIG-3754
URL: https://issues.apache.org/jira/browse/PIG-3754
Project: Pig
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Priority: Trivial
Fix For: 0.13.0
Attachments: PIG-3754-1.patch

If you have more than one input, InputSizeReducerEstimator.getTotalInputFileSize can return an incorrect value if one of the loaders returns -1 and is not file based (e.g. HBase). This causes incorrect reducer estimation and problems in auto.local mode. If the input size is not found for any one of the inputs, we should bail out with a return value of -1.

-- This message was sent by Atlassian JIRA (v6.2#6252)
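The proposed behaviour can be sketched as follows. This is a simplified stand-in, not the patch: the class `InputSizeSketch` and the method `totalInputSize` are hypothetical, and the per-input sizes are passed in as an array, with -1 meaning "unknown", instead of being queried from the loaders.

```java
/** Hypothetical sketch: total input size that bails out when any input size is unknown. */
final class InputSizeSketch {

    /** Returns the sum of all sizes, or -1 as soon as one input reports an unknown size. */
    static long totalInputSize(long[] sizes) {
        long total = 0;
        for (long size : sizes) {
            if (size < 0) {
                // One unknown input makes the total meaningless, so do not return a partial sum.
                return -1;
            }
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalInputSize(new long[] {100L, 200L}));
        System.out.println(totalInputSize(new long[] {100L, -1L}));
    }
}
```

Returning -1 instead of the partial sum keeps a caller from treating a large non-file input as small, which is the mis-estimation the issue describes.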
[jira] [Commented] (PIG-3558) ORC support for Pig
[ https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902218#comment-13902218 ] Julien Le Dem commented on PIG-3558: Is hive-exec the fat jar that assembles the runtime dependencies of hive in one jar? Could we depend on the individual hive modules that we need instead? ORC support for Pig --- Key: PIG-3558 URL: https://issues.apache.org/jira/browse/PIG-3558 Project: Pig Issue Type: Improvement Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.13.0 Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch Adding LoadFunc and StoreFunc for ORC. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (PIG-3764) Compile physical operators to bytecode
Julien Le Dem created PIG-3764: -- Summary: Compile physical operators to bytecode Key: PIG-3764 URL: https://issues.apache.org/jira/browse/PIG-3764 Project: Pig Issue Type: Improvement Components: impl Reporter: Julien Le Dem I started a prototype here: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan The current physical plan is relatively inefficient at evaluating expressions. In the context of a better execution engine (Tez, Spark, ...), compiling expressions to bytecode would be a significant speedup. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-2599) Mavenize Pig
[ https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-2599: --- Labels: gsoc2014 (was: gsoc2013) Mavenize Pig Key: PIG-2599 URL: https://issues.apache.org/jira/browse/PIG-2599 Project: Pig Issue Type: New Feature Components: build Reporter: Daniel Dai Labels: gsoc2014 Fix For: 0.13.0 Attachments: maven-pig.1.zip Switch Pig build system from ant to maven. This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3741) Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage
[ https://issues.apache.org/jira/browse/PIG-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13891030#comment-13891030 ] Julien Le Dem commented on PIG-3741: Ideally each store would get its own config object, but that would be a major refactoring. In the meantime, this looks like a good improvement to me. +1 Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage Key: PIG-3741 URL: https://issues.apache.org/jira/browse/PIG-3741 Project: Pig Issue Type: Bug Reporter: Aniket Mokashi Assignee: Aniket Mokashi Fix For: 0.12.1 Attachments: PIG-3741.patch Currently, Utils.setTmpFileCompressionOnConf(pigContext, conf); is invoked for every job. In case of Seqfile, this api sets mapreduce params on conf to assist SequenceFileInterStorage. However, as a side effect, this might change the behavior of other storers due to these mapred properties. This api should only be called for jobs with intermediate storage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3347) Store invocation brings side effect
[ https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887203#comment-13887203 ]

Julien Le Dem commented on PIG-3347:

I thought that the field UIDs were used to track lineage across the plan. [~aniket486] correct me if I'm wrong, but they are used to determine which fields are read for projection push down. In the case of a self join (directly or indirectly) we end up with duplicate ids in the same relation, because the same field is derived to 2 different fields.
Otherwise I'm as lost as [~knoguchi] regarding the actual mechanisms around the UID. I tried to fix some of these in the past (PIG-3020) but it appears they created more problems (PIG-3492).
[~daijy] maybe you can enlighten us?

Store invocation brings side effect

Key: PIG-3347
URL: https://issues.apache.org/jira/browse/PIG-3347
Project: Pig
Issue Type: Bug
Components: grunt
Affects Versions: 0.11
Environment: local mode
Reporter: Sergey
Assignee: Daniel Dai
Priority: Critical
Fix For: 0.12.1
Attachments: PIG-3347-1.patch

The problem is that an intermediate 'store' invocation changes the final store output. It looks like it brings some kind of side effect. We used 'local' mode to run the script.
Here is the input data:
1 1
Here is the script:
{code}
a = load 'test';
a_group = group a by $0;
b = foreach a_group {
    a_distinct = distinct a.$0;
    generate group, a_distinct;
}
--store b into 'b';
c = filter b by SIZE(a_distinct) == 1;
store c into 'out';
{code}
We expect the output to be:
1 1
The output is an empty file. Uncomment the {code}--store b into 'b';{code} line and see the difference: you get the expected output.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema
[ https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785427#comment-13785427 ] Julien Le Dem commented on PIG-3082: This is intended. The second behavior described above is really problematic. If a UDF breaks because it returns a schema of more than one field it should be changed to return one field of type tuple. Once fixed it works in all versions of Pig. This is only removing an unsafe use of outputSchema in favor of the existing correct use. outputSchema of a UDF allows two usages when describing a Tuple schema -- Key: PIG-3082 URL: https://issues.apache.org/jira/browse/PIG-3082 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Assignee: Jonathan Coveney Fix For: 0.12.0 Attachments: PIG-3082-0.patch, PIG-3082-1.patch When defining an evalfunc that returns a Tuple there are two ways you can implement outputSchema(). - The right way: return a schema that contains one Field that contains the type and schema of the return type of the UDF - The unreliable way: return a schema that contains more than one field and it will be understood as a tuple schema even though there is no type (which is in Field class) to specify that. This is particularly deceitful when the output schema is derived from the input schema and the outputted Tuple sometimes contain only one field. In such cases Pig understands the output schema as a tuple only if there is more than one field. And sometimes it works, sometimes it does not. We should at least issue a warning (backward compatibility) if not plain throw an exception when the output schema contains more than one Field. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785564#comment-13785564 ]

Julien Le Dem commented on PIG-3445:

I added a parquet-pig-bundle and the shading of fastutil: https://github.com/Parquet/parquet-mr/pull/186
We can make a new release to simplify

Make Parquet format available out of the box in Pig

Key: PIG-3445
URL: https://issues.apache.org/jira/browse/PIG-3445
Project: Pig
Issue Type: Improvement
Reporter: Julien Le Dem
Fix For: 0.12.0
Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, PIG-3445.patch

We would add the Parquet jar in the Pig packages to make it available out of the box to pig users. On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785565#comment-13785565 ]

Julien Le Dem commented on PIG-3445:

parquet-format.version should be 1.0.0

Make Parquet format available out of the box in Pig

Key: PIG-3445
URL: https://issues.apache.org/jira/browse/PIG-3445
Project: Pig
Issue Type: Improvement
Reporter: Julien Le Dem
Fix For: 0.12.0
Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, PIG-3445.patch

We would add the Parquet jar in the Pig packages to make it available out of the box to pig users. On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785610#comment-13785610 ]

Julien Le Dem commented on PIG-3445:

We merged the PR for parquet-pig-bundle. I'm making a release so that this can be merged in pig 0.12.

Make Parquet format available out of the box in Pig

Key: PIG-3445
URL: https://issues.apache.org/jira/browse/PIG-3445
Project: Pig
Issue Type: Improvement
Reporter: Julien Le Dem
Fix For: 0.12.0
Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, PIG-3445.patch

We would add the Parquet jar in the Pig packages to make it available out of the box to pig users. On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785742#comment-13785742 ]

Julien Le Dem commented on PIG-3445:

I just released parquet-pig-bundle-1.2.3; this should show up in maven central overnight.

Make Parquet format available out of the box in Pig

Key: PIG-3445
URL: https://issues.apache.org/jira/browse/PIG-3445
Project: Pig
Issue Type: Improvement
Reporter: Julien Le Dem
Fix For: 0.12.0
Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, PIG-3445.patch

We would add the Parquet jar in the Pig packages to make it available out of the box to pig users. On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.)
This way users can use Parquet simply by typing:
A = LOAD 'foo' USING ParquetLoader();
STORE A INTO 'bar' USING ParquetStorer();

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3446) Umbrella jira for Pig on Tez
[ https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773569#comment-13773569 ] Julien Le Dem commented on PIG-3446: Here is the work that Achal did for Pig-on-Tez https://github.com/achalsoni81/pigeon Umbrella jira for Pig on Tez Key: PIG-3446 URL: https://issues.apache.org/jira/browse/PIG-3446 Project: Pig Issue Type: New Feature Components: tez Affects Versions: tez-branch Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: tez-branch This is a umbrella jira for Pig on Tez. More detailed subtasks will be added. More information can be found on the following wiki page: https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769985#comment-13769985 ]

Julien Le Dem commented on PIG-3367:

Looks good to me. Is there a way you can factor out some of the content of buildAssertOp()? It looks like some of this would be common with other methods.

Add assert keyword (operator) in pig

Key: PIG-3367
URL: https://issues.apache.org/jira/browse/PIG-3367
Project: Pig
Issue Type: New Feature
Components: parser
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Fix For: 0.12
Attachments: PIG-3367.patch

The assert operator can be used for data validation. With assert you can write a script like the following:
{code}
a = load 'something' as (a0:int, a1:int);
assert a by a0 > 0, 'a cant be negative for reasons';
{code}
This script will fail if the assert is violated.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754975#comment-13754975 ]

Julien Le Dem commented on PIG-3419:

+1 [~cheolsoo] LGTM!

Pluggable Execution Engine

Key: PIG-3419
URL: https://issues.apache.org/jira/browse/PIG-3419
Project: Pig
Issue Type: New Feature
Affects Versions: 0.12
Reporter: Achal Soni
Assignee: Achal Soni
Priority: Minor
Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, updated-8-29-2013-exec-engine.patch

In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major, as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend.
Some of the changes were to reinstate an ExecutionEngine interface to tie together the frontend and backend, to make the changes in Pig to delegate to the EE when necessary, and to create an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). I also tried to make ScriptState, JobStats, and PigStats as abstract as possible in their current state.
I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
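The ServiceLoader-based selection mentioned in the description can be sketched roughly as below. This is a minimal illustration, not the patch: `EngineSelection`, the `EngineType` interface, and its `accepts` method are hypothetical and much simpler than Pig's actual ExecType contract. Only `java.util.ServiceLoader` is a real API here; it yields whatever providers are registered under `META-INF/services` on the classpath, and the selection loop takes the first one that accepts the requested mode.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical sketch of cycling through engine types and picking the first match. */
final class EngineSelection {

    /** Hypothetical provider interface; Pig's real ExecType is richer than this. */
    interface EngineType {
        String name();
        boolean accepts(String requestedMode);
    }

    /** Returns the first candidate that accepts the requested mode, if any. */
    static Optional<EngineType> select(Iterable<EngineType> candidates, String requestedMode) {
        for (EngineType candidate : candidates) {
            if (candidate.accepts(requestedMode)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** ServiceLoader is Iterable, so classpath discovery plugs into the same loop. */
    static Optional<EngineType> selectFromClasspath(String requestedMode) {
        return select(ServiceLoader.load(EngineType.class), requestedMode);
    }

    /** Test helper: an engine type that accepts exactly its own name, ignoring case. */
    static EngineType named(final String name) {
        return new EngineType() {
            @Override public String name() { return name; }
            @Override public boolean accepts(String requestedMode) {
                return name.equalsIgnoreCase(requestedMode);
            }
        };
    }

    public static void main(String[] args) {
        Iterable<EngineType> engines = Arrays.asList(named("mapreduce"), named("tez"));
        System.out.println(select(engines, "tez").map(EngineType::name).orElse("none"));
        System.out.println(select(engines, "spark").map(EngineType::name).orElse("none"));
    }
}
```

Keeping the selection loop separate from the discovery call makes it testable with an in-memory list, while a new engine can be added by dropping a jar with a provider file on the classpath.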
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753912#comment-13753912 ] Julien Le Dem commented on PIG-3419: [~cheolsoo]: thanks a lot for looking into this. Here are my thoughts: 1. let's change it back 2. 4. 5. 6. 7. are either internal to Pig or necessary to add the execution engine abstraction. 3. JobStats still exists but the MR specific part is split into MRJobStats which extends JobStats Same thing for PigStatsUtil and ScriptState. Those classes are not disappearing but the MR specific part is abstracted out. HExecutionEngine could be renamed back to what it was but this is again what is becoming the new abstraction. Unfortunately tools like Ambrose and Lipstick depend on the MR specific parts of Pig and look at the internals. This patch is a necessary change so that those tools can work independently of the execution engine in the future. The changes to Ambrose and Lipstick should be minimal though with this patch. But yes they would suffer from some incompatibility, but again there is no way around it when a tool looks inside the execution engine internals. I think we should revert 1. and commit the patch. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. 
The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Created] (PIG-3445) Make Parquet format available out of the box in Pig
Julien Le Dem created PIG-3445: -- Summary: Make Parquet format available out of the box in Pig Key: PIG-3445 URL: https://issues.apache.org/jira/browse/PIG-3445 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem We would add the Parquet jar to the Pig packages to make it available out of the box to Pig users. On top of that, we could add the parquet.pig package to the list of packages searched for UDFs. (Alternatively, the Parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.) This way users can use Parquet simply by typing: A = LOAD 'foo' USING ParquetLoader(); STORE A INTO 'bar' USING ParquetStorer();
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749226#comment-13749226 ] Julien Le Dem commented on PIG-3419: The advantage of having the Execution engine abstraction in trunk is it allows running experimental Pig execution engines implementations like Tez or Spark on an official release of Pig without having to build from a specific branch. The execution engine implementations themselves are fairly independent of Pig and do not need to be maintained in a Pig branch. If the ExecutionEngine abstraction evolves over time that can be done in Trunk and can be merged independently of the Tez implementation itself. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. 
Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748183#comment-13748183 ] Julien Le Dem commented on PIG-3419: +1 LGTM If test-commit passes I think we can commit to TRUNK Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. 
I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746999#comment-13746999 ] Julien Le Dem commented on PIG-3419: I have submitted my review. This looks great [~achalsoni81]! [~cheolsoo] does it look good to you? Once Achal has updated his patch I'm willing to commit. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, finalpatch.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. 
I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747064#comment-13747064 ] Julien Le Dem commented on PIG-3419: The point is to be able to implement alternate execution engines without having to fork Pig. I think it should go in trunk. Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, finalpatch.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. 
I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745398#comment-13745398 ] Julien Le Dem commented on PIG-3419: [~cheolsoo] 1. Do we really throw Exception ? If yes, then let's just throw that. If not then let's instead have FrontEndException, ExecException, IOException. i.e. let's remove the exceptions that are already included by the highest exception level. 2. agreed with you. I would expect the execution engine to handle the Properties internally and the signature of this method to be: {noformat} public void setProperty(String property, String value); {noformat} Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. 
Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
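The two review points in the comment above (declare only the broadest exception, and let the engine own its Properties behind a setProperty(String, String) method) can be sketched as follows. Only the setProperty signature comes from the comment. ConfigurableEngine, InMemoryEngine, EngineException, launch and getProperty are hypothetical names used for illustration, and the exception hierarchy shown is an assumption rather than Pig's actual one.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical exception type, used only to illustrate the throws-clause point.
class EngineException extends IOException {
    EngineException(String message) {
        super(message);
    }
}

// Hypothetical interface; the names are illustrative, not Pig's actual API.
interface ConfigurableEngine {
    // Signature suggested in the comment: no Properties object is exposed.
    void setProperty(String property, String value);

    // Declaring IOException already covers EngineException,
    // so listing both in the throws clause would be redundant.
    void launch() throws IOException;
}

final class InMemoryEngine implements ConfigurableEngine {
    // The engine keeps its configuration to itself.
    private final Properties properties = new Properties();

    @Override
    public void setProperty(String property, String value) {
        properties.setProperty(property, value);
    }

    // Read-only accessor so callers never mutate the Properties directly.
    String getProperty(String property) {
        return properties.getProperty(property);
    }

    @Override
    public void launch() throws IOException {
        if (properties.isEmpty()) {
            throw new EngineException("no properties set");
        }
    }
}
```

Keeping the Properties private means an engine could later swap in a different configuration store without touching its callers.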
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738478#comment-13738478 ] Julien Le Dem commented on PIG-3419: Hi Achal for large patches, please create a review here: https://reviews.apache.org Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Priority: Minor Attachments: pluggable_execengine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes was to reinstate an ExecutionEngine interface to tie together the front end and backend, and making the changes in Pig to delegate to the EE when necessary, and creating an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, exactly how MapReduce does for choosing the framework to use between local and distributed mode). Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in its current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. 
I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694832#comment-13694832 ] Julien Le Dem commented on PIG-3367: I was thinking we could make the syntax part of FOREACH. {noformat} B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL; {noformat} That way it is easy to integrate asserts in the flow. The advantages of having it as part of the language: - the error message can be clear without extra user input. - it's more natural than doing a filter that does not filter. Also, if the filter is not in the predecessors of a STORE, it won't be executed. A UDF can stop the job by throwing an exception, although the task will retry before failing completely. For reference, the UDF-based syntax: {noformat} FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' ); {noformat} Yes, adding new keywords is inconvenient when the keyword was already used for relation or column names. When a field collides with a keyword it is sometimes difficult to rename it. I think we should: - try to avoid new keywords if possible - provide a mechanism to escape field names to facilitate fixing conflicts when they happen (using quotes or a similar mechanism) Add assert keyword (operator) in pig Key: PIG-3367 URL: https://issues.apache.org/jira/browse/PIG-3367 Project: Pig Issue Type: New Feature Components: parser Reporter: Aniket Mokashi Assignee: Aniket Mokashi Assert operator can be used for data validation. With assert you can write a script as follows: {code} a = load 'something' as (a0:int, a1:int); assert a by a0 > 0, 'a cant be negative for reasons'; {code} This script will fail if the assert is violated.
[jira] [Commented] (PIG-2828) DataType.compare null
[ https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672896#comment-13672896 ] Julien Le Dem commented on PIG-2828: Sounds good to me. DataType.compare null - Key: PIG-2828 URL: https://issues.apache.org/jira/browse/PIG-2828 Project: Pig Issue Type: Bug Reporter: Haitao Yao Attachments: DataType.patch, PIG-2828.patch, test.patch When using TOP, if the DataBag contains a null value to compare, the following exception is thrown: Caused by: java.lang.NullPointerException at org.apache.pig.data.DataType.compare(DataType.java:427) at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:97) at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:1) at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649) at java.util.PriorityQueue.siftUp(PriorityQueue.java:627) at java.util.PriorityQueue.offer(PriorityQueue.java:329) at java.util.PriorityQueue.add(PriorityQueue.java:306) at org.apache.pig.builtin.TOP.updateTop(TOP.java:141) at org.apache.pig.builtin.TOP.exec(TOP.java:116) Code (TOP.java, starting at line 91): Object field1 = o1.get(fieldNum); Object field2 = o2.get(fieldNum); if (!typeFound) { datatype = DataType.findType(field1); typeFound = true; } return DataType.compare(field1, field2, datatype, datatype); The reason is that if typeFound is true, datatype is not null, and field1 is null, the script fails. So we need to check whether field1 is null.
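The fix proposed above amounts to checking for null before delegating to the typed comparison. A minimal, self-contained sketch of that idea follows. It does not use Pig's DataType or Tuple classes; NullSafeCompare and the Integer[] "rows" are stand-ins invented for this example, and ordering nulls before all other values is an assumption about the desired behavior, not something the issue specifies.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class NullSafeCompare {
    private NullSafeCompare() {}

    // Treats null as smaller than any non-null value; two nulls compare as equal.
    static <T extends Comparable<T>> int compare(T a, T b) {
        if (a == null) {
            return b == null ? 0 : -1;
        }
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }

    // Compares rows on a single field, mirroring how a tuple comparator picks fieldNum.
    static Comparator<Integer[]> byField(int fieldNum) {
        return (row1, row2) -> compare(row1[fieldNum], row2[fieldNum]);
    }

    public static void main(String[] args) {
        PriorityQueue<Integer[]> queue = new PriorityQueue<>(byField(0));
        queue.add(new Integer[] {3});
        queue.add(new Integer[] {null}); // a null field no longer triggers a NullPointerException
        queue.add(new Integer[] {1});
        while (!queue.isEmpty()) {
            System.out.println(queue.poll()[0]);
        }
    }
}
```

The rows themselves are never null here, only their fields, which matches the situation in the stack trace: PriorityQueue rejects null elements, but it happily hands the comparator elements whose fields are null.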
[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Attachment: PIG-3307_3.patch PIG-3307_3.patch addresses [~cheolsoo]'s comments Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, PIG-3307_3.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
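To illustrate the refactoring described above, here is a small sketch contrasting the two styles. BeforeOperator and AfterOperator are hypothetical classes, and the return types are simplified to plain values for brevity; the real physical operators are considerably more involved.

```java
// Before: the argument carries no data and only selects the overload.
class BeforeOperator {
    Integer getNext(Integer unused) {
        return 42;
    }

    String getNext(String unused) {
        return "forty-two";
    }
}

// After: the type is part of the method name, so no dummy argument is needed.
class AfterOperator {
    Integer getNextInteger() {
        return 42;
    }

    String getNextString() {
        return "forty-two";
    }
}

class GetNextDemo {
    public static void main(String[] args) {
        BeforeOperator before = new BeforeOperator();
        AfterOperator after = new AfterOperator();

        // The old style needs a cast on null just to pick a method.
        System.out.println(before.getNext((Integer) null));
        System.out.println(before.getNext((String) null));

        // The new style reads directly.
        System.out.println(after.getNextInteger());
        System.out.println(after.getNextString());
    }
}
```

Besides readability, the named methods avoid the ambiguity a bare getNext(null) call would cause at compile time.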
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661202#comment-13661202 ] Julien Le Dem commented on PIG-3307: committed to TRUNK Committed revision 1484037 Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch, PIG-3307_3.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660033#comment-13660033 ] Julien Le Dem commented on PIG-3307: https://reviews.apache.org/r/11203/diff/#index_header thanks [~cheolsoo] and [~daijy]! Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654887#comment-13654887 ] Julien Le Dem commented on PIG-3307: [~daijy] Also, most likely it won't make any difference performance-wise. Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*().
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651101#comment-13651101 ] Julien Le Dem commented on PIG-3307: [~daijy] what's the recommended approach? If you have a setup for running perf tests, that would be helpful. Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*().
[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649907#comment-13649907 ] Julien Le Dem commented on PIG-3311: Committed to TRUNK add pig-withouthadoop-h2 to mvn-jar --- Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Components: build Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3311.patch mvn-jar currently creates pig-version.jar and pig-version-h2.jar I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar that are needed to run pig from the command line. This will allow a dual-version package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PIG-3311. Resolution: Fixed Fix Version/s: 0.12 add pig-withouthadoop-h2 to mvn-jar --- Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Components: build Reporter: Julien Le Dem Assignee: Julien Le Dem Fix For: 0.12 Attachments: PIG-3311.patch mvn-jar currently creates pig-version.jar and pig-version-h2.jar I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar that are needed to run pig from the command line. This will allow a dual-version package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650021#comment-13650021 ] Julien Le Dem commented on PIG-3307: [~daijy] This is removing parameters that were not used. I have not tested performance but I think it could only improve performance. (see latest patch PIG-3307_2.patch) Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Attachment: PIG-3307_2.patch PIG-3307_2.patch removes the unused parameter in getNext(\*) Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
Julien Le Dem created PIG-3311: -- Summary: add pig-withouthadoop-h2 to mvn-jar Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Components: build Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3311.patch mvn-jar currently creates pig-version.jar and pig-version-h2.jar I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar that are needed to run pig from the command line. This will allow a dual-version package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3311: --- Attachment: PIG-3311.patch PIG-3311.patch adds -withouthadoop to the mvn-jar target add pig-withouthadoop-h2 to mvn-jar --- Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Components: build Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3311.patch mvn-jar currently creates pig-version.jar and pig-version-h2.jar I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar that are needed to run pig from the command line. This will allow a dual-version package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3311: --- Patch Info: Patch Available add pig-withouthadoop-h2 to mvn-jar --- Key: PIG-3311 URL: https://issues.apache.org/jira/browse/PIG-3311 Project: Pig Issue Type: Improvement Components: build Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3311.patch mvn-jar currently creates pig-version.jar and pig-version-h2.jar I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar that are needed to run pig from the command line. This will allow a dual-version package. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml
[ https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PIG-3303. Resolution: Fixed Fix Version/s: 0.12 Merged in trunk add hadoop h2 artifact to publications in ivy.xml - Key: PIG-3303 URL: https://issues.apache.org/jira/browse/PIG-3303 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Assignee: Julien Le Dem Fix For: 0.12 Attachments: PIG-3303.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
Julien Le Dem created PIG-3307: -- Summary: Refactor physical operators to remove methods parameters that are always null Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just use to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Attachment: PIG-3307_0.patch PIG-3307_0.patch contains the initial refactoring Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just use to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Description: The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). was: The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just use to pick the correct method. I have started a refactoring for a more readable getNext*T*(). Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3307: --- Attachment: PIG-3307_1.patch PIG-3307_1.patch introduces some more refactoring Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647174#comment-13647174 ] Julien Le Dem commented on PIG-3307: It looks like we can get rid of the parameter that is only used for method dispatch. I will replace all calls to getNext(Tuple t) with getNextTuple() in PhysicalOperator. Refactor physical operators to remove methods parameters that are always null - Key: PIG-3307 URL: https://issues.apache.org/jira/browse/PIG-3307 Project: Pig Issue Type: Improvement Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3307_0.patch, PIG-3307_1.patch The physical operators are sometimes overly complex. I'm trying to cleanup some unnecessary code. in particular there is an array of getNext(*T* v) where the value v does not seem to have any importance and is just used to pick the correct method. I have started a refactoring for a more readable getNext*T*().
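To illustrate the refactoring discussed in this comment, here is a minimal, self-contained sketch. The classes below (SketchTuple, OldStyleOperator, NewStyleOperator) are hypothetical stand-ins and not Pig's actual PhysicalOperator API; only the method names getNext(Tuple) and getNextTuple() come from the discussion, and getNextInteger() is an assumed analogue.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a tuple; this is not Pig's Tuple class. */
final class SketchTuple {
    final List<Object> fields;

    SketchTuple(Object... fields) {
        this.fields = Arrays.asList(fields);
    }
}

/** Before: the argument is always null and only selects the overload. */
class OldStyleOperator {
    SketchTuple getNext(SketchTuple unused) {
        return new SketchTuple("a", 1);
    }

    Integer getNext(Integer unused) {
        return 42;
    }
}

/** After: the result type is part of the method name, no dummy argument. */
class NewStyleOperator {
    SketchTuple getNextTuple() {
        return new SketchTuple("a", 1);
    }

    Integer getNextInteger() {
        return 42;
    }
}

final class GetNextSketch {
    public static void main(String[] args) {
        OldStyleOperator before = new OldStyleOperator();
        // The casts are required: a bare null is ambiguous between overloads.
        SketchTuple t1 = before.getNext((SketchTuple) null);
        Integer i1 = before.getNext((Integer) null);

        NewStyleOperator after = new NewStyleOperator();
        SketchTuple t2 = after.getNextTuple();
        Integer i2 = after.getNextInteger();

        // Both styles return the same values; only the call sites differ.
        System.out.println(t1.fields.equals(t2.fields) && i1.equals(i2));
    }
}
```

The call sites in the second style no longer need a typed null, which is the readability gain the issue description is after.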
[jira] [Created] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml
Julien Le Dem created PIG-3303: -- Summary: add hadoop h2 artifact to publications in ivy.xml Key: PIG-3303 URL: https://issues.apache.org/jira/browse/PIG-3303 Project: Pig Issue Type: Bug Reporter: Julien Le Dem
[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml
[ https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3303: --- Attachment: PIG-3303.patch add hadoop h2 artifact to publications in ivy.xml - Key: PIG-3303 URL: https://issues.apache.org/jira/browse/PIG-3303 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Attachments: PIG-3303.patch
[jira] [Updated] (PIG-3303) add hadoop h2 artifact to publications in ivy.xml
[ https://issues.apache.org/jira/browse/PIG-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3303: --- Patch Info: Patch Available add hadoop h2 artifact to publications in ivy.xml - Key: PIG-3303 URL: https://issues.apache.org/jira/browse/PIG-3303 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Attachments: PIG-3303.patch
[jira] [Commented] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596273#comment-13596273 ] Julien Le Dem commented on PIG-3214: Who let the trolls out? New/improved mascot --- Key: PIG-3214 URL: https://issues.apache.org/jira/browse/PIG-3214 Project: Pig Issue Type: Wish Components: site Affects Versions: 0.11 Reporter: Andrew Musselman Priority: Minor Fix For: 0.12 Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, pig_logo_new.png Request to change pig mascot to something more graphically appealing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Deleted] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3214: --- Comment: was deleted (was: Who let the trolls out?) New/improved mascot --- Key: PIG-3214 URL: https://issues.apache.org/jira/browse/PIG-3214 Project: Pig Issue Type: Wish Components: site Affects Versions: 0.11 Reporter: Andrew Musselman Priority: Minor Fix For: 0.12 Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png, pig_logo_new.png Request to change pig mascot to something more graphically appealing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3214: --- Attachment: pig_6.JPG I like the idea of number 4 too. Here is my little contribution (#6) to it, just to illustrate the idea. (I do agree with Alan that we should only change if we have something much better.) New/improved mascot --- Key: PIG-3214 URL: https://issues.apache.org/jira/browse/PIG-3214 Project: Pig Issue Type: Wish Components: site Affects Versions: 0.11 Reporter: Andrew Musselman Priority: Minor Fix For: 0.12 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, pig_6.JPG Request to change pig mascot to something more graphically appealing.
[jira] [Commented] (PIG-3194) Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
[ https://issues.apache.org/jira/browse/PIG-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593747#comment-13593747 ] Julien Le Dem commented on PIG-3194: Same as Jon: we can just use the methods present in 1.3, and we don't need to be URL safe. Let's not repackage commons.codec or duplicate part of it just for this. Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 --- Key: PIG-3194 URL: https://issues.apache.org/jira/browse/PIG-3194 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Kai Londenberg The changes to ObjectSerializer.java in the following commit http://svn.apache.org/viewvc?view=revisionrevision=1403934 break compatibility with Hadoop 0.20.2 clusters. The reason is that the code uses methods from Apache Commons Codec 1.4 that are not available in Apache Commons Codec 1.3, which ships with Hadoop 0.20.2. The offending methods are Base64.decodeBase64(String) and Base64.encodeBase64URLSafeString(byte[]). If I revert these changes, Pig 0.11.0 candidate 2 works well with our Hadoop 0.20.2 clusters.
[jira] [Commented] (PIG-3235) Enable DEBUG log messages in unit tests by default
[ https://issues.apache.org/jira/browse/PIG-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593990#comment-13593990 ] Julien Le Dem commented on PIG-3235: What would be useful is to have a log4j.properties for tests in a known location that is automatically picked up in test and can be easily modified on a case by case basis. Enable DEBUG log messages in unit tests by default -- Key: PIG-3235 URL: https://issues.apache.org/jira/browse/PIG-3235 Project: Pig Issue Type: Improvement Components: tools Reporter: Cheolsoo Park Priority: Minor Currently, debug level messages are not logged for unit tests. It is helpful to enable them to debug unit tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594152#comment-13594152 ] Julien Le Dem commented on PIG-3214: and PI can be the front legs ;) New/improved mascot --- Key: PIG-3214 URL: https://issues.apache.org/jira/browse/PIG-3214 Project: Pig Issue Type: Wish Components: site Affects Versions: 0.11 Reporter: Andrew Musselman Priority: Minor Fix For: 0.12 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, pig_6.JPG Request to change pig mascot to something more graphically appealing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs
[ https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564901#comment-13564901 ] Julien Le Dem commented on PIG-3140: +1 Document PigProgressNotificationListener configs Key: PIG-3140 URL: https://issues.apache.org/jira/browse/PIG-3140 Project: Pig Issue Type: Bug Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.11 Attachments: PIG-3140_1.patch Add docs to describe what PPNL is and how to configure it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3139) Document reducer estimation
[ https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564902#comment-13564902 ] Julien Le Dem commented on PIG-3139: +1 Document reducer estimation --- Key: PIG-3139 URL: https://issues.apache.org/jira/browse/PIG-3139 Project: Pig Issue Type: Bug Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.11 Attachments: PIG-3139_1.patch Add docs to describe how default reducer estimation algo works and how to override it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3105) Fix TestJobSubmission unit test failure.
[ https://issues.apache.org/jira/browse/PIG-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3105: --- Fix Version/s: (was: 0.11) 0.12 I'm moving this to the next release as it does not seem to be a blocker for Pig 0.11 Fix TestJobSubmission unit test failure. Key: PIG-3105 URL: https://issues.apache.org/jira/browse/PIG-3105 Project: Pig Issue Type: Bug Components: tools Affects Versions: 0.10.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.12 Attachments: PIG-3105.patch Currently with Hadoop 1.0, the TestJobSubmission unit test fails. This is due to HBASE-7423. This is a work around to that issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2846) Can we skip hcat related e2e when hcat is not installed?
[ https://issues.apache.org/jira/browse/PIG-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560184#comment-13560184 ] Julien Le Dem commented on PIG-2846: Hey [~cheolsoo], should we detach this from Pig 0.11? Can we skip hcat related e2e when hcat is not installed? Key: PIG-2846 URL: https://issues.apache.org/jira/browse/PIG-2846 Project: Pig Issue Type: Sub-task Reporter: Koji Noguchi Priority: Trivial Attachments: pig-2846-trunk-v1.txt Trying pig e2e for the first time, I see a couple of the tests (HCatDDL_1, HCatDDL_2 and Jython_Command_1) failing with bq. java.io.IOException: Cannot run program /usr/local/hcat/bin/hcat: bq. java.io.IOException: error=2, No such file or directory Is it ok to change the test_harness to skip these tests when hcat does not exist?
[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing
[ https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3005: --- Issue Type: Bug (was: Sub-task) Parent: (was: PIG-2972) TestLargeFile#testOrderBy is failing Key: PIG-3005 URL: https://issues.apache.org/jira/browse/PIG-3005 Project: Pig Issue Type: Bug Environment: Mac OSX 10.6.8 Reporter: Jonathan Coveney Fix For: 0.12 When run locally, at least, this test is failing for me. Has anyone else noticed this failing? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema
[ https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556686#comment-13556686 ] Julien Le Dem commented on PIG-3082: Thanks for fixing, Jon! I find the error message a little confusing: {noformat} throw new FrontendException("Given UDF returns an improper Schema. Should only return Tuple, Bag, or a single item. Returns: " + udfSchema); {noformat} It should contain something along the lines of "... outputSchema should return a Schema containing a single Field". Otherwise, it looks good to me. Thanks outputSchema of a UDF allows two usages when describing a Tuple schema -- Key: PIG-3082 URL: https://issues.apache.org/jira/browse/PIG-3082 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3082-0.patch When defining an evalfunc that returns a Tuple there are two ways you can implement outputSchema(). - The right way: return a schema that contains one Field that contains the type and schema of the return type of the UDF - The unreliable way: return a schema that contains more than one field and it will be understood as a tuple schema even though there is no type (which is in Field class) to specify that. This is particularly deceitful when the output schema is derived from the input schema and the outputted Tuple sometimes contains only one field. In such cases Pig understands the output schema as a tuple only if there is more than one field. And sometimes it works, sometimes it does not. We should at least issue a warning (backward compatibility) if not plain throw an exception when the output schema contains more than one Field.
[jira] [Commented] (PIG-3098) Add another test for the self join case
[ https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556692#comment-13556692 ] Julien Le Dem commented on PIG-3098: One minor comment regarding asserts: {noformat} assertEquals(tuples.size(), out.size()); for (Tuple t : out) { assertTrue(tuples.remove(t)); } assertTrue(tuples.isEmpty()); {noformat} If this fails, it is not going to give much information. Please add a message as the first parameter with some info: {noformat} assertEquals("tuple count for " + out, tuples.size(), out.size()); for (Tuple t : out) { assertTrue("existence of " + t, tuples.remove(t)); } assertTrue("all tuples consumed in " + tuples, tuples.isEmpty()); {noformat} Add another test for the self join case --- Key: PIG-3098 URL: https://issues.apache.org/jira/browse/PIG-3098 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3098-0.patch This adds a test to TestJoin that doesn't just make sure that self joins work semantically in the parser, but also that it pulls the right data through. Thought it'd be easier to just make a new JIRA than to reopen PIG-3020.
[jira] [Created] (PIG-3103) make mockito a test dependency (instead of compile)
Julien Le Dem created PIG-3103: -- Summary: make mockito a test dependency (instead of compile) Key: PIG-3103 URL: https://issues.apache.org/jira/browse/PIG-3103 Project: Pig Issue Type: Bug Reporter: Julien Le Dem
[jira] [Updated] (PIG-3076) make TestScalarAliases more reliable
[ https://issues.apache.org/jira/browse/PIG-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3076: --- Attachment: PIG-3076_1.patch PIG-3076_1.patch addresses comments make TestScalarAliases more reliable Key: PIG-3076 URL: https://issues.apache.org/jira/browse/PIG-3076 Project: Pig Issue Type: Test Reporter: Julien Le Dem Assignee: Julien Le Dem Fix For: 0.11, 0.12 Attachments: PIG-3076_1.patch, PIG-3076.patch Currently, this test writes to the root directory, so its output is not deleted by ant clean. It also deletes its output at the end instead of the beginning. As a consequence, if the test fails once, it will keep failing until the directory is manually cleaned up (not good for CI).
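The failure mode described in this issue (stale output left behind by a failed run) can be avoided by cleaning up before the test writes anything rather than after it finishes. Below is a minimal sketch using only the JDK. The class name, the method name and the build/test/TestScalarAliases location are assumptions for illustration and are not taken from PIG-3076_1.patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TestOutputDir {
    /** Deletes dir and everything under it (if present), then recreates it empty. */
    static Path resetOutputDir(Path dir) throws IOException {
        if (Files.exists(dir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order so that children come before their parents.
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location under build/, so a build clean also removes it.
        Path out = resetOutputDir(Paths.get("build", "test", "TestScalarAliases"));
        System.out.println("clean output directory: " + out.toAbsolutePath());
    }
}
```

Calling such a helper from the test's setup step means a previous failed run cannot poison the next one, and keeping the directory under the build tree lets the normal clean target remove it.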
[jira] [Commented] (PIG-2959) Add a pig.cmd for Pig to run under Windows
[ https://issues.apache.org/jira/browse/PIG-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535220#comment-13535220 ] Julien Le Dem commented on PIG-2959: hey Daniel, are you going to commit this? Add a pig.cmd for Pig to run under Windows -- Key: PIG-2959 URL: https://issues.apache.org/jira/browse/PIG-2959 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11 Attachments: pig.cmd -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2957) TetsScriptUDF fail due to volume prefix in jar
[ https://issues.apache.org/jira/browse/PIG-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535249#comment-13535249 ] Julien Le Dem commented on PIG-2957: Could you call the method something more explicit than cleanupPath? Something like getPathForJar, maybe? Also, please add comments to explain what exactly this is doing: {noformat} if (path.charAt(1)==':') { newPath = path.charAt(0) + path.substring(2); } {noformat} It would be useful to describe what it is changing in the path and why. In particular, the drive letter becomes a root dir in the jar (C:/foo becomes C/foo). If that's what we want, then it should be clearer. TetsScriptUDF fail due to volume prefix in jar -- Key: PIG-2957 URL: https://issues.apache.org/jira/browse/PIG-2957 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11 Attachments: PIG-2957-1.patch, PIG-2957-2_0.10.patch, PIG-2957-2.patch testPythonAbsolutePath fails. The stack is: java.io.IOException: Mkdirs failed to create C:\tmp\hadoop-Administrator\mapred\local\1_0\taskTracker\Administrator\jobcache\job_20120725074728013_0011\jars\C:\Users\Administrator\pig-monarch at org.apache.hadoop.util.RunJar.unJar(RunJar.java:47) at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:277) at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:377) at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:367) at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:214) at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1237) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1107) at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1212) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1127) at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2417) at java.lang.Thread.run(Thread.java:662) The reason is that we pack the volume prefix into the job.jar: jar tvf C:\Users\ADMINI~1\AppData\Local\Temp\Job6350669482684441868.jar|grep testPythonAbsolutePath 98 Wed Jul 25 11:12:58 PDT 2012 C:\Users\Administrator\pig-monarch\testPythonAbsolutePath.py
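The review comment on PIG-2957 asks for a more explicit method name and for documentation of the drive-letter handling. Here is a minimal sketch of what that might look like. The name getPathForJar is the one suggested in the comment; the class name, the length guard and the letter check are assumptions added for illustration and are not part of the committed patch.

```java
final class JarPaths {
    private JarPaths() {
    }

    /**
     * Turns a file system path into a name usable as an entry inside a jar.
     *
     * On Windows an absolute path starts with a volume prefix such as "C:".
     * The colon is dropped so that the drive letter becomes a plain root
     * directory in the jar: "C:/foo" becomes "C/foo". Paths without a
     * volume prefix are returned unchanged.
     */
    static String getPathForJar(String path) {
        boolean hasVolumePrefix = path.length() >= 2
                && Character.isLetter(path.charAt(0))
                && path.charAt(1) == ':';
        if (hasVolumePrefix) {
            // Keep the drive letter, skip the colon, keep the rest as is.
            return path.charAt(0) + path.substring(2);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(getPathForJar("C:/foo"));
        System.out.println(getPathForJar("/tmp/foo"));
    }
}
```

Without this kind of normalization the colon ends up inside the unpacked directory name, which is what produces the "Mkdirs failed to create ...\jars\C:\Users\..." error in the stack trace above.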
[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement
[ https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535335#comment-13535335 ] Julien Le Dem commented on PIG-2956: Daniel? any update on this? Invalid cache specification for some streaming statement Key: PIG-2956 URL: https://issues.apache.org/jira/browse/PIG-2956 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch Another category of failure in e2e tests, such as ComputeSpec_1, ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, RaceConditions_4, RaceConditions_7, RaceConditions_8. Here is stack: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration. at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151) at org.apache.pig.PigServer.launchPlan(PigServer.java:1318) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303) at org.apache.pig.PigServer.execute(PigServer.java:1293) at org.apache.pig.PigServer.executeBatch(PigServer.java:364) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:561) at org.apache.pig.Main.main(Main.java:111) Caused by: org.apache.pig.backend.executionengine.ExecException: 
ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2955) Fix bunch of Pig e2e tests on Windows
[ https://issues.apache.org/jira/browse/PIG-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535339#comment-13535339 ] Julien Le Dem commented on PIG-2955: Daniel, do you want to check that in? Fix bunch of Pig e2e tests on Windows --- Key: PIG-2955 URL: https://issues.apache.org/jira/browse/PIG-2955 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11, 0.10.1, 0.12 Attachments: PIG-2955-1.patch, PIG-2955-2_0.10.patch, PIG-2955-2.patch Fix the following test aborts and failures: ComputeSpec_1, ComputeSpec_2, Unicode_cmdline_1, Warning_1, Warning_4, Checkin_2, UdfDistributedCache_1, Jython_Checkin_2, Jython_Diagnostics_4, Jython_Diagnostics_5, Jython_Diagnostics_6, Jython_Error_3, Jython_Error_4, Jython_Error_5, Jython_Error_6, Jython_Error_7, Grunt_6, Grunt_8, Grunt_13, Grunt_14.
[jira] [Commented] (PIG-2954) TestParamSubPreproc still depends on bash to run
[ https://issues.apache.org/jira/browse/PIG-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535340#comment-13535340 ] Julien Le Dem commented on PIG-2954: Is this still on target for pig-0.11? TestParamSubPreproc still depends on bash to run Key: PIG-2954 URL: https://issues.apache.org/jira/browse/PIG-2954 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11 Attachments: PIG-2954-1.patch, PIG-2954-2.patch If bash does not exist in the path, there are 3 test failures.
[jira] [Updated] (PIG-2927) SHIP and use JRuby gems in JRuby UDFs
[ https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-2927: --- Fix Version/s: (was: 0.11) 0.12 This will go in the next release, as we are stabilizing the 0.11 branch. SHIP and use JRuby gems in JRuby UDFs - Key: PIG-2927 URL: https://issues.apache.org/jira/browse/PIG-2927 Project: Pig Issue Type: New Feature Components: parser Affects Versions: 0.11 Environment: JRuby UDFs Reporter: Russell Jurney Assignee: Jonathan Coveney Priority: Minor Fix For: 0.12 Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, PIG-2927-3.patch, PIG-2927-4.patch It would be great to use JRuby gems in JRuby UDFs without installing them on all machines on the cluster. Some way to SHIP them automatically with the job would be great.
[jira] [Updated] (PIG-2614) AvroStorage crashes on LOADING a single bad error
[ https://issues.apache.org/jira/browse/PIG-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-2614: --- Fix Version/s: (was: 0.10.1) (was: 0.11) 0.12 Moving this to the next release so that we can converge on Pig 0.11. AvroStorage crashes on LOADING a single bad error - Key: PIG-2614 URL: https://issues.apache.org/jira/browse/PIG-2614 Project: Pig Issue Type: Bug Components: piggybank Affects Versions: 0.10.0, 0.11 Reporter: Russell Jurney Assignee: Jonathan Coveney Labels: avro, avrostorage, bad, book, cutting, doug, for, my, pig, sadism Fix For: 0.12 Attachments: PIG-2614_0.patch, PIG-2614_1.patch, PIG-2614_2.patch, test_avro_files.tar.gz AvroStorage dies when a single bad record exists, such as one with missing fields. This is very bad on 'big data,' where bad records are inevitable. See discussion at http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss for more theory.
[jira] [Resolved] (PIG-2583) Add Grunt command to list the statements in cache
[ https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PIG-2583. Resolution: Fixed Add Grunt command to list the statements in cache - Key: PIG-2583 URL: https://issues.apache.org/jira/browse/PIG-2583 Project: Pig Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Daniel Dai Assignee: Allan Avendaño Priority: Minor Labels: newbie Fix For: 0.11 Attachments: gruntHistory1.patch, gruntHistory2.patch, gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch It is convenient to list statements in cache: grunt> a = load '1.txt'; grunt> b = foreach a generate $0, $1; grunt> list a = load '1.txt'; b = foreach a generate $0, $1;
[jira] [Commented] (PIG-2583) Add Grunt command to list the statements in cache
[ https://issues.apache.org/jira/browse/PIG-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535355#comment-13535355 ] Julien Le Dem commented on PIG-2583: [~xalan] I'm closing this ticket as it has been committed. Please open a new ticket to further improve your contribution. Thanks again. Add Grunt command to list the statements in cache - Key: PIG-2583 URL: https://issues.apache.org/jira/browse/PIG-2583 Project: Pig Issue Type: Improvement Affects Versions: 0.10.0 Reporter: Daniel Dai Assignee: Allan Avendaño Priority: Minor Labels: newbie Fix For: 0.11 Attachments: gruntHistory1.patch, gruntHistory2.patch, gruntHistory3.patch, gruntHistory4.patch, gruntHistory.patch It is convenient to list statements in cache: grunt> a = load '1.txt'; grunt> b = foreach a generate $0, $1; grunt> list a = load '1.txt'; b = foreach a generate $0, $1;
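The behaviour requested in PIG-2583 can be pictured with a small sketch. It is written in Python rather than Pig's Java, and `StatementCache`, `add` and `list_statements` are hypothetical names chosen for illustration; this is not how Grunt implements the command, only a model of "echo back the statements entered so far".

```python
class StatementCache:
    """Toy stand-in for the statements a shell has buffered but not yet run."""

    def __init__(self):
        self._statements = []

    def add(self, statement: str) -> None:
        # Collapse whitespace and make sure every statement ends with ';'.
        text = " ".join(statement.split())
        if not text.endswith(";"):
            text += ";"
        self._statements.append(text)

    def list_statements(self) -> str:
        # Return the cached statements in entry order, one per line.
        return "\n".join(self._statements)


cache = StatementCache()
cache.add("a = load '1.txt';")
cache.add("b = foreach a generate $0, $1")
print(cache.list_statements())
```

Keeping the statements in entry order matters here, because later statements refer to aliases defined by earlier ones.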
[jira] [Resolved] (PIG-2493) UNION causes casting issues
[ https://issues.apache.org/jira/browse/PIG-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PIG-2493. Resolution: Fixed I'm closing this issue as it has been committed and we are stabilizing a release. [~arov] please open a new JIRA if you still see problems UNION causes casting issues --- Key: PIG-2493 URL: https://issues.apache.org/jira/browse/PIG-2493 Project: Pig Issue Type: Bug Affects Versions: 0.9.1, 0.10.0 Reporter: Anitha Raju Assignee: Vivek Padmanabhan Fix For: 0.9.3, 0.11, 0.10.1 Attachments: PIG-2493_2.patch, PIG-2493-3.patch, PIG-2493.patch Hi, For the below script, {code} A = load '/user/anithar/ip' as (a); B = load '/user/anithar/ip1' as (a); C = union A , B ; D = foreach C generate (chararray)a; dump D; {code} it gives casting error at runtime {code} org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string. at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:660) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082) at org.apache.hadoop.mapred.Child.main(Child.java:249) {code} It looks like in POCast.java the value of funcSpec is not getting any value(stays null when there is a UNION involved), causing caster to get null and thus the exception. The same works in 0.8 without any issue. Regards, Anitha -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
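The failure mode described in PIG-2493 (a cast that has no `funcSpec`, hence no caster, once a UNION is involved) can be modelled in a few lines. This is a sketch in Python, not the committed patch: `Cast`, `CASTERS` and `union_func_spec` are hypothetical names, and carrying the load function through the union when every branch agrees is an assumption about one plausible remedy.

```python
class ExecException(Exception):
    """Stand-in for Pig's ExecException."""


def utf8_caster(raw: bytes) -> str:
    # Hypothetical caster: the loader knows its bytes are UTF-8 text.
    return raw.decode("utf-8")


# Hypothetical registry mapping a load function name to its caster.
CASTERS = {"PigStorage": utf8_caster}


def union_func_spec(branch_specs):
    """Return the load function shared by every union branch, else None."""
    specs = set(branch_specs)
    if len(specs) == 1 and None not in specs:
        return specs.pop()
    return None


class Cast:
    def __init__(self, func_spec):
        # With func_spec=None there is no caster, as in the bug report.
        self.caster = CASTERS.get(func_spec)

    def to_chararray(self, value):
        if isinstance(value, str):
            return value
        if self.caster is None:
            raise ExecException(
                "ERROR 1075: Received a bytearray from the UDF. "
                "Cannot determine how to convert the bytearray to string."
            )
        return self.caster(value)


broken = Cast(None)  # funcSpec lost across the union
fixed = Cast(union_func_spec(["PigStorage", "PigStorage"]))
print(fixed.to_chararray(b"abc"))
```

When the branches disagree about their loader, `union_func_spec` returns `None` and the cast still fails, which is the conservative choice: there is no single caster that is known to be correct for both inputs.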
[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534401#comment-13534401 ] Julien Le Dem commented on PIG-3020: looks good to me +1 Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 13 more {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
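To see why the plan in PIG-3020 is rejected, it helps to check the schema from the error message by hand. The sketch below is Python, not Pig's `SchemaResetter` code; `parse_schema` and `duplicate_uids` are hypothetical helpers, and the `alias#uid:type` layout is read off the error text above.

```python
from collections import Counter

# Schema string copied from the ERROR 2270 message.
SCHEMA = (
    "bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,"
    "both_counts::src_id#417:bytearray,both_counts::size_both#480:long"
)


def parse_schema(schema: str):
    """Split 'alias#uid:type,...' into (alias, uid, type) tuples."""
    fields = []
    for item in schema.split(","):
        # Split from the right so the '::' inside the alias is left alone.
        name, type_name = item.rsplit(":", 1)
        alias, uid = name.rsplit("#", 1)
        fields.append((alias, int(uid), type_name))
    return fields


def duplicate_uids(fields):
    """Return the uids that appear on more than one field, sorted."""
    counts = Counter(uid for _, uid, _ in fields)
    return sorted(uid for uid, n in counts.items() if n > 1)


fields = parse_schema(SCHEMA)
dupes = duplicate_uids(fields)
print(dupes)
print([alias for alias, uid, _ in fields if uid in dupes])
```

Both `src_id` columns carry uid 417 because they descend from the same field of the single `LOAD`, so the joined schema ends up with two fields sharing one uid.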
[jira] [Assigned] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PIG-3020: -- Assignee: Julien Le Dem Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3020.patch The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at
org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 13 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Patch Info: Patch Available Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3020.patch The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at
org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 13 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Description: The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 
13 more {noformat} was: The following vali=dates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at
[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Attachment: PIG-3020_branch-0.11_1.patch Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Assignee: Julien Le Dem Attachments: PIG-3020_branch-0.11_1.patch, PIG-3020.patch The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at 
org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 13 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255 ] Julien Le Dem commented on PIG-3020: [~dvryaboy] I just noticed it was logging a warning with a NullPointerException when running tests from Eclipse. I just changed the log line to something clearer. It is not related, but I feel it is small enough to be done here. [~jcoveney] I also added a unit test with a Pig script that was failing before and works now, to validate my change. Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Attachments: PIG-3020.patch The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 
10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 13 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
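The ERROR 2270 message above lists uid 417 for both `bq_counts::src_id` and `both_counts::src_id`. As a rough illustration of the kind of check that reports this, here is a minimal Java sketch. The class `DuplicateUidCheck`, the `Field` type, and both method names are hypothetical and are not Pig's actual `SchemaResetter` code; only the aliases and uids are taken from the error message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;

public class DuplicateUidCheck {

    /** Hypothetical stand-in for a schema field: an alias plus the uid assigned by the planner. */
    public static final class Field {
        final String alias;
        final long uid;

        public Field(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    /** Returns the first uid that appears more than once in the schema, if any. */
    public static OptionalLong findDuplicateUid(List<Field> schema) {
        Set<Long> seen = new HashSet<>();
        for (Field f : schema) {
            // Set.add returns false when the uid was already present.
            if (!seen.add(f.uid)) {
                return OptionalLong.of(f.uid);
            }
        }
        return OptionalLong.empty();
    }

    /** The join output schema exactly as printed in the ERROR 2270 message. */
    public static List<Field> joinSchemaFromReport() {
        return Arrays.asList(
            new Field("bq_counts::src_id", 417),
            new Field("bq_counts::size_bq", 468),
            new Field("both_counts::src_id", 417),
            new Field("both_counts::size_both", 480));
    }

    public static void main(String[] args) {
        System.out.println(findDuplicateUid(joinSchemaFromReport()));
    }
}
```

Both `src_id` columns trace back to the same `group.uid` field of relation `A`, which is presumably why they carry the same uid after the join.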
[jira] [Updated] (PIG-3020) Duplicate uid in schema error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Attachment: PIG-3020.patch PIG-3020.patch fixes the issue Duplicate uid in schema error when joining two relations derived from the same load statement --- Key: PIG-3020 URL: https://issues.apache.org/jira/browse/PIG-3020 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Julien Le Dem Attachments: PIG-3020.patch
[jira] [Created] (PIG-3084) Improve exceptions messages in POPackage
Julien Le Dem created PIG-3084: -- Summary: Improve exceptions messages in POPackage Key: PIG-3084 URL: https://issues.apache.org/jira/browse/PIG-3084 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Assignee: Julien Le Dem
[jira] [Resolved] (PIG-3084) Improve exceptions messages in POPackage
[ https://issues.apache.org/jira/browse/PIG-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PIG-3084. Resolution: Fixed Fix Version/s: 0.12 Improve exceptions messages in POPackage Key: PIG-3084 URL: https://issues.apache.org/jira/browse/PIG-3084 Project: Pig Issue Type: Bug Reporter: Julien Le Dem Assignee: Julien Le Dem Fix For: 0.12 Attachments: PIG-3084_1.patch, PIG-3084.patch
[jira] [Commented] (PIG-2599) Mavenize Pig
[ https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525797#comment-13525797 ] Julien Le Dem commented on PIG-2599: Hey, that sounds like a great start. Usually attaching a patch to the JIRA and optionally posting a review to https://reviews.apache.org/dashboard/ is how a review is done. If you need help you can ping the pig-dev mailing list or comment on the JIRA. A shell script sounds good to me. Is it intended to be a one-time move that is then checked in to replace the current layout? Why do you need jdo to be installed in your local maven repo? Isn't maven going to do it? Could you provide a short description of each folder? Not all of them are clear to me. Do you deal with hadoop 20 vs 23? I think zebra has had issues for a while. I'm not sure what its status is right now. Maybe Olga knows. Fixing checkstyle and findbugs later sounds OK to me. It should be relatively easy to do. What about the shim layer? Anyway, thanks for looking into this. Mavenize Pig Key: PIG-2599 URL: https://issues.apache.org/jira/browse/PIG-2599 Project: Pig Issue Type: New Feature Components: build Reporter: Daniel Dai Labels: gsoc2012 Attachments: maven-pig.1.zip Switch Pig build system from ant to maven. This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
[jira] [Updated] (PIG-2599) Mavenize Pig
[ https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-2599: --- Description: Switch Pig build system from ant to maven. was: Switch Pig build system from ant to maven. This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012 Fix Version/s: 0.12 Labels: (was: gsoc2012) Mavenize Pig Key: PIG-2599 URL: https://issues.apache.org/jira/browse/PIG-2599 Project: Pig Issue Type: New Feature Components: build Reporter: Daniel Dai Fix For: 0.12 Attachments: maven-pig.1.zip Switch Pig build system from ant to maven.
[jira] [Commented] (PIG-2815) class loader management in PigContext
[ https://issues.apache.org/jira/browse/PIG-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525809#comment-13525809 ]

Julien Le Dem commented on PIG-2815:

0.11/trunk sounds good to me

class loader management in PigContext
-
Key: PIG-2815
URL: https://issues.apache.org/jira/browse/PIG-2815
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Raghu Angadi
Assignee: Raghu Angadi
Fix For: 0.11
Attachments: PIG-2815-branch-0.9.patch, PIG-2815-branch-0.9.patch, PIG-2815.patch, PIG-2815.patch

The way {{PigContext.classloader}} and resolveClassName() are managed can lead to strange class loading issues, especially when not all {{register}} statements are at the top (example in the first comment). Two factors contribute to this, sometimes only one of them and sometimes both together:
# a new classloader (CL) is created after registering each jar.
** but the new CL's parent is the root CL rather than the previous CL, effectively throwing the previous CL away.
# resolveClassName() caches classes based on just the name
** A class is not defined by name alone. Classes loaded by two different unrelated CLs are different objects even if both extract the class from the same physical jar file.
** because of (1), the cached class is not necessarily the same as the class that would be loaded based on the 'current' CL

Having different class objects for the same class has many subtle side effects, e.g. there would be two instances of static variables. I think both should be fixed, though fixing one of them might be good enough in many cases. I will add a patch.
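The point that a class is not defined by name alone can be reproduced without Pig. The following is a minimal Java sketch under stated assumptions: `ClassLoaderIdentityDemo`, `Counter`, and `IsolatingLoader` are hypothetical names invented here, and `IsolatingLoader` only imitates two unrelated classloaders by defining `Counter` itself instead of delegating to its parent. It is not the code in `PigContext`, and it assumes the compiled `.class` file is reachable as a classpath resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassLoaderIdentityDemo {

    /** Class with static state; each defining loader gets its own copy of count. */
    public static class Counter {
        public static int count = 0;
    }

    /** Loader that defines Counter itself rather than delegating (illustration only). */
    static final class IsolatingLoader extends ClassLoader {
        private static final String TARGET = Counter.class.getName();

        IsolatingLoader() {
            super(ClassLoaderIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!TARGET.equals(name)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    private static Class<?> loadIsolatedCopy() throws ClassNotFoundException {
        return new IsolatingLoader().loadClass(Counter.class.getName());
    }

    /** Loads Counter through two unrelated loaders and reports whether the Class objects are identical. */
    public static boolean sameClassObject() {
        try {
            return loadIsolatedCopy() == loadIsolatedCopy();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets count to 42 in one copy of Counter and returns the value seen through the other copy. */
    public static int countSeenByOtherCopy() {
        try {
            Class<?> first = loadIsolatedCopy();
            Class<?> second = loadIsolatedCopy();
            first.getField("count").setInt(null, 42);
            return second.getField("count").getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same Class object: " + sameClassObject());
        System.out.println("count seen by other copy: " + countSeenByOtherCopy());
    }
}
```

A cache keyed only by class name would hand back whichever of the two copies was loaded first, which is the mismatch described in point 2 of the issue.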
[jira] [Commented] (PIG-2204) Allow passing arguments to custom Partitioners
[ https://issues.apache.org/jira/browse/PIG-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525819#comment-13525819 ]

Julien Le Dem commented on PIG-2204:

Maybe just update the doc to mention this?

Allow passing arguments to custom Partitioners
--
Key: PIG-2204
URL: https://issues.apache.org/jira/browse/PIG-2204
Project: Pig
Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy

Currently, this works:
{code}
y = group x by $0 partition by MyPartitioner PARALLEL 2;
{code}
However, passing an argument to the partitioner constructor does not work, and dies with a misleading error:
{code}
y = group x by $0 partition by MyPartitioner(0) PARALLEL 2;
2011-08-03 22:53:23,074 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered ( ( at line 1, column 91. Was expecting one of: parallel ... ; ... . ... $ ...
{code}
[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file
[ https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-2812: --- Fix Version/s: (was: 0.11) I'm detaching this from pig-0.11 as it is not ready yet. Spill InternalCachedBag into only 1 file Key: PIG-2812 URL: https://issues.apache.org/jira/browse/PIG-2812 Project: Pig Issue Type: Bug Components: data Reporter: Haitao Yao Assignee: Haitao Yao Attachments: aa.jpg, spill.patch I encountered a reducer OOM caused by java.io.DeleteOnExitHook. I found out that the InternalCachedBag creates a separate tmp file, and the tmp files are deleted on exit, so the file delete hook caused the OOM. Why not just hold the tmp file handle and spill into only one tmp file? Too many tmp files may also block the tasktracker start process if the tmp files are not cleaned up in time and the tasktracker restarts at that moment.
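For background, `java.io.File.deleteOnExit()` records each path in a JVM-wide set that is only processed at shutdown, so registering a very large number of temporary files makes that set grow for the life of the process. A minimal Java sketch of the alternative suggested in the issue follows: keep one file handle, append every spill to it, and delete the file explicitly when done. The class `SingleFileSpill` and its methods are hypothetical and are not taken from the attached spill.patch or from `InternalCachedBag`.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class SingleFileSpill implements Closeable {
    private final File file;
    private final DataOutputStream out;
    private int chunks = 0;

    public SingleFileSpill(File dir) throws IOException {
        // Deliberately no deleteOnExit(): the file is removed in close() instead.
        this.file = File.createTempFile("spill", ".tmp", dir);
        this.out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
    }

    /** Appends one length-prefixed chunk to the single spill file. */
    public void spill(byte[] chunk) throws IOException {
        out.writeInt(chunk.length);
        out.write(chunk);
        chunks++;
    }

    public int chunkCount() {
        return chunks;
    }

    public File file() {
        return file;
    }

    /** Closes the stream and deletes the spill file right away. */
    @Override
    public void close() throws IOException {
        try {
            out.close();
        } finally {
            Files.deleteIfExists(file.toPath());
        }
    }

    /** Spills three chunks, then checks that exactly one file was used and that close() removed it. */
    public static boolean demo() {
        try {
            File dir = Files.createTempDirectory("spill-demo").toFile();
            SingleFileSpill s = new SingleFileSpill(dir);
            s.spill(new byte[] {1, 2, 3});
            s.spill(new byte[] {4, 5});
            s.spill(new byte[] {6});
            File f = s.file();
            boolean oneFile = dir.list().length == 1 && s.chunkCount() == 3;
            s.close();
            boolean gone = !f.exists();
            dir.delete();
            return oneFile && gone;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off in this sketch is that cleanup depends on `close()` being called; a crash before that point leaves the file behind.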
[jira] [Created] (PIG-3076) make TestScalarAliases more reliable
Julien Le Dem created PIG-3076: -- Summary: make TestScalarAliases more reliable Key: PIG-3076 URL: https://issues.apache.org/jira/browse/PIG-3076 Project: Pig Issue Type: Test Reporter: Julien Le Dem Assignee: Julien Le Dem Fix For: 0.11, 0.12 Currently, this test writes to the root directory, so its output is not deleted by ant clean. It also deletes its output at the end instead of at the beginning. As a consequence, if the test fails once, it will keep failing until the directory is manually cleaned up (not good for CI).
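Cleaning up at the start of the test rather than at the end can be sketched as follows. This is a minimal Java illustration, not the actual change to TestScalarAliases; the class `TestOutputDir`, the method `resetOutputDir`, and the example path `build/test/TestScalarAliases` are assumptions made here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TestOutputDir {

    /**
     * Deletes anything left over from a previous (possibly failed) run,
     * then recreates the directory so the test starts from a clean state.
     */
    public static Path resetOutputDir(Path dir) {
        try {
            if (Files.exists(dir)) {
                try (Stream<Path> paths = Files.walk(dir)) {
                    // Reverse order so that children are deleted before their parents.
                    paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.delete(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                }
            }
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would call something like `TestOutputDir.resetOutputDir(Paths.get("build", "test", "TestScalarAliases"))` in its setup method. Because the directory sits under `build`, a clean target that removes the build directory would remove it as well.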
[jira] [Created] (PIG-3077) TestMultiQueryLocal should not write in /tmp
Julien Le Dem created PIG-3077: -- Summary: TestMultiQueryLocal should not write in /tmp Key: PIG-3077 URL: https://issues.apache.org/jira/browse/PIG-3077 Project: Pig Issue Type: Test Reporter: Julien Le Dem Temporary files from tests should be under build/test so that they are cleaned by ant clean. Currently, two test suites running on the same machine step on each other and produce flaky test results.
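One way to keep concurrent suites from colliding is to give each run its own directory under the build tree instead of a fixed path in /tmp. The Java sketch below is only an illustration under that assumption; `IsolatedTestDir` and `create` are hypothetical names, and this is not the fix that was applied to TestMultiQueryLocal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IsolatedTestDir {

    /**
     * Creates a fresh, uniquely named directory for one test suite run
     * under the given base directory (for example build/test).
     */
    public static Path create(Path base, String suiteName) {
        try {
            Files.createDirectories(base);
            // createTempDirectory appends a unique suffix to the prefix.
            return Files.createTempDirectory(base, suiteName + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, `IsolatedTestDir.create(Paths.get("build", "test"), "TestMultiQueryLocal")` returns a different directory on every call, so two runs never share files, and everything stays under `build/test`.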
[jira] [Reopened] (PIG-3014) CurrentTime() UDF has undesirable characteristics
[ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem reopened PIG-3014:

I see a failing test: org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString
java.lang.NullPointerException
    at org.apache.pig.builtin.CurrentTime.exec(CurrentTime.java:41)
    at org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString(TestBuiltin.java:450)

CurrentTime() UDF has undesirable characteristics
-
Key: PIG-3014
URL: https://issues.apache.org/jira/browse/PIG-3014
Project: Pig
Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch

As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.
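The difference between reading the clock on every exec call and returning one captured instant can be shown with a small Java sketch. Everything in it is hypothetical (`JobScopedNow`, `perCallExec`, `tickingClock`); it does not reproduce the attached patches or the proposal in PIG-1431, it only illustrates the two behaviours the description contrasts.

```java
import java.util.function.LongSupplier;

public class JobScopedNow {
    private final long fixedMillis;

    /** Captures the clock once, for example when the script is launched. */
    public JobScopedNow(LongSupplier clock) {
        this.fixedMillis = clock.getAsLong();
    }

    /** Every call returns the instant captured at construction. */
    public long exec() {
        return fixedMillis;
    }

    /** For contrast: reads the clock on every call, as the issue describes for CurrentTime(). */
    public static long perCallExec(LongSupplier clock) {
        return clock.getAsLong();
    }

    /** Hypothetical test clock that advances by one millisecond each time it is read. */
    public static LongSupplier tickingClock(long start) {
        long[] t = {start};
        return () -> t[0]++;
    }
}
```

In a distributed job the captured value would also have to reach every task, for instance through the job configuration; that part is outside this sketch.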