[jira] [Commented] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
[ https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785951#comment-13785951 ] Daniel Dai commented on PIG-3497: - +1 for 0.12. The change is simple enough and not likely to break anything. > JobControlCompiler should only do reducer estimation when the job has a > reduce phase > > > Key: PIG-3497 > URL: https://issues.apache.org/jira/browse/PIG-3497 > Project: Pig > Issue Type: Bug >Reporter: Akihiro Matsukawa >Assignee: Akihiro Matsukawa >Priority: Minor > Attachments: reducer_estimation.patch > > > Currently, JobControlCompiler makes an estimation for the number of reducers > required (by default based on input size into mappers) regardless of whether > there is a reduce phase in the job. This is unnecessary, especially when > running more complex custom reducer estimators. > Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
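To make the change being reviewed concrete, here is a minimal, self-contained sketch of the guard the issue describes: skip reducer estimation entirely when the job is map-only. The types below are simplified stand-ins invented for illustration, not Pig's actual JobControlCompiler or MapReduceOper; the real change is the attached reducer_estimation.patch.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReducerEstimationSketch {

    // Hypothetical stand-in for a compiled map-reduce operator.
    static class MapReduceOperSketch {
        List<String> reducePlanOps;        // empty when the job is map-only
        int requestedParallelism = -1;     // -1 means "not set via PARALLEL"

        boolean isMapOnly() {
            return reducePlanOps == null || reducePlanOps.isEmpty();
        }
    }

    // Stand-in for the (potentially expensive) reducer estimator, which in Pig
    // may inspect input sizes or run a custom reducer estimator.
    static int estimateNumberOfReducers(MapReduceOperSketch mro) {
        return 10;
    }

    // The guard the issue asks for: estimate only when a reduce phase exists
    // and the script did not already fix the parallelism.
    static int adjustNumReducers(MapReduceOperSketch mro) {
        if (mro.isMapOnly()) {
            return 0;                           // map-only: no reducers, no estimation
        }
        if (mro.requestedParallelism > 0) {
            return mro.requestedParallelism;    // explicit PARALLEL wins
        }
        return estimateNumberOfReducers(mro);   // only now pay for estimation
    }

    public static void main(String[] args) {
        MapReduceOperSketch mapOnly = new MapReduceOperSketch();
        mapOnly.reducePlanOps = Collections.emptyList();
        System.out.println("map-only job: " + adjustNumReducers(mapOnly) + " reducers");

        MapReduceOperSketch withReduce = new MapReduceOperSketch();
        withReduce.reducePlanOps = Arrays.asList("POPackage", "POForEach");
        System.out.println("job with reduce phase: " + adjustNumReducers(withReduce) + " reducers");
    }
}
{code}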
[jira] [Commented] (PIG-3483) Document ASSERT keyword
[ https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785940#comment-13785940 ] Cheolsoo Park commented on PIG-3483: +1. Thank you for taking care of this! Just found a typo in your example. The relation A doesn't have a0 field. Do you mind fixing it when committing the patch? {code} Suppose we have relation A. A = LOAD 'data' AS (a1:int,a2:int,a3:int); ... Now, you can assert that a0 column in your data is >0, fail if otherwise ASSERT A by a0 > 0 'a0 should be greater than 0'; {code} > Document ASSERT keyword > --- > > Key: PIG-3483 > URL: https://issues.apache.org/jira/browse/PIG-3483 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.12.0 >Reporter: Cheolsoo Park >Assignee: Aniket Mokashi > Fix For: 0.12.0 > > Attachments: PIG-3483.patch > > > PIG-3367 added a new keyword ASSERT, so we need to document it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3483) Document ASSERT keyword
[ https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3483: Status: Patch Available (was: Open) > Document ASSERT keyword > --- > > Key: PIG-3483 > URL: https://issues.apache.org/jira/browse/PIG-3483 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.12.0 >Reporter: Cheolsoo Park >Assignee: Aniket Mokashi > Fix For: 0.12.0 > > Attachments: PIG-3483.patch > > > PIG-3367 added a new keyword ASSERT, so we need to document it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3483) Document ASSERT keyword
[ https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Mokashi updated PIG-3483: Attachment: PIG-3483.patch > Document ASSERT keyword > --- > > Key: PIG-3483 > URL: https://issues.apache.org/jira/browse/PIG-3483 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.12.0 >Reporter: Cheolsoo Park >Assignee: Aniket Mokashi > Fix For: 0.12.0 > > Attachments: PIG-3483.patch > > > PIG-3367 added a new keyword ASSERT, so we need to document it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3483) Document ASSERT keyword
[ https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785932#comment-13785932 ] Aniket Mokashi commented on PIG-3483: - [~cheolsoo], can you please review this patch? > Document ASSERT keyword > --- > > Key: PIG-3483 > URL: https://issues.apache.org/jira/browse/PIG-3483 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.12.0 >Reporter: Cheolsoo Park >Assignee: Aniket Mokashi > Fix For: 0.12.0 > > Attachments: PIG-3483.patch > > > PIG-3367 added a new keyword ASSERT, so we need to document it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
[ https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785876#comment-13785876 ] Aniket Mokashi commented on PIG-3497: - +1. Committed to trunk. [~daijy], should we also commit this to 0.12? > JobControlCompiler should only do reducer estimation when the job has a > reduce phase > > > Key: PIG-3497 > URL: https://issues.apache.org/jira/browse/PIG-3497 > Project: Pig > Issue Type: Bug >Reporter: Akihiro Matsukawa >Assignee: Akihiro Matsukawa >Priority: Minor > Attachments: reducer_estimation.patch > > > Currently, JobControlCompiler makes an estimation for the number of reducers > required (by default based on input size into mappers) regardless of whether > there is a reduce phase in the job. This is unnecessary, especially when > running more complex custom reducer estimators. > Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3494) Several fixes for e2e tests
[ https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785825#comment-13785825 ] Hudson commented on PIG-3494: - SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #189 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/189/]) PIG-3494: Several fixes for e2e tests (daijy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712) * /pig/trunk/test/e2e/harness/test_harness.pl * /pig/trunk/test/e2e/pig/conf/default.conf * /pig/trunk/test/e2e/pig/conf/rpm.conf * /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm * /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm * /pig/trunk/test/e2e/pig/tests/negative.conf * /pig/trunk/test/e2e/pig/tests/nightly.conf > Several fixes for e2e tests > --- > > Key: PIG-3494 > URL: https://issues.apache.org/jira/browse/PIG-3494 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3494-1.patch > > > Address several issues in e2e tests: > 1. Adding the capacity to test Pig installed by rpm (also involves > configurable piggybank.jar) > 2. Remove hadoop23.res since it is no longer needed > 3. Remove hadoop2 specific error message "UdfException_[1-4]" since they are > fixed by PIG-3360 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3449) Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
[ https://issues.apache.org/jira/browse/PIG-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3449: --- Attachment: PIG-3446-2.patch Fix compilation error... > Move JobCreationException to org.apache.pig.backend.hadoop.executionengine > -- > > Key: PIG-3449 > URL: https://issues.apache.org/jira/browse/PIG-3449 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > Attachments: PIG-3446-1.patch, PIG-3446-2.patch > > > JobCreationException is not MR-specific, so it should be moved from > {{org.apache.pig.backend.hadoop.executionengine.mapReduceLayer}} to > {{org.apache.pig.backend.hadoop.executionengine}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns
[ https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785792#comment-13785792 ] Harsh J commented on PIG-3108: -- Moved from Release Notes to comments: bq. Tested and committed. Thanks for the patch Christoph and sorry for the delay! > HBaseStorage returns empty maps when mixing wildcard- with other columns > > > Key: PIG-3108 > URL: https://issues.apache.org/jira/browse/PIG-3108 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12.0 >Reporter: Christoph Bauer >Assignee: Christoph Bauer > Fix For: 0.12.0 > > Attachments: PIG-3108.patch, PIG-3108.patch > > > Consider the following: > A and B should be the same (with different order, of course). > {code} > /* > in hbase shell: > create 'pigtest', 'pig' > put 'pigtest' , '1', 'pig:name', 'A' > put 'pigtest' , '1', 'pig:has_legs', 'true' > put 'pigtest' , '1', 'pig:has_ribs', 'true' > */ > A = LOAD 'hbase://pigtest' USING > org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS > (name:chararray,parts); > B = LOAD 'hbase://pigtest' USING > org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS > (parts,name:chararray); > dump A; > dump B; > {code} > This is due to a bug in setLocation and initScan. > For _A_ > # scan.addColumn(pig,name); // for 'pig:name' > # scan.addFamily(pig); // for the 'pig:has*' > So that's silently right. > But for _B_ > # scan.addFamily(pig) > # scan.addColumn(pig,name) > will override the first call to addFamily, because you cannot mix them on the > same family. -- This message was sent by Atlassian JIRA (v6.1#6144)
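For readers following along, the ordering issue described above can be reproduced directly against the hbase-client Scan API. This is a small illustrative sketch (class and variable names are made up) that prints the effective column selection for the two orderings, assuming the Scan semantics exactly as described in the report.
{code}
import java.util.NavigableSet;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanOrderingSketch {
    public static void main(String[] args) {
        byte[] pig = Bytes.toBytes("pig");
        byte[] name = Bytes.toBytes("name");

        // Ordering used for relation A: addColumn first, then addFamily.
        // addFamily resets the family to "all columns" (a null column set),
        // so the pig:has* wildcard still works -- silently right.
        Scan a = new Scan();
        a.addColumn(pig, name);
        a.addFamily(pig);
        NavigableSet<byte[]> aCols = a.getFamilyMap().get(pig);
        System.out.println("A: column set for 'pig' = " + aCols + " (null means whole family)");

        // Ordering used for relation B: addFamily first, then addColumn.
        // addColumn narrows the family back down to the single pig:name column,
        // dropping the pig:has* wildcard, which is the bug reported here.
        Scan b = new Scan();
        b.addFamily(pig);
        b.addColumn(pig, name);
        NavigableSet<byte[]> bCols = b.getFamilyMap().get(pig);
        System.out.println("B: column set for 'pig' has " + bCols.size() + " column(s)");
    }
}
{code}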
[jira] [Updated] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns
[ https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated PIG-3108: - Release Note: (was: Tested and committed. Thanks for the patch Christoph and sorry for the delay!) > HBaseStorage returns empty maps when mixing wildcard- with other columns > > > Key: PIG-3108 > URL: https://issues.apache.org/jira/browse/PIG-3108 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12.0 >Reporter: Christoph Bauer >Assignee: Christoph Bauer > Fix For: 0.12.0 > > Attachments: PIG-3108.patch, PIG-3108.patch > > > Consider the following: > A and B should be the same (with different order, of course). > {code} > /* > in hbase shell: > create 'pigtest', 'pig' > put 'pigtest' , '1', 'pig:name', 'A' > put 'pigtest' , '1', 'pig:has_legs', 'true' > put 'pigtest' , '1', 'pig:has_ribs', 'true' > */ > A = LOAD 'hbase://pigtest' USING > org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS > (name:chararray,parts); > B = LOAD 'hbase://pigtest' USING > org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS > (parts,name:chararray); > dump A; > dump B; > {code} > This is due to a bug in setLocation and initScan. > For _A_ > # scan.addColumn(pig,name); // for 'pig:name' > # scan.addFamily(pig); // for the 'pig:has*' > So that's silently right. > But for _B_ > # scan.addFamily(pig) > # scan.addColumn(pig,name) > will override the first call to addFamily, because you cannot mix them on the > same family. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785785#comment-13785785 ] Daniel Dai commented on PIG-3445: - Great, thanks! > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3446) Umbrella jira for Pig on Tez
[ https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3446: --- Description: This is an umbrella jira for Pig on Tez. More detailed subtasks will be added. More information can be found on the following wiki page: https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez To build tez-branch, you need to install tez jars into your local maven repo first. Please check out the Apache Tez repo and run mvn install. was: This is an umbrella jira for Pig on Tez. More detailed subtasks will be added. More information can be found on the following wiki page: https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez > Umbrella jira for Pig on Tez > > > Key: PIG-3446 > URL: https://issues.apache.org/jira/browse/PIG-3446 > Project: Pig > Issue Type: New Feature > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > > This is an umbrella jira for Pig on Tez. More detailed subtasks will be added. > More information can be found on the following wiki page: > https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez > To build tez-branch, you need to install tez jars into your local maven repo first. > Please check out the Apache Tez repo and run mvn install. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] Subscription: PIG patch available
Issue Subscription Filter: PIG patch available (12 issues) Subscriber: pigdaily
Key      Summary
PIG-3497 JobControlCompiler should only do reducer estimation when the job has a reduce phase https://issues.apache.org/jira/browse/PIG-3497
PIG-3496 Propagate HBase 0.95 jars to the backend https://issues.apache.org/jira/browse/PIG-3496
PIG-3451 EvalFunc ctor reflection to determine value of type param T is brittle https://issues.apache.org/jira/browse/PIG-3451
PIG-3449 Move JobCreationException to org.apache.pig.backend.hadoop.executionengine https://issues.apache.org/jira/browse/PIG-3449
PIG-3441 Allow Pig to use default resources from Configuration objects https://issues.apache.org/jira/browse/PIG-3441
PIG-3388 No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage https://issues.apache.org/jira/browse/PIG-3388
PIG-3347 Store invocation in local mode brings side effect https://issues.apache.org/jira/browse/PIG-3347
PIG-3325 Adding a tuple to a bag is slow https://issues.apache.org/jira/browse/PIG-3325
PIG-3257 Add unique identifier UDF https://issues.apache.org/jira/browse/PIG-3257
PIG-3117 A debug mode in which pig does not delete temporary files https://issues.apache.org/jira/browse/PIG-3117
PIG-3088 Add a builtin udf which removes prefixes https://issues.apache.org/jira/browse/PIG-3088
PIG-3021 Split results missing records when there is null values in the column comparison https://issues.apache.org/jira/browse/PIG-3021
You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Created] (PIG-3502) Initial implementation of TezLauncher
Cheolsoo Park created PIG-3502: -- Summary: Initial implementation of TezLauncher Key: PIG-3502 URL: https://issues.apache.org/jira/browse/PIG-3502 Project: Pig Issue Type: Sub-task Components: tez Affects Versions: tez-branch Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: tez-branch Once Tez DAG is built, TezLauncher submits it to Tez cluster using TezClient API. -- This message was sent by Atlassian JIRA (v6.1#6144)
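As a rough illustration of the submission step (not the tez-branch code itself, and written against the shape of the later public TezClient API rather than whatever the API looked like on the tez-branch at the time), submitting an already-built DAG boils down to something like:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.client.DAGClient;
import org.apache.tez.dag.api.client.DAGStatus;

public class SubmitDagSketch {
    // Submit a DAG and block until it finishes -- roughly the job TezLauncher has to do.
    public static DAGStatus submit(DAG dag, Configuration conf) throws Exception {
        TezConfiguration tezConf = new TezConfiguration(conf);
        TezClient client = TezClient.create("pig-on-tez-sketch", tezConf);
        client.start();
        try {
            DAGClient dagClient = client.submitDAG(dag);
            return dagClient.waitForCompletion();
        } finally {
            client.stop();
        }
    }
}
{code}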
[jira] [Commented] (PIG-3501) Initial implementation of TezJobControlCompiler
[ https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785759#comment-13785759 ] Cheolsoo Park commented on PIG-3501: Note the patch should be applied after PIG-3500 (TezCompiler). > Initial implementation of TezJobControlCompiler > --- > > Key: PIG-3501 > URL: https://issues.apache.org/jira/browse/PIG-3501 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > Attachments: PIG-3501-1.patch > > > TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, > it is wrapped by JobControl before being submitted by TezLauncher. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3501) Initial implementation of TezJobControlCompiler
[ https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3501: --- Attachment: PIG-3501-1.patch Attached includes an initial version of TezJobControlCompiler with unit tests. Tez DAG is built by TezDagBuilder which is an extension of TezOpPlanVisitor. The unit tests can run with ant clean test -Dtestcase=TestTezJobControlCompiler. > Initial implementation of TezJobControlCompiler > --- > > Key: PIG-3501 > URL: https://issues.apache.org/jira/browse/PIG-3501 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > Attachments: PIG-3501-1.patch > > > TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, > it is wrapped by JobControl before being submitted by TezLauncher. -- This message was sent by Atlassian JIRA (v6.1#6144)
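To picture what converting a tez plan into a tez DAG means for the join query in PIG-3500 (two load vertices feeding one join vertex), here is a schematic sketch using the later public Tez DAG API; the processor, input, and output class names are placeholders, not the real classes produced by TezDagBuilder.
{code}
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.EdgeProperty.DataMovementType;
import org.apache.tez.dag.api.EdgeProperty.DataSourceType;
import org.apache.tez.dag.api.EdgeProperty.SchedulingType;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.Vertex;

public class JoinDagShapeSketch {
    // Placeholder processor; a real Pig-on-Tez vertex would run Pig's own processor class.
    static Vertex vertex(String name) {
        return Vertex.create(name,
                ProcessorDescriptor.create("org.example.PlaceholderProcessor"), 1);
    }

    // Shuffle edge, analogous to the map->reduce boundary in the MR plan.
    static EdgeProperty shuffle() {
        return EdgeProperty.create(
                DataMovementType.SCATTER_GATHER,
                DataSourceType.PERSISTED,
                SchedulingType.SEQUENTIAL,
                OutputDescriptor.create("org.example.PlaceholderOutput"),
                InputDescriptor.create("org.example.PlaceholderInput"));
    }

    public static DAG build() {
        Vertex loadA = vertex("load-a");   // scans input1
        Vertex loadB = vertex("load-b");   // scans input2
        Vertex join  = vertex("join");     // joins a and b by x, then stores d
        return DAG.create("load-load-join-store")
                .addVertex(loadA)
                .addVertex(loadB)
                .addVertex(join)
                .addEdge(Edge.create(loadA, join, shuffle()))
                .addEdge(Edge.create(loadB, join, shuffle()));
    }
}
{code}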
[jira] [Created] (PIG-3501) Initial implementation of TezJobControlCompiler
Cheolsoo Park created PIG-3501: -- Summary: Initial implementation of TezJobControlCompiler Key: PIG-3501 URL: https://issues.apache.org/jira/browse/PIG-3501 Project: Pig Issue Type: Sub-task Components: tez Affects Versions: tez-branch Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: tez-branch TezJobControlCompiler builds tez plan into tez DAG. Once tez DAG is built, it is wrapped by JobControl before being submitted by TezLauncher. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3501) Initial implementation of TezJobControlCompiler
[ https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3501: --- Description: TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, it is wrapped by JobControl before being submitted by TezLauncher. (was: TezJobControlCompiler builds tez plan into tez DAG. Once tez DAG is built, it is wrapped by JobControl before being submitted by TezLauncher.) > Initial implementation of TezJobControlCompiler > --- > > Key: PIG-3501 > URL: https://issues.apache.org/jira/browse/PIG-3501 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > > TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, > it is wrapped by JobControl before being submitted by TezLauncher. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785742#comment-13785742 ] Julien Le Dem commented on PIG-3445: I just released parquet-pig-bundle-1.2.3. This should show up in Maven Central overnight. > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)
[ https://issues.apache.org/jira/browse/PIG-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785735#comment-13785735 ] venkata kamalnath commented on PIG-3499: yes I am running pig job in local mode using pig -x local test.pig . I am testing pig jobs with hadoop environment setup in HA mode. Steps to reproduce: Set up hadoop HA namenode setup and run pig jobs in using -x localmode. > Pig job fails when run in local mode with namenode HA(QJM) > --- > > Key: PIG-3499 > URL: https://issues.apache.org/jira/browse/PIG-3499 > Project: Pig > Issue Type: Bug > Components: grunt, parser, tools >Affects Versions: 0.10.0 >Reporter: venkata kamalnath > > when we run a pig script with namenode HA(QJM) we always get unknown host > exception. The nameserviceID is being considered as host and pig job giving > unknown host exception. > I am working on this fix but want community to validate whether any bug > reported similar to this. If not I will provide the fix as soon as possible. > The pig script is as below: > testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv' > USING PigStorage(',') > AS (col1:chararray, col2:chararray, col3:int); > STORE testTable into '/tmp/test_pig_output'; > Exception: > Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: > kdvenkata > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417) > at > org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:412) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:379) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:53) > at > org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106) > at > org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188) > at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465) > at > org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151) > at > org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110) > at > org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68) > at > org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60) > at > org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84) > at > org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) > at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626) > at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620) > at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343) > at org.apache.pig.PigServer.storeEx(PigServer.java:960) > at 
org.apache.pig.PigServer.store(PigServer.java:928) > at org.apache.pig.PigServer.openIterator(PigServer.java:841) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (PIG-3500) Initial implementation of TezCompiler
[ https://issues.apache.org/jira/browse/PIG-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785730#comment-13785730 ] Cheolsoo Park edited comment on PIG-3500 at 10/4/13 12:29 AM: -- Attached includes an initial version of TezCompiler with unit tests. Note that query #3 is compiled into 3 Tez vertices (two input vertices and one join vertex) unlike MR plan. The unit test can run with ant test clean -Dtestcase=TestTezCompiler. was (Author: cheolsoo): Attached includes an initial version of TezCompiler with unit tests. Note that it query #3 is compiled into 3 Tez vertices (two input vertices and one join vertex) unlike MR plan. The unit test can run with ant test clean -Dtestcase=TestTezCompiler. > Initial implementation of TezCompiler > - > > Key: PIG-3500 > URL: https://issues.apache.org/jira/browse/PIG-3500 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > Attachments: PIG-3500-1.patch > > > Implement TezCompiler that compiles physical plan into tez plan. To begin > with, we can implement the initial version that works for basic queries as > follows: > # Load-Filter-Store > {code} > a = load 'file:///tmp/input' as (x:int, y:int); > b = filter a by x > 0; > c = foreach b generate y; > store c into 'file:///tmp/output'; > {code} > # Load-Filter-GroupBy-Store > {code} > a = load 'file:///tmp/input' as (x:int, y:int); > b = group a by x; > c = foreach b generate group, a; > store c into 'file:///tmp/output'; > {code} > # Load1-Load2-Join-Store > {code} > a = load 'file:///tmp/input1' as (x:int, y:int); > b = load 'file:///tmp/input2' as (x:int, z:int); > c = join a by x, b by x; > d = foreach c generate a::x as x, y, z; > store d into 'file:///tmp/output'; > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3500) Initial implementation of TezCompiler
[ https://issues.apache.org/jira/browse/PIG-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3500: --- Attachment: PIG-3500-1.patch Attached includes an initial version of TezCompiler with unit tests. Note that it query #3 is compiled into 3 Tez vertices (two input vertices and one join vertex) unlike MR plan. The unit test can run with ant test clean -Dtestcase=TestTezCompiler. > Initial implementation of TezCompiler > - > > Key: PIG-3500 > URL: https://issues.apache.org/jira/browse/PIG-3500 > Project: Pig > Issue Type: Sub-task > Components: tez >Affects Versions: tez-branch >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: tez-branch > > Attachments: PIG-3500-1.patch > > > Implement TezCompiler that compiles physical plan into tez plan. To begin > with, we can implement the initial version that works for basic queries as > follows: > # Load-Filter-Store > {code} > a = load 'file:///tmp/input' as (x:int, y:int); > b = filter a by x > 0; > c = foreach b generate y; > store c into 'file:///tmp/output'; > {code} > # Load-Filter-GroupBy-Store > {code} > a = load 'file:///tmp/input' as (x:int, y:int); > b = group a by x; > c = foreach b generate group, a; > store c into 'file:///tmp/output'; > {code} > # Load1-Load2-Join-Store > {code} > a = load 'file:///tmp/input1' as (x:int, y:int); > b = load 'file:///tmp/input2' as (x:int, z:int); > c = join a by x, b by x; > d = foreach c generate a::x as x, y, z; > store d into 'file:///tmp/output'; > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3500) Initial implementation of TezCompiler
Cheolsoo Park created PIG-3500: -- Summary: Initial implementation of TezCompiler Key: PIG-3500 URL: https://issues.apache.org/jira/browse/PIG-3500 Project: Pig Issue Type: Sub-task Components: tez Affects Versions: tez-branch Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: tez-branch Implement TezCompiler that compiles physical plan into tez plan. To begin with, we can implement the initial version that works for basic queries as follows: # Load-Filter-Store {code} a = load 'file:///tmp/input' as (x:int, y:int); b = filter a by x > 0; c = foreach b generate y; store c into 'file:///tmp/output'; {code} # Load-Filter-GroupBy-Store {code} a = load 'file:///tmp/input' as (x:int, y:int); b = group a by x; c = foreach b generate group, a; store c into 'file:///tmp/output'; {code} # Load1-Load2-Join-Store {code} a = load 'file:///tmp/input1' as (x:int, y:int); b = load 'file:///tmp/input2' as (x:int, z:int); c = join a by x, b by x; d = foreach c generate a::x as x, y, z; store d into 'file:///tmp/output'; {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)
[ https://issues.apache.org/jira/browse/PIG-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785702#comment-13785702 ] Prashant Kommireddi commented on PIG-3499: -- [~kdvenkata] - why is local mode using HA? Are you sure you are running Pig's local mode? > Pig job fails when run in local mode with namenode HA(QJM) > --- > > Key: PIG-3499 > URL: https://issues.apache.org/jira/browse/PIG-3499 > Project: Pig > Issue Type: Bug > Components: grunt, parser, tools >Affects Versions: 0.10.0 >Reporter: venkata kamalnath > > when we run a pig script with namenode HA(QJM) we always get unknown host > exception. The nameserviceID is being considered as host and pig job giving > unknown host exception. > I am working on this fix but want community to validate whether any bug > reported similar to this. If not I will provide the fix as soon as possible. > The pig script is as below: > testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv' > USING PigStorage(',') > AS (col1:chararray, col2:chararray, col3:int); > STORE testTable into '/tmp/test_pig_output'; > Exception: > Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: > kdvenkata > at > org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417) > at > org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164) > at > org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:412) > at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:379) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70) > at > org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:53) > at > org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106) > at > org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188) > at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465) > at > org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151) > at > org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110) > at > org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68) > at > org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60) > at > org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84) > at > org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) > at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626) > at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620) > at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343) > at org.apache.pig.PigServer.storeEx(PigServer.java:960) > at org.apache.pig.PigServer.store(PigServer.java:928) > at org.apache.pig.PigServer.openIterator(PigServer.java:841) -- This message was sent by Atlassian JIRA 
(v6.1#6144)
[jira] [Created] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)
venkata kamalnath created PIG-3499: -- Summary: Pig job fails when run in local mode with namenode HA(QJM) Key: PIG-3499 URL: https://issues.apache.org/jira/browse/PIG-3499 Project: Pig Issue Type: Bug Components: grunt, parser, tools Affects Versions: 0.10.0 Reporter: venkata kamalnath when we run a pig script with namenode HA(QJM) we always get unknown host exception. The nameserviceID is being considered as host and pig job giving unknown host exception. I am working on this fix but want community to validate whether any bug reported similar to this. If not I will provide the fix as soon as possible. The pig script is as below: testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv' USING PigStorage(',') AS (col1:chararray, col2:chararray, col3:int); STORE testTable into '/tmp/test_pig_output'; Exception: Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: kdvenkata at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:412) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:379) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:53) at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106) at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188) at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465) at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151) at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110) at org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68) at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60) at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84) at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620) at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343) at org.apache.pig.PigServer.storeEx(PigServer.java:960) at org.apache.pig.PigServer.store(PigServer.java:928) at org.apache.pig.PigServer.openIterator(PigServer.java:841) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3498) Make pig binary work on both HBase version 0.94 and 0.95
Jarek Jarcec Cecho created PIG-3498: --- Summary: Make pig binary work on both HBase version 0.94 and 0.95 Key: PIG-3498 URL: https://issues.apache.org/jira/browse/PIG-3498 Project: Pig Issue Type: Task Affects Versions: 0.11 Reporter: Jarek Jarcec Cecho HBase 0.95+ support has been added via PIG-3390. While pig can be compiled against both 0.94 and 0.95, due to binary incompatibilities inside HBase, pig compiled against HBase 0.95 can't be used against 0.94 and vice versa. One of the issues we are facing is the HBase class {{RowFilter}}, whose constructor changed between the two HBase releases: * HBase 0.94 {{RowFilter(CompareOp, WritableByteArrayComparable)}} * HBase 0.95 {{RowFilter(CompareOp, ByteArrayComparable)}} We are using children of the classes used in the second parameter and therefore the same code compiles against both HBase versions. However, as the entire constructor signature is saved into the compiled Java class, generated binaries are compatible with only one HBase version. As we're releasing only one pig binary, it would be useful to make Pig compatible with both versions at the same time. -- This message was sent by Atlassian JIRA (v6.1#6144)
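One common way around this kind of binary incompatibility, offered here purely as an illustration and not necessarily the approach this issue will take, is to resolve the RowFilter constructor reflectively at runtime so that neither comparator parameter type gets baked into the compiled Pig class:
{code}
import java.lang.reflect.Constructor;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowFilterCompatSketch {

    // Build a RowFilter without compiling in either two-argument constructor
    // signature. BinaryComparator extends WritableByteArrayComparable on 0.94
    // and ByteArrayComparable on 0.95, so we simply look for the constructor
    // whose second parameter can accept it.
    public static Filter equalsRowFilter(byte[] rowKey) throws Exception {
        BinaryComparator comparator = new BinaryComparator(rowKey);
        for (Constructor<?> ctor : RowFilter.class.getConstructors()) {
            Class<?>[] params = ctor.getParameterTypes();
            if (params.length == 2
                    && params[0] == CompareOp.class
                    && params[1].isAssignableFrom(BinaryComparator.class)) {
                return (Filter) ctor.newInstance(CompareOp.EQUAL, comparator);
            }
        }
        throw new IllegalStateException("No compatible RowFilter constructor found");
    }

    public static void main(String[] args) throws Exception {
        Filter f = equalsRowFilter(Bytes.toBytes("row-1"));
        System.out.println("Built " + f.getClass().getName());
    }
}
{code}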
[jira] [Commented] (PIG-3494) Several fixes for e2e tests
[ https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785655#comment-13785655 ] Hudson commented on PIG-3494: - FAILURE: Integrated in Hive-trunk-hadoop2-ptest #123 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/123/]) PIG-3494: Several fixes for e2e tests (daijy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712) * /pig/trunk/test/e2e/harness/test_harness.pl * /pig/trunk/test/e2e/pig/conf/default.conf * /pig/trunk/test/e2e/pig/conf/rpm.conf * /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm * /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm * /pig/trunk/test/e2e/pig/tests/negative.conf * /pig/trunk/test/e2e/pig/tests/nightly.conf > Several fixes for e2e tests > --- > > Key: PIG-3494 > URL: https://issues.apache.org/jira/browse/PIG-3494 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3494-1.patch > > > Address several issues in e2e tests: > 1. Adding the capacity to test Pig installed by rpm (also involves > configurable piggybank.jar) > 2. Remove hadoop23.res since it is no longer needed > 3. Remove hadoop2 specific error message "UdfException_[1-4]" since they are > fixed by PIG-3360 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785616#comment-13785616 ] Daniel Dai commented on PIG-3445: - Hi, [~julienledem], I am trying to roll a Pig 0.12.0 RC tomorrow, can we get it done by then? > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785610#comment-13785610 ] Julien Le Dem commented on PIG-3445: We merged the PR for parquet-pig-bundle. I'm making a release so that this can be merged into pig 0.12. > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
[ https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-3497: - Assignee: Akihiro Matsukawa > JobControlCompiler should only do reducer estimation when the job has a > reduce phase > > > Key: PIG-3497 > URL: https://issues.apache.org/jira/browse/PIG-3497 > Project: Pig > Issue Type: Bug >Reporter: Akihiro Matsukawa >Assignee: Akihiro Matsukawa >Priority: Minor > Attachments: reducer_estimation.patch > > > Currently, JobControlCompiler makes an estimation for the number of reducers > required (by default based on input size into mappers) regardless of whether > there is a reduce phase in the job. This is unnecessary, especially when > running more complex custom reducer estimators. > Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
[ https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akihiro Matsukawa updated PIG-3497: --- Status: Patch Available (was: Open) > JobControlCompiler should only do reducer estimation when the job has a > reduce phase > > > Key: PIG-3497 > URL: https://issues.apache.org/jira/browse/PIG-3497 > Project: Pig > Issue Type: Bug >Reporter: Akihiro Matsukawa >Priority: Minor > Attachments: reducer_estimation.patch > > > Currently, JobControlCompiler makes an estimation for the number of reducers > required (by default based on input size into mappers) regardless of whether > there is a reduce phase in the job. This is unnecessary, especially when > running more complex custom reducer estimators. > Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
[ https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akihiro Matsukawa updated PIG-3497: --- Attachment: reducer_estimation.patch > JobControlCompiler should only do reducer estimation when the job has a > reduce phase > > > Key: PIG-3497 > URL: https://issues.apache.org/jira/browse/PIG-3497 > Project: Pig > Issue Type: Bug >Reporter: Akihiro Matsukawa >Priority: Minor > Attachments: reducer_estimation.patch > > > Currently, JobControlCompiler makes an estimation for the number of reducers > required (by default based on input size into mappers) regardless of whether > there is a reduce phase in the job. This is unnecessary, especially when > running more complex custom reducer estimators. > Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase
Akihiro Matsukawa created PIG-3497: -- Summary: JobControlCompiler should only do reducer estimation when the job has a reduce phase Key: PIG-3497 URL: https://issues.apache.org/jira/browse/PIG-3497 Project: Pig Issue Type: Bug Reporter: Akihiro Matsukawa Priority: Minor Currently, JobControlCompiler makes an estimation for the number of reducers required (by default based on input size into mappers) regardless of whether there is a reduce phase in the job. This is unnecessary, especially when running more complex custom reducer estimators. Change to only estimate reducers when necessary. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3496) Propagate HBase 0.95 jars to the backend
[ https://issues.apache.org/jira/browse/PIG-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated PIG-3496: Attachment: PIG-3496.patch > Propagate HBase 0.95 jars to the backend > > > Key: PIG-3496 > URL: https://issues.apache.org/jira/browse/PIG-3496 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Jarek Jarcec Cecho >Assignee: Jarek Jarcec Cecho >Priority: Minor > Fix For: 0.13.0 > > Attachments: PIG-3496.patch > > > In PIG-3390 we've introduced support for HBase 0.95, which introduced a lot of > significant changes to HBase. One of the biggest user-facing changes was > splitting one uber jar file into multiple independent jars (such as > {{hbase-common}}, {{hbase-client}}, ...). > {{HBaseStorage}} has [special > code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724] > for propagating HBase jar files and important dependencies to the backend. > This logic has not been altered to take into account the different HBase jars > after the split, and as a result the HBase integration with 0.95 is not > working in fully distributed mode (it does work in local mode though). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3496) Propagate HBase 0.95 jars to the backend
[ https://issues.apache.org/jira/browse/PIG-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated PIG-3496: Fix Version/s: 0.13.0 Status: Patch Available (was: Open) > Propagate HBase 0.95 jars to the backend > > > Key: PIG-3496 > URL: https://issues.apache.org/jira/browse/PIG-3496 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Jarek Jarcec Cecho >Assignee: Jarek Jarcec Cecho >Priority: Minor > Fix For: 0.13.0 > > Attachments: PIG-3496.patch > > > In PIG-3390 we've introduced support for HBase 0.95, which introduced a lot of > significant changes to HBase. One of the biggest user-facing changes was > splitting one uber jar file into multiple independent jars (such as > {{hbase-common}}, {{hbase-client}}, ...). > {{HBaseStorage}} has [special > code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724] > for propagating HBase jar files and important dependencies to the backend. > This logic has not been altered to take into account the different HBase jars > after the split, and as a result the HBase integration with 0.95 is not > working in fully distributed mode (it does work in local mode though). -- This message was sent by Atlassian JIRA (v6.1#6144)
Review Request 14472: PIG-3496 Propagate HBase 0.95 jars to the backend
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14472/ --- Review request for pig. Bugs: PIG-3496 https://issues.apache.org/jira/browse/PIG-3496 Repository: pig-git Description --- I've added the additional required jars into the initialiseHBaseClassLoaderResources() method, so that they get propagated into the backend. Diffs - src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java 67aa984 Diff: https://reviews.apache.org/r/14472/diff/ Testing --- Unit tests for both hbaseversion = 94 | 95 seems to be passing: ant clean test -Dtestcase=TestHBaseStorage -Dhbaseversion=94 -Dprotobuf-java.version=2.4.0a ant clean test -Dtestcase=TestHBaseStorage -Dhbaseversion=95 -Dprotobuf-java.version=2.5.0 I've also tried the patch on fully distributed clusters running both major HBase versions and everything seems to be working. Thanks, Jarek Cecho
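For context, the general pattern for getting the split 0.95 jars to the backend is to ship the jar that contains each class actually needed at runtime. A rough sketch of that pattern is below, using HBase's own TableMapReduceUtil helper; the class list is illustrative only and is not the exact set the patch adds to initialiseHBaseClassLoaderResources().
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

public class ShipHBaseJarsSketch {

    // After the 0.95 jar split these classes live in different artifacts
    // (hbase-common, hbase-client, ...), so shipping "the HBase jar" is no
    // longer enough: we ship the jar that contains each class we depend on.
    public static void shipHBaseDependencies(Configuration conf) throws IOException {
        TableMapReduceUtil.addDependencyJars(conf,
                org.apache.hadoop.hbase.HBaseConfiguration.class,   // hbase-common
                org.apache.hadoop.hbase.client.HTable.class,        // hbase-client
                org.apache.hadoop.hbase.util.Bytes.class,           // hbase-common
                com.google.protobuf.Message.class);                 // protobuf-java
    }
}
{code}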
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785565#comment-13785565 ] Julien Le Dem commented on PIG-3445: parquet-format.version should be 1.0.0 > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785564#comment-13785564 ] Julien Le Dem commented on PIG-3445: I added a parquet-pig-bundle and the shading of fastutil (https://github.com/Parquet/parquet-mr/pull/186). We can make a new release to simplify this. > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar in the Pig packages to make it available out of > the box to pig users. > On top of that we could add the parquet.pig package to the list of packages > to search for UDFs. (alternatively, the parquet jar could contain classes > name or.apache.pig.builtin.ParquetLoader and ParquetStorer) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3496) Propagate HBase 0.95 jars to the backend
Jarek Jarcec Cecho created PIG-3496: --- Summary: Propagate HBase 0.95 jars to the backend Key: PIG-3496 URL: https://issues.apache.org/jira/browse/PIG-3496 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Jarek Jarcec Cecho Assignee: Jarek Jarcec Cecho Priority: Minor In PIG-3390 we've introduced support for HBase 0.95, which introduced a lot of significant changes to HBase. One of the biggest user-facing changes was splitting one uber jar file into multiple independent jars (such as {{hbase-common}}, {{hbase-client}}, ...). {{HBaseStorage}} has [special code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724] for propagating HBase jar files and important dependencies to the backend. This logic has not been altered to take into account the different HBase jars after the split, and as a result the HBase integration with 0.95 is not working in fully distributed mode (it does work in local mode though). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3469) Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing.
[ https://issues.apache.org/jira/browse/PIG-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-3469: - Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks, Jarces! > Skewed join can cause unrecoverable NullPointerException when one of its > inputs is missing. > --- > > Key: PIG-3469 > URL: https://issues.apache.org/jira/browse/PIG-3469 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 > Environment: Apache Pig version 0.11.0-cdh4.4.0 > Happens in both local execution environment (os x) and cluster environment > (linux) >Reporter: Christon DeWan >Assignee: Jarek Jarcec Cecho > Fix For: 0.13.0 > > Attachments: PIG-3469.patch, PIG-3469.patch, PIG-3469.patch > > > Run this script in the local execution environment (affects cluster mode too): > {noformat} > %declare DATA_EXISTS /tmp/test_data_exists.tsv > %declare DATA_MISSING /tmp/test_data_missing.tsv > %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) > > /tmp/test_data_exists.tsv; true'` > exists = LOAD '$DATA_EXISTS' AS (a:long); > missing = LOAD '$DATA_MISSING' AS (a:long); > missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1); > joined = JOIN exists BY a, missing BY a USING 'skewed'; > STORE joined INTO '/tmp/test_out.tsv'; > {noformat} > Results in NullPointerException which halts entire pig execution, including > unrelated jobs. Expected: only dependencies of the error'd LOAD statement > should fail. > Error: > {noformat} > 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2017: Internal error creating job configuration. > 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: > ERROR 2017: Internal error creating job configuration. 
> at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177) > at org.apache.pig.PigServer.launchPlan(PigServer.java:1266) > at > org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251) > at org.apache.pig.PigServer.execute(PigServer.java:1241) > at org.apache.pig.PigServer.executeBatch(PigServer.java:335) > at > org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) > at org.apache.pig.Main.run(Main.java:604) > at org.apache.pig.Main.main(Main.java:157) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:208) > Caused by: java.lang.NullPointerException > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480) > ... 17 more > {noformat} > Script above is as small as I can make it while still reproducing the issue. > Removing the group-foreach causes the join to fail harmlessly (not stopping > pig execution), as does using the default join. Did not occur on 0.10.1. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-2315) Make as clause work in generate
[ https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2315: Attachment: PIG-2315-1.patch Fix unit test failures. > Make as clause work in generate > --- > > Key: PIG-2315 > URL: https://issues.apache.org/jira/browse/PIG-2315 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Gianmarco De Francisci Morales > Fix For: 0.12.0 > > Attachments: PIG-2315-1.patch, PIG-2315-1.patch > > > Currently, the following syntax is supported and ignored causing confusing > with users: > A1 = foreach A1 generate a as a:chararray ; > After this statement a just retains its previous type -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3494) Several fixes for e2e tests
[ https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785466#comment-13785466 ] Hudson commented on PIG-3494: - ABORTED: Integrated in Hive-trunk-hadoop2 #472 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/472/]) PIG-3494: Several fixes for e2e tests (daijy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712) * /pig/trunk/test/e2e/harness/test_harness.pl * /pig/trunk/test/e2e/pig/conf/default.conf * /pig/trunk/test/e2e/pig/conf/rpm.conf * /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm * /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm * /pig/trunk/test/e2e/pig/tests/negative.conf * /pig/trunk/test/e2e/pig/tests/nightly.conf > Several fixes for e2e tests > --- > > Key: PIG-3494 > URL: https://issues.apache.org/jira/browse/PIG-3494 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3494-1.patch > > > Address several issues in e2e tests: > 1. Adding the capacity to test Pig installed by rpm (also involves > configurable piggybank.jar) > 2. Remove hadoop23.res since it is no longer needed > 3. Remove hadoop2 specific error message "UdfException_[1-4]" since they are > fixed by PIG-3360 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema
[ https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785427#comment-13785427 ] Julien Le Dem commented on PIG-3082: This is intended. The second behavior described above is really problematic. If a UDF breaks because it returns a schema of more than one field, it should be changed to return one field of type tuple. Once fixed, it works in all versions of Pig. This is only removing an unsafe use of outputSchema in favor of the existing correct use. > outputSchema of a UDF allows two usages when describing a Tuple schema > -- > > Key: PIG-3082 > URL: https://issues.apache.org/jira/browse/PIG-3082 > Project: Pig > Issue Type: Bug >Reporter: Julien Le Dem >Assignee: Jonathan Coveney > Fix For: 0.12.0 > > Attachments: PIG-3082-0.patch, PIG-3082-1.patch > > > When defining an EvalFunc that returns a Tuple, there are two ways you can implement outputSchema(). > - The right way: return a schema that contains one Field that contains the type and schema of the return type of the UDF > - The unreliable way: return a schema that contains more than one field, and it will be understood as a tuple schema even though there is no type (which is in the Field class) to specify that. This is particularly deceitful when the output schema is derived from the input schema and the output Tuple sometimes contains only one field. In such cases, Pig understands the output schema as a tuple only if there is more than one field, so sometimes it works and sometimes it does not. > We should at least issue a warning (for backward compatibility), if not outright throw an exception, when the output schema contains more than one Field. -- This message was sent by Atlassian JIRA (v6.1#6144)
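To make the distinction concrete from the script side, here is a small Pig Latin sketch; myudfs.MakePair and myudfs.jar are hypothetical, standing in for a Java UDF that returns a Tuple and whose outputSchema() follows "the right way", i.e. returns a Schema with a single Field of type tuple wrapping the inner schema:
{code}
-- hypothetical jar containing the Tuple-returning UDF
REGISTER myudfs.jar;
A = LOAD 'data' AS (name:chararray, age:int);
B = FOREACH A GENERATE myudfs.MakePair(name, age) AS t;
DESCRIBE B;
-- expected with the correct outputSchema: B: {t: (name: chararray,age: int)}
-- i.e. one field of type tuple, rather than two bare fields that Pig may or may not
-- re-interpret as a tuple (the unreliable usage described above)
{code}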
[jira] [Updated] (PIG-3470) Print configuration variables in grunt
[ https://issues.apache.org/jira/browse/PIG-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3470: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. The patch is straightforward enough. I am fine with committing it to 0.12. Patch committed to both branches. Thanks Lorand! > Print configuration variables in grunt > -- > > Key: PIG-3470 > URL: https://issues.apache.org/jira/browse/PIG-3470 > Project: Pig > Issue Type: Improvement > Components: grunt >Reporter: Lorand Bendig >Assignee: Lorand Bendig >Priority: Minor > Fix For: 0.12.0 > > Attachments: PIG-3470-2.patch, PIG-3470.patch > > > Although parameter handling in grunt is limited by design (PIG-2122), I sometimes find it useful to be able to list the jobConf properties when testing statements or debugging scripts line by line. This patch extends the SET command; as in Hive, calling it without key/value parameters prints all the configuration variables. System properties are prefixed with "system:" -- This message was sent by Atlassian JIRA (v6.1#6144)
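As a quick illustration of the extended command, here is a sketch of a grunt session; the property names, the values, and the exact listing format are assumptions for illustration only, not the patch's verbatim output:
{code}
grunt> set default_parallel 10;
grunt> set;
...
default_parallel=10
...
system:java.version=1.7.0_25
...
{code}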
[jira] [Resolved] (PIG-3495) Streaming udf e2e tests failures on Windows
[ https://issues.apache.org/jira/browse/PIG-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-3495. - Resolution: Fixed Hadoop Flags: Reviewed Patch committed to branch 0.12 and trunk. > Streaming udf e2e tests failures on Windows > --- > > Key: PIG-3495 > URL: https://issues.apache.org/jira/browse/PIG-3495 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3495-1.patch > > > Registering a jython script with an absolute path fails. For example: > {code} > register 'D:\scriptingudf.py' using jython as myfuncs; > a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double); > b = foreach a generate myfuncs.square(age); > dump b; > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3494) Several fixes for e2e tests
[ https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785339#comment-13785339 ] Hudson commented on PIG-3494: - FAILURE: Integrated in Hive-trunk-h0.21 #2376 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2376/]) PIG-3494: Several fixes for e2e tests (daijy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712) * /pig/trunk/test/e2e/harness/test_harness.pl * /pig/trunk/test/e2e/pig/conf/default.conf * /pig/trunk/test/e2e/pig/conf/rpm.conf * /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm * /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm * /pig/trunk/test/e2e/pig/tests/negative.conf * /pig/trunk/test/e2e/pig/tests/nightly.conf > Several fixes for e2e tests > --- > > Key: PIG-3494 > URL: https://issues.apache.org/jira/browse/PIG-3494 > Project: Pig > Issue Type: Bug > Components: e2e harness >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3494-1.patch > > > Address several issues in e2e tests: > 1. Add the capability to test Pig installed by rpm (also involves a configurable piggybank.jar) > 2. Remove hadoop23.res since it is no longer needed > 3. Remove the hadoop2-specific error messages "UdfException_[1-4]" since they are fixed by PIG-3360 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3495) Streaming udf e2e tests failures on Windows
[ https://issues.apache.org/jira/browse/PIG-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785197#comment-13785197 ] Rohini Palaniswamy commented on PIG-3495: - Thanks. +1 > Streaming udf e2e tests failures on Windows > --- > > Key: PIG-3495 > URL: https://issues.apache.org/jira/browse/PIG-3495 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.0 > > Attachments: PIG-3495-1.patch > > > Registering a jython script with an absolute path fails. For example: > {code} > register 'D:\scriptingudf.py' using jython as myfuncs; > a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double); > b = foreach a generate myfuncs.square(age); > dump b; > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3470) Print configuration variables in grunt
[ https://issues.apache.org/jira/browse/PIG-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorand Bendig updated PIG-3470: --- Attachment: PIG-3470-2.patch The patch is modified according to your comment. If accepted, the docs for the SET command need to be updated. > Print configuration variables in grunt > -- > > Key: PIG-3470 > URL: https://issues.apache.org/jira/browse/PIG-3470 > Project: Pig > Issue Type: Improvement > Components: grunt >Reporter: Lorand Bendig >Assignee: Lorand Bendig >Priority: Minor > Fix For: 0.12.0 > > Attachments: PIG-3470-2.patch, PIG-3470.patch > > > Although parameter handling in grunt is limited by design (PIG-2122), I sometimes find it useful to be able to list the jobConf properties when testing statements or debugging scripts line by line. This patch extends the SET command; as in Hive, calling it without key/value parameters prints all the configuration variables. System properties are prefixed with "system:" -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3445) Make Parquet format available out of the box in Pig
[ https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorand Bendig updated PIG-3445: --- Attachment: PIG-3445-4.patch [~dvryaboy] Thank you. Well, yes, ParquetUtil is a general util, so I merged it into JarManager. > Make Parquet format available out of the box in Pig > --- > > Key: PIG-3445 > URL: https://issues.apache.org/jira/browse/PIG-3445 > Project: Pig > Issue Type: Improvement >Reporter: Julien Le Dem > Fix For: 0.12.0 > > Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, > PIG-3445.patch > > > We would add the Parquet jar to the Pig packages to make it available out of the box to Pig users. > On top of that we could add the parquet.pig package to the list of packages to search for UDFs. (Alternatively, the parquet jar could contain classes named org.apache.pig.builtin.ParquetLoader and ParquetStorer.) > This way users can use Parquet simply by typing: > A = LOAD 'foo' USING ParquetLoader(); > STORE A INTO 'bar' USING ParquetStorer(); -- This message was sent by Atlassian JIRA (v6.1#6144)