[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer

2014-01-30 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887401#comment-13887401
 ] 

Viraj Bhat commented on PIG-3222:
-

Hi Daniel,
 It seems that this patch is already in our code base for Pig 0.11, but the query 
still fails. It succeeds in Pig 0.12. I have asked Rohini if she has an idea on this.
Thanks again
Viraj

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer 
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
>Reporter: Feng Peng
>  Labels: hcatalog
> Attachments: PigStorerDemo.java, hcat.trace, hcatstorer.trace.txt
>
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of 
> the same load/store statement. This change breaks the HCatStorer, which 
> assumes that all front-end and back-end invocations of the same store statement 
> have the same UDFContextSignature so that it can read the previously stored 
> information correctly.
> The related HCatalog code is in 
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
>  (the setStoreLocation() function).
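
The failure mode described above can be illustrated with a small simulation. This is a sketch in plain Python, not HCatalog or Pig code: FakeUDFContext, store_schema, and load_schema are hypothetical names, and the dictionary only stands in for the per-signature properties that a storer saves on the front end and reads back on the back end.

```python
class FakeUDFContext:
    """Hypothetical stand-in for a per-signature property store."""

    def __init__(self):
        self._props = {}

    def properties_for(self, signature):
        # Each distinct signature gets its own, initially empty, property bag.
        return self._props.setdefault(signature, {})


def store_schema(ctx, signature, schema):
    # Front-end call: remember the schema under the given signature.
    ctx.properties_for(signature)["schema"] = schema


def load_schema(ctx, signature):
    # Back-end call: returns None when nothing was saved under this signature.
    return ctx.properties_for(signature).get("schema")


ctx = FakeUDFContext()
store_schema(ctx, "store-1", "uid:int, sid:chararray")

# Same signature on both sides: the saved value is found.
same = load_schema(ctx, "store-1")
# A different signature for the same statement: the lookup comes back empty.
different = load_schema(ctx, "store-1-second-invocation")
```

If every invocation of one store statement sees the same signature, the second lookup behaves like the first; when the signatures differ, the stored information is simply not visible.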



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887373#comment-13887373
 ] 

Daniel Dai commented on PIG-3373:
-

[~aseldawy], it seems the test still passes without your fix. Which version of 
Pig are you using?

> XMLLoader returns non-matching nodes when a tag name spans through the block 
> boundary
> -
>
> Key: PIG-3373
> URL: https://issues.apache.org/jira/browse/PIG-3373
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: site
>Reporter: Ahmed Eldawy
>Assignee: Ahmed Eldawy
>  Labels: patch
> Attachments: PIG3373.patch, PIG3373_1.patch, bad-file.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is 
> not of the requested type.
> Example: For the following input file
> 
>   BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one but it 
> actually returns both of them
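
One way to avoid this kind of false match is to check the character that follows the tag name before accepting a start tag. The function below is a minimal sketch under that assumption; it is not the XMLLoader code or the attached patch, the name is_start_of_tag is hypothetical, and the sample strings in the usage lines are made up for illustration.

```python
def is_start_of_tag(text, pos, tag):
    """Return True if text[pos:] begins a start tag named exactly `tag`."""
    prefix = "<" + tag
    if not text.startswith(prefix, pos):
        return False
    end = pos + len(prefix)
    if end >= len(text):
        # The name runs to the end of the available data, so the caller
        # would have to read more input before deciding.
        return False
    # A real tag name must be followed by whitespace, '>' or '/'.
    return text[end] in " \t\r\n>/"


exact = is_start_of_tag('<event id="1">', 0, "event")
longer_name = is_start_of_tag('<eventually id="2">', 0, "event")
```

The key design point is that a prefix match alone is not enough: the decision is deferred until the character after the name is available, even when that character lies in the next block.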





[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2014-01-30 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Attachment: PIG-2672-7.patch

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-2672-5.patch, PIG-2672-7.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  
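
The idea of copying a jar once and reusing it across jobs can be sketched as follows. This is an illustration only: the local filesystem stands in for HDFS, the content-hash layout is an assumption rather than something taken from the attached patches, and cached_jar_path is a hypothetical helper name.

```python
import hashlib
import os
import shutil
import tempfile


def cached_jar_path(jar_path, cache_dir):
    """Copy jar_path into cache_dir once, keyed by content hash."""
    with open(jar_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    target_dir = os.path.join(cache_dir, digest)
    target = os.path.join(target_dir, os.path.basename(jar_path))
    if not os.path.exists(target):
        # The first job to use this jar pays for the copy; later jobs reuse it.
        os.makedirs(target_dir, exist_ok=True)
        shutil.copyfile(jar_path, target)
    return target


work = tempfile.mkdtemp()
jar = os.path.join(work, "udfs.jar")
with open(jar, "wb") as f:
    f.write(b"not a real jar, just bytes")

cache = os.path.join(work, "cache")
first = cached_jar_path(jar, cache)
second = cached_jar_path(jar, cache)
```

Both calls return the same path and only one copy is made, which addresses the space and launch-time concerns listed above in this simplified setting.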





[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2014-01-30 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Attachment: (was: PIG-2672-7.patch)

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-2672-5.patch, PIG-2672-7.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  





Re: Review Request 14274: PIG-2672 Optimize the use of DistributedCache

2014-01-30 Thread Aniket Mokashi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14274/
---

(Updated Jan. 31, 2014, 1:23 a.m.)


Review request for pig, Cheolsoo Park, DanielWX DanielWX, Dmitriy Ryaboy, 
Julien Le Dem, and Rohini Palaniswamy.


Bugs: PIG-2672
https://issues.apache.org/jira/browse/PIG-2672


Repository: pig


Description
---

added jar.cache.location option


Diffs (updated)
-

  
  trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1563022 
  trunk/src/org/apache/pig/PigConfiguration.java 1563022 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1563022 
  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1563022 
  trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1563022 
  trunk/test/org/apache/pig/test/TestMultiQueryLocal.java 1563022 
  trunk/test/org/apache/pig/test/TestPigServer.java 1563022 

Diff: https://reviews.apache.org/r/14274/diff/


Testing
---


Thanks,

Aniket Mokashi



[jira] Subscription: PIG patch available

2014-01-30 Thread jira
Issue Subscription
Filter: PIG patch available (17 issues)

Subscriber: pigdaily

Key         Summary
PIG-3726    Ranking empty records leads to NullPointerException
            https://issues.apache.org/jira/browse/PIG-3726
PIG-3724    pig e2e tests dont have hadoop libs on classpath
            https://issues.apache.org/jira/browse/PIG-3724
PIG-3679    e2e StreamingPythonUDFs_10 fails in trunk
            https://issues.apache.org/jira/browse/PIG-3679
PIG-3670    Fix assert in Pig script
            https://issues.apache.org/jira/browse/PIG-3670
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3642    Direct HDFS access for small jobs (fetch)
            https://issues.apache.org/jira/browse/PIG-3642
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3615    Update the way that JsonLoader/JsonStorage deal with BigDecimal
            https://issues.apache.org/jira/browse/PIG-3615
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3456    Reduce threadlocal conf access in backend for each record
            https://issues.apache.org/jira/browse/PIG-3456
PIG-3447    Compiler warning message dropped for CastLineageSetter and others with no enum kind
            https://issues.apache.org/jira/browse/PIG-3447
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373
PIG-3347    Store invocation brings side effect
            https://issues.apache.org/jira/browse/PIG-3347
PIG-3299    Provide support for LazyOutputFormat to avoid creating empty files
            https://issues.apache.org/jira/browse/PIG-3299
PIG-2672    Optimize the use of DistributedCache
            https://issues.apache.org/jira/browse/PIG-2672

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3732) Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3732:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to tez branch. Thanks Daniel for the review.

> Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex
> 
>
> Key: PIG-3732
> URL: https://issues.apache.org/jira/browse/PIG-3732
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3732-1.patch, PIG-3732-2.patch
>
>
>   From the first vertex to the intermediate vertex that does the partitioning 
> of the keys based on the WeightedRangePartitioner, use ONE_TO_ONE Tez edge 
> and unsorted output and input instead of using a shuffle edge. Also replace 
> the POPackage->POForEach->POLocalRearrange in intermediate vertex with 
> POIdentityInOutTez.





[jira] [Updated] (PIG-3732) Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3732:


Attachment: PIG-3732-2.patch

> Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex
> 
>
> Key: PIG-3732
> URL: https://issues.apache.org/jira/browse/PIG-3732
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3732-1.patch, PIG-3732-2.patch
>
>
>   From the first vertex to the intermediate vertex that does the partitioning 
> of the keys based on the WeightedRangePartitioner, use ONE_TO_ONE Tez edge 
> and unsorted output and input instead of using a shuffle edge. Also replace 
> the POPackage->POForEach->POLocalRearrange in intermediate vertex with 
> POIdentityInOutTez.





[jira] [Commented] (PIG-3733) Pig fails to concatenate semi-colon in generate statement

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887296#comment-13887296
 ] 

Daniel Dai commented on PIG-3733:
-

Works for me on 0.12.0. 

> Pig fails to concatenate semi-colon in generate statement
> -
>
> Key: PIG-3733
> URL: https://issues.apache.org/jira/browse/PIG-3733
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: sudhir mallem
>
> Pig fails to concatenate a semi-colon to a column in a generate statement. I've 
> tried multiple ways, including the unicode version (\\u003B), but it still fails.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, sid:chararray);
> grunt> b = foreach a generate uid as uid, CONCAT('v=1;',sid) as sids;
>   mismatched character '' expecting '''
> 2014-01-30 08:51:51,759 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:   mismatched character '' expecting '''
> Details at logfile: /export/home/smallem/pig_1391071809426.log
> {code}
> However, the same works when a nested statement is used.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, sid:chararray);
> grunt> b = foreach a {
> 
> >> x = CONCAT('v=1;',sid);
> >> generate uid as memberuid, x as sids ;
> >> };
> grunt>
> {code}
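
The error text is consistent with the statement being cut at the semicolon inside the quoted string, although nothing in this thread confirms that as the cause. The sketch below only illustrates the difference between a naive split on ';' and a quote-aware split. It is plain Python, not Pig's parser, and split_statements is a hypothetical helper.

```python
def split_statements(script):
    """Split on ';' but ignore semicolons inside single-quoted strings."""
    statements = []
    current = []
    in_quote = False
    i = 0
    while i < len(script):
        ch = script[i]
        if ch == "\\" and in_quote and i + 1 < len(script):
            # Keep an escaped character (such as \') together with its backslash.
            current.append(ch)
            current.append(script[i + 1])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


line = "b = foreach a generate uid as uid, CONCAT('v=1;',sid) as sids;"
naive = [s.strip() for s in line.split(";") if s.strip()]
aware = split_statements(line)
```

The naive split yields two fragments, the first of which ends inside an unterminated string literal, while the quote-aware split keeps the statement whole.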





Re: Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/
---

(Updated Jan. 31, 2014, 12:21 a.m.)


Review request for pig, Cheolsoo Park and Daniel Dai.


Changes
---

Final patch that was committed


Bugs: PIG-3732
https://issues.apache.org/jira/browse/PIG-3732


Repository: pig


Description
---

Orderby has 4 vertices and changes done are as below.

Load Vertex -> Partitioner Vertex 
 - Previously this used RoundRobinPartitioner with a sorted shuffle, and the 
parallelism of the Partitioner Vertex was the same as that of the reducer vertex 
(i.e. the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge between 
Load Vertex and Partitioner Vertex, with the Partitioner Vertex having the same 
parallelism as the Load Vertex. Will get the performance numbers for both cases by Friday.
Load Vertex -> Sampler Vertex  
Sampler Vertex -> Partitioner Vertex (Broadcast edge)
 - The POPackage->POForeach->POLocalRearrange in Partitioner Vertex has 
been replaced by POIdentityInOutTez
Partitioner Vertex -> Reducer Vertex
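
The four vertices and their edges can be written down as a small table, which also makes the parallelism constraint on the ONE_TO_ONE edge easy to check. This is a sketch in plain Python, not Tez or Pig code: check_one_to_one is a hypothetical helper, the task counts are invented, and edge types that the description does not state are left as None rather than guessed.

```python
# Edge table for the order-by DAG as described in the review text.
# None means the description does not state the edge type.
ORDER_BY_EDGES = [
    ("Load", "Partitioner", "ONE_TO_ONE"),
    ("Load", "Sampler", None),
    ("Sampler", "Partitioner", "BROADCAST"),
    ("Partitioner", "Reducer", None),
]


def check_one_to_one(edges, parallelism):
    """Return the ONE_TO_ONE edges whose endpoints differ in parallelism."""
    bad = []
    for src, dst, kind in edges:
        if kind == "ONE_TO_ONE" and parallelism[src] != parallelism[dst]:
            bad.append((src, dst))
    return bad


# Hypothetical task counts; only the Load == Partitioner relation comes
# from the description.
new_plan = {"Load": 8, "Sampler": 1, "Partitioner": 8, "Reducer": 4}
old_style = {"Load": 8, "Sampler": 1, "Partitioner": 4, "Reducer": 4}

ok = check_one_to_one(ORDER_BY_EDGES, new_plan)
mismatch = check_one_to_one(ORDER_BY_EDGES, old_style)
```

With the Partitioner Vertex sized like the reducer (the old arrangement), the ONE_TO_ONE edge would not line up, which is why the description ties the Partitioner Vertex parallelism to the Load Vertex.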

Need to attempt this for Skewed Join as well.


This patch also sets credentials on the DAG, which is required after TEZ-395.


Diffs (updated)
-

  
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld 1562426 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld 1562426 

Diff: https://reviews.apache.org/r/17529/diff/


Testing
---

test-tez and tez.conf e2e tests pass


Thanks,

Rohini Palaniswamy



[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2014-01-30 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Attachment: PIG-2672-7.patch

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-2672-5.patch, PIG-2672-7.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  





[jira] [Commented] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887293#comment-13887293
 ] 

Suhas Satish commented on PIG-3667:
---

I have created a clone here 
https://issues.apache.org/jira/browse/PIG-3734

> build.xml jar-all target does not include jython*.jar in lib/ directory 
> 
>
> Key: PIG-3667
> URL: https://issues.apache.org/jira/browse/PIG-3667
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Suhas Satish
>  Labels: build
> Attachments: PIG-3667.patch
>
>
> The Pig package does not include the jython jar in the lib/ directory when built 
> with the jar-all ant target, but does include it with the "ant package" target. 
> It should be included by both targets, because the build/ directory, where ivy 
> puts all the dependency jars while building (under build/ivy/lib/Pig), is often 
> excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log => 
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
>     at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
>     at org.apache.pig.PigServer.registerCode(PigServer.java:501)
>     at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:538)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 14 more
> Fix: Including jython*.jar within the lib/ directory gets rid of this issue 
> and the UDF can be loaded- 
> grunt>  register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN 
> org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders 
> is
> empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF:
> myfuncs.helloworld





[jira] [Commented] (PIG-3734) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887294#comment-13887294
 ] 

Suhas Satish commented on PIG-3734:
---

This is a clone of PIG-3667 which was accidentally marked as "closed".
 

> build.xml jar-all target does not include jython*.jar in lib/ directory
> ---
>
> Key: PIG-3734
> URL: https://issues.apache.org/jira/browse/PIG-3734
> Project: Pig
>  Issue Type: Bug
>Reporter: Suhas Satish
>Assignee: Suhas Satish
> Attachments: PIG-3734.patch
>
>
> The Pig package does not include the jython jar in the lib/ directory when built 
> with the jar-all ant target, but does include it with the "ant package" target. 
> It should be included by both targets, because the build/ directory, where ivy 
> puts all the dependency jars while building (under build/ivy/lib/Pig), is often 
> excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log =>
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
>     at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
>     at org.apache.pig.PigServer.registerCode(PigServer.java:501)
>     at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:538)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 14 more
> Fix: Including jython*.jar within the lib/ directory gets rid of this issue 
> and the UDF can be loaded- 
> grunt> register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN 
> org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders 
> is
> empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF:
> myfuncs.helloworld





[jira] [Updated] (PIG-2672) Optimize the use of DistributedCache

2014-01-30 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2672:


Assignee: Aniket Mokashi
  Status: Patch Available  (was: Open)

> Optimize the use of DistributedCache
> 
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-2672-5.patch, PIG-2672-7.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in hdfs and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of 
>* Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space in tasktrackers.
>* Performance - The jar distribution impacts the job launch time.  





[jira] [Updated] (PIG-3734) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Suhas Satish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Satish updated PIG-3734:
--

Attachment: PIG-3734.patch

> build.xml jar-all target does not include jython*.jar in lib/ directory
> ---
>
> Key: PIG-3734
> URL: https://issues.apache.org/jira/browse/PIG-3734
> Project: Pig
>  Issue Type: Bug
>Reporter: Suhas Satish
>Assignee: Suhas Satish
> Attachments: PIG-3734.patch
>
>
> The Pig package does not include the jython jar in the lib/ directory when built 
> with the jar-all ant target, but does include it with the "ant package" target. 
> It should be included by both targets, because the build/ directory, where ivy 
> puts all the dependency jars while building (under build/ivy/lib/Pig), is often 
> excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log =>
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
>     at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
>     at org.apache.pig.PigServer.registerCode(PigServer.java:501)
>     at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:538)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 14 more
> Fix: Including jython*.jar within the lib/ directory gets rid of this issue 
> and the UDF can be loaded- 
> grunt> register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN 
> org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders 
> is
> empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO 
> org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF:
> myfuncs.helloworld





[jira] [Created] (PIG-3734) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Suhas Satish (JIRA)
Suhas Satish created PIG-3734:
-

 Summary: build.xml jar-all target does not include jython*.jar in 
lib/ directory
 Key: PIG-3734
 URL: https://issues.apache.org/jira/browse/PIG-3734
 Project: Pig
  Issue Type: Bug
Reporter: Suhas Satish
Assignee: Suhas Satish


The Pig package does not include the jython jar in the lib/ directory when built 
with the jar-all ant target, but does include it with the "ant package" target. 
It should be included by both targets, because the build/ directory, where ivy 
puts all the dependency jars while building (under build/ivy/lib/Pig), is often 
excluded from packaging.
To reproduce:
ant jar-all 
rm -rf build/ 
bin/pig
grunt> register '/tmp/test.py' using jython as myfunction;
If done prior to installing jython, here's the error one gets:
2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
2998: Unhandled internal error. org/python/core/PyObject
Details at logfile: pig_*.log
Within the pig_*.log =>

Pig Stack Trace
---
ERROR 2998: Unhandled internal error. org/python/core/PyObject
java.lang.NoClassDefFoundError: org/python/core/PyObject
    at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
    at org.apache.pig.PigServer.registerCode(PigServer.java:501)
    at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:538)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 14 more
Fix: Including jython*.jar within the lib/ directory gets rid of this issue and 
the UDF can be loaded- 
grunt> register '/tmp/test.py' using jython as myfuncs;
2013-12-27 18:37:02,402 [main] INFO 
org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_4887743829482443898
2013-12-27 18:37:03,448 [main] WARN 
org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is
empty. This is not expected unless on testing.
2013-12-27 18:37:03,724 [main] INFO 
org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF:
myfuncs.helloworld





[jira] [Commented] (PIG-3399) Missing required library: 'build/ivy/lib/Pig/javacc-4.2.jar' pig

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887272#comment-13887272
 ] 

Daniel Dai commented on PIG-3399:
-

The patch should work. Can you try it?

> Missing required library: 'build/ivy/lib/Pig/javacc-4.2.jar'  pig
> 
>
> Key: PIG-3399
> URL: https://issues.apache.org/jira/browse/PIG-3399
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.11.1
>Reporter: vikram s
> Attachments: PIG-3399-1.patch
>
>
> 1. Got the latest version with git pull
> 2. Ran mvn clean eclipse-files
> 3. Imported the project into Eclipse
> 4. Got the error: Missing required library: 'build/ivy/lib/Pig/javacc-4.2.jar' 
> pig





[jira] [Commented] (PIG-3299) Provide support for LazyOutputFormat to avoid creating empty files

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887271#comment-13887271
 ] 

Daniel Dai commented on PIG-3299:
-

LazyOutputFormat is not in the Hadoop 1.0.0 bundled with Pig. We will also need 
to bump up the Hadoop version in order to compile. Let me give it a try.

> Provide support for LazyOutputFormat to avoid creating empty files
> --
>
> Key: PIG-3299
> URL: https://issues.apache.org/jira/browse/PIG-3299
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Lorand Bendig
> Attachments: PIG-3299.patch
>
>
> LazyOutputFormat (HADOOP-4927) in Hadoop is a wrapper that avoids creating 
> part files if no records are output. It would be good to add support for it 
> with a configuration option in Pig that wraps storeFunc.getOutputFormat() 
> with LazyOutputFormat. 
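
The idea behind a lazy output wrapper can be shown without Hadoop: defer 
creating the output file until the first record is written, so a task that 
emits nothing leaves no empty part file. The sketch below is a standalone model 
of that behaviour, not Hadoop's LazyOutputFormat and not the attached patch; 
LazyFileWriter, createsFile and the part-00000 file name are assumptions made 
for the illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter out; // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(String record) throws IOException {
        if (out == null) {
            // First record: only now is the file actually created.
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        out.write(record);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    /** Demo helper: reports whether a part file exists after an optional write. */
    static boolean createsFile(boolean writeRecord) {
        try {
            Path part = Files.createTempDirectory("lazy-demo").resolve("part-00000");
            try (LazyFileWriter w = new LazyFileWriter(part)) {
                if (writeRecord) {
                    w.write("1\t1");
                }
            }
            return Files.exists(part);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no records  -> file exists: " + createsFile(false));
        System.out.println("one record  -> file exists: " + createsFile(true));
    }
}
```

An eager writer would open the file in its constructor and leave a zero-byte 
file behind in the first case, which is the situation this issue wants to avoid.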





[jira] [Commented] (PIG-3299) Provide support for LazyOutputFormat to avoid creating empty files

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887258#comment-13887258
 ] 

Daniel Dai commented on PIG-3299:
-

Sorry for the delay; let me kick off the tests.

> Provide support for LazyOutputFormat to avoid creating empty files
> --
>
> Key: PIG-3299
> URL: https://issues.apache.org/jira/browse/PIG-3299
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Lorand Bendig
> Attachments: PIG-3299.patch
>
>
> LazyOutputFormat (HADOOP-4927) in Hadoop is a wrapper that avoids creating 
> part files if no records are output. It would be good to add support for it 
> with a configuration option in Pig that wraps storeFunc.getOutputFormat() 
> with LazyOutputFormat. 





[jira] [Commented] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887255#comment-13887255
 ] 

Daniel Dai commented on PIG-3667:
-

Since it is "closed", there is no way to reopen it. Can you open a new Jira?

> build.xml jar-all target does not include jython*.jar in lib/ directory 
> 
>
> Key: PIG-3667
> URL: https://issues.apache.org/jira/browse/PIG-3667
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Suhas Satish
>  Labels: build
> Attachments: PIG-3667.patch
>
>
> The Pig package does not include the jython jar in the lib/ directory when 
> built with the jar-all ant target, but does include it with the "ant package" 
> target. It should be included by both targets, because the build/ directory, 
> where ivy puts all the dependency jars while building (under 
> build/ivy/lib/Pig), is often excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log => 
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
> at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
> at org.apache.pig.PigServer.registerCode(PigServer.java:501)
> at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.run(Main.java:538)
> at org.apache.pig.Main.main(Main.java:157)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ... 14 more
> Fix: Including jython*.jar in the lib/ directory resolves this issue, and 
> the UDF can then be loaded:
> grunt>  register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld





[jira] [Resolved] (PIG-3652) Pigmix parser (PigPerformanceLoader) deletes chars during parsing

2014-01-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3652.
-

   Resolution: Fixed
Fix Version/s: 0.13.0
 Hadoop Flags: Reviewed

Patch committed to trunk. Thanks Keren!

> Pigmix parser (PigPerformanceLoader) deletes chars during parsing 
> --
>
> Key: PIG-3652
> URL: https://issues.apache.org/jira/browse/PIG-3652
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 0.12.0
>Reporter: Keren Ouaknine
>Assignee: Keren Ouaknine
> Fix For: 0.13.0
>
> Attachments: PIG-3652-1.patch, PIG-3652-2.patch
>
>
> When importing data generated by Pigmix using pigper.jar, the first char of 
> the value of a map is missing, as in the following example:
> DATA GENERATED:
> f^DGvds_NL //^D is the delimiter
> DATA LOADED:
> [f#vds_NL]
> The letter G is missing.
> This issue also reproduces for the key of the map when the number of bytes 
> is >1.
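
For reference, the expected parse of the example entry can be sketched as 
follows. This assumes ^D is the single character 0x04 and that an entry has the 
form key^Dvalue; the real Pigmix format and the PigPerformanceLoader code may 
differ, and MapEntryParser and parseEntry are hypothetical names.

```java
final class MapEntryParser {
    private static final char DELIM = '\u0004'; // Ctrl-D, written ^D in the report

    /** Splits "key^Dvalue" into {key, value} without dropping any character. */
    static String[] parseEntry(String raw) {
        int pos = raw.indexOf(DELIM);
        if (pos < 0) {
            throw new IllegalArgumentException("no delimiter in entry");
        }
        String key = raw.substring(0, pos);
        // Skip exactly one character (the delimiter). Skipping one more would
        // give the symptom in the report, where "Gvds_NL" became "vds_NL".
        String value = raw.substring(pos + 1);
        return new String[] {key, value};
    }

    public static void main(String[] args) {
        String[] kv = parseEntry("f" + DELIM + "Gvds_NL");
        System.out.println("[" + kv[0] + "#" + kv[1] + "]"); // prints [f#Gvds_NL]
    }
}
```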





[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887242#comment-13887242
 ] 

Daniel Dai commented on PIG-3222:
-

PIG-3267 was checked into the Pig 0.11 branch (but there has been no further 
0.11 release since it was checked in). Can you check whether you have this 
patch in your codebase?

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer 
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
>Reporter: Feng Peng
>  Labels: hcatalog
> Attachments: PigStorerDemo.java, hcat.trace, hcatstorer.trace.txt
>
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of 
> the same load/store statement. This change breaks HCatStorer, which assumes 
> that all front-end and back-end invocations of the same store statement have 
> the same UDFContextSignature, so that it can read the previously stored 
> information correctly.
> The related HCatalog code is in 
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
>  (the setStoreLocation() function).
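
The failure mode described in the issue can be modelled in a few lines: state 
saved under one signature is simply not found when a later invocation of the 
same statement is handed a different signature. The sketch below is a 
simplified stand-in, not Pig's UDFContext or HCatStorer; SignatureStoreDemo, 
propsFor and the signature strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class SignatureStoreDemo {
    // Simplified stand-in for per-UDF property storage keyed by signature.
    private final Map<String, Properties> bySignature = new HashMap<>();

    Properties propsFor(String signature) {
        return bySignature.computeIfAbsent(signature, s -> new Properties());
    }

    public static void main(String[] args) {
        SignatureStoreDemo ctx = new SignatureStoreDemo();

        // One invocation of the store statement saves state under its signature.
        ctx.propsFor("store-1").setProperty("schema", "a:int");

        // Same signature on a later invocation: the saved state is found.
        System.out.println(ctx.propsFor("store-1").getProperty("schema")); // a:int

        // A different signature for the same statement: nothing is found.
        System.out.println(ctx.propsFor("store-2").getProperty("schema")); // null
    }
}
```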





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887236#comment-13887236
 ] 

Daniel Dai commented on PIG-3347:
-

The UID tracks column lineage in the logical optimizer so that we can freely 
move operators up and down: ProjectionPatcher repositions a column according 
to its UID, even if the columns get reordered. A new source of data should 
have a new UID. That is the case for nested LOForEach/LODistinct, since the 
result is not directly derived from the previous operator; instead, it is a 
new field generated by the foreach.
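
As a toy illustration of repositioning by UID (not Pig's ProjectionPatcher or 
schema classes; UidLineageDemo, Field and positionOf are made-up names), a 
projection that remembers a UID can still find its column after the columns 
have been reordered:

```java
import java.util.Arrays;
import java.util.List;

final class UidLineageDemo {
    /** A column in a schema: the uid identifies its lineage, the alias is just a label. */
    static final class Field {
        final long uid;
        final String alias;

        Field(long uid, String alias) {
            this.uid = uid;
            this.alias = alias;
        }
    }

    /** Returns the position of the field with the given uid, or -1 if absent. */
    static int positionOf(List<Field> schema, long uid) {
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i).uid == uid) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Field> before = Arrays.asList(new Field(1, "x"), new Field(2, "y"));
        // After an operator is moved, the same columns arrive in a different order.
        List<Field> after = Arrays.asList(new Field(2, "y"), new Field(1, "x"));

        long projected = 1; // a projection that refers to column "x" by uid
        System.out.println(positionOf(before, projected)); // 0
        System.out.println(positionOf(after, projected));  // 1
    }
}
```

In this model a field produced by a nested foreach would be given a fresh uid, 
so a lookup by an older uid returns -1 rather than silently matching it.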

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final 
> store output. It looks like the store has some kind of side effect. We used 
> 'local' mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The actual output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer

2014-01-30 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887216#comment-13887216
 ] 

Viraj Bhat commented on PIG-3222:
-

Hi Feng,
 Thanks for finding this error in Pig 0.11. It seems the limit to HCatStorer 
works fine with Pig 0.12 but is still a problem with Pig 0.11. Not sure if we 
need to backport something that got this working in Pig 0.12.
Viraj

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer 
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
>Reporter: Feng Peng
>  Labels: hcatalog
> Attachments: PigStorerDemo.java, hcat.trace, hcatstorer.trace.txt
>
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of 
> the same load/store statement. This change breaks HCatStorer, which assumes 
> that all front-end and back-end invocations of the same store statement have 
> the same UDFContextSignature, so that it can read the previously stored 
> information correctly.
> The related HCatalog code is in 
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
>  (the setStoreLocation() function).





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887203#comment-13887203
 ] 

Julien Le Dem commented on PIG-3347:


I thought that the field UIDs were used to track lineage across the plan.
[~aniket486] correct me if I'm wrong, but they are used to determine which 
fields are read for projection push-down.
In the case of a self join (direct or indirect) we end up with duplicate IDs 
in the same relation, because the same field is derived into 2 different fields.
Otherwise I'm as lost as [~knoguchi] regarding the actual mechanisms around the 
UID.
I tried to fix some of these in the past (PIG-3020), but it appears the fixes 
created more problems (PIG-3492).
[~daijy] maybe you can enlighten us?

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final 
> store output. It looks like the store has some kind of side effect. We used 
> 'local' mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The actual output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





Re: Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/#review33264
---

Ship it!


+1 for the rest

- Daniel Dai


On Jan. 30, 2014, 8:28 a.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17529/
> ---
> 
> (Updated Jan. 30, 2014, 8:28 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3732
> https://issues.apache.org/jira/browse/PIG-3732
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Orderby has 4 vertices; the changes made are as below.
> 
> Load Vertex -> Partitioner Vertex 
>  - Previously used RoundRobinPartitioner with a sorted shuffle, and the 
> parallelism of the Partitioner Vertex was the same as that of the reducer 
> vertex (i.e. the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge 
> between the Load Vertex and the Partitioner Vertex, with the Partitioner 
> Vertex having the same parallelism as the Load Vertex. Will get the 
> performance numbers for both cases by Friday.
> Load Vertex -> Sampler Vertex  
> Sampler Vertex -> Partitioner Vertex (Broadcast edge)
>  - The POPackage->POForeach->POLocalRearrange in the Partitioner Vertex has 
> been replaced by POIdentityInOutTez
> Partitioner Vertex -> Reducer Vertex
> 
> Need to attempt this for Skewed Join as well.
> 
> 
> This patch also sets credentials on the DAG, which is required after TEZ-395.
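
The role of the sampler and partitioner vertices can be illustrated with a 
much-simplified model of sample-based range partitioning: split points are 
picked from a sample of keys and handed to the partitioner, which then routes 
each key so that partition order matches key order. This is only a sketch 
under those assumptions and is not the WeightedRangePartitioner or Tez code in 
the patch; RangePartitionDemo, boundaries and partitionFor are hypothetical 
names, and integer keys stand in for Pig tuples.

```java
import java.util.Arrays;

final class RangePartitionDemo {
    /** Picks numPartitions-1 split points from a non-empty sample of keys. */
    static int[] boundaries(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] cuts = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            cuts[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return cuts;
    }

    /** Maps a key to a partition so that partition order matches key order. */
    static int partitionFor(int key, int[] cuts) {
        int idx = Arrays.binarySearch(cuts, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is not a cut.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    public static void main(String[] args) {
        int[] sample = {5, 1, 9, 3, 7, 2, 8, 4, 6};
        int[] cuts = boundaries(sample, 3);
        System.out.println(Arrays.toString(cuts)); // [4, 7]
        for (int key : new int[] {0, 4, 7, 10}) {
            System.out.println(key + " -> partition " + partitionFor(key, cuts));
        }
    }
}
```

Because every key in partition i sorts before every key in partition i+1, each 
reducer can sort its own range independently and the concatenated output is 
globally ordered.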
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
>  1562426 
> 
> Diff: https://reviews.apache.org/r/17529/diff/
> 
> 
> Testing
> ---
> 
> test-tez and tez.conf e2e tests pass
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



Re: Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/#review33262
---



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java


I need to replace this with just 

POLocalRearrangeTez lr = new POLocalRearrangeTez(new OperatorKey(scope, nig.getNextNodeId(scope)));

as endSingleInputWithStoreAndSample currently overrides all the fields set 
in localRearrangeFactory.create. Will do that before committing.


- Rohini Palaniswamy


On Jan. 30, 2014, 8:28 a.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17529/
> ---
> 
> (Updated Jan. 30, 2014, 8:28 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3732
> https://issues.apache.org/jira/browse/PIG-3732
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Orderby has 4 vertices; the changes made are as below.
> 
> Load Vertex -> Partitioner Vertex 
>  - Previously used RoundRobinPartitioner with a sorted shuffle, and the 
> parallelism of the Partitioner Vertex was the same as that of the reducer 
> vertex (i.e. the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge 
> between the Load Vertex and the Partitioner Vertex, with the Partitioner 
> Vertex having the same parallelism as the Load Vertex. Will get the 
> performance numbers for both cases by Friday.
> Load Vertex -> Sampler Vertex  
> Sampler Vertex -> Partitioner Vertex (Broadcast edge)
>  - The POPackage->POForeach->POLocalRearrange in the Partitioner Vertex has 
> been replaced by POIdentityInOutTez
> Partitioner Vertex -> Reducer Vertex
> 
> Need to attempt this for Skewed Join as well.
> 
> 
> This patch also sets credentials on the DAG, which is required after TEZ-395.
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
>  1562426 
> 
> Diff: https://reviews.apache.org/r/17529/diff/
> 
> 
> Testing
> ---
> 
> test-tez and tez.conf e2e tests pass
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



Re: Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy


> On Jan. 30, 2014, 7:53 p.m., Daniel Dai wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java,
> >  line 1462
> > 
> >
> > What's the reason for this change? lr should be already constructed, 
> > right?

lr was just a plain POLocalRearrangeTez without a plan (only Constant(DummyVal) 
in its plan). Since we removed the LR in the Partition vertex and used 
POIdentityInOut, which does not process key values, that processing had to be 
moved to the input vertex. See TEZC16.gld for how the orderby plan has changed. 


- Rohini


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/#review33243
---


On Jan. 30, 2014, 8:28 a.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17529/
> ---
> 
> (Updated Jan. 30, 2014, 8:28 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3732
> https://issues.apache.org/jira/browse/PIG-3732
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Orderby has 4 vertices; the changes made are as below.
> 
> Load Vertex -> Partitioner Vertex 
>  - Previously used RoundRobinPartitioner with a sorted shuffle, and the 
> parallelism of the Partitioner Vertex was the same as that of the reducer 
> vertex (i.e. the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge 
> between the Load Vertex and the Partitioner Vertex, with the Partitioner 
> Vertex having the same parallelism as the Load Vertex. Will get the 
> performance numbers for both cases by Friday.
> Load Vertex -> Sampler Vertex  
> Sampler Vertex -> Partitioner Vertex (Broadcast edge)
>  - The POPackage->POForeach->POLocalRearrange in the Partitioner Vertex has 
> been replaced by POIdentityInOutTez
> Partitioner Vertex -> Reducer Vertex
> 
> Need to attempt this for Skewed Join as well.
> 
> 
> This patch also sets credentials on the DAG, which is required after TEZ-395.
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
>  1562426 
> 
> Diff: https://reviews.apache.org/r/17529/diff/
> 
> 
> Testing
> ---
> 
> test-tez and tez.conf e2e tests pass
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



[jira] [Commented] (PIG-3667) build.xml jar-all target does not include jython*.jar in lib/ directory

2014-01-30 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887164#comment-13887164
 ] 

Suhas Satish commented on PIG-3667:
---

[~cheolsoo] - Sorry, I accidentally marked it as "fixed". The intention was to 
fix it and get the patch committed to Pig trunk, since the build/ directory 
where the jython standalone jar currently resides is not packaged in production 
environments. It is desirable to have the jar in the $PIG_HOME/lib/ directory 
in the package. 

> build.xml jar-all target does not include jython*.jar in lib/ directory 
> 
>
> Key: PIG-3667
> URL: https://issues.apache.org/jira/browse/PIG-3667
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.12.0
>Reporter: Suhas Satish
>Assignee: Suhas Satish
>  Labels: build
> Attachments: PIG-3667.patch
>
>
> The Pig package does not include the jython jar in the lib/ directory when 
> built with the jar-all ant target, but does include it with the "ant package" 
> target. It should be included by both targets, because the build/ directory, 
> where ivy puts all the dependency jars while building (under 
> build/ivy/lib/Pig), is often excluded from packaging.
> To reproduce:
> ant jar-all 
> rm -rf build/ 
> bin/pig
> grunt> register '/tmp/test.py' using jython as myfunction;
> If done prior to installing jython, here's the error one gets:
> 2013-12-27 18:22:31,145 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2998: Unhandled internal error. org/python/core/PyObject
> Details at logfile: pig_*.log
> Within the pig_*.log => 
> 
> Pig Stack Trace
> ---
> ERROR 2998: Unhandled internal error. org/python/core/PyObject
> java.lang.NoClassDefFoundError: org/python/core/PyObject
> at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
> at org.apache.pig.PigServer.registerCode(PigServer.java:501)
> at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:436)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:445)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.run(Main.java:538)
> at org.apache.pig.Main.main(Main.java:157)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.ClassNotFoundException: org.python.core.PyObject
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ... 14 more
> Fix: Including jython*.jar in the lib/ directory resolves this issue, and 
> the UDF can then be loaded:
> grunt>  register '/tmp/test.py' using jython as myfuncs;
> 2013-12-27 18:37:02,402 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4887743829482443898
> 2013-12-27 18:37:03,448 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
> 2013-12-27 18:37:03,724 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: myfuncs.helloworld





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887114#comment-13887114
 ] 

Dmitriy V. Ryaboy commented on PIG-3347:


Yikes.

[~aniket486] & [~julienledem] this seems like a critical bug to look at. 
Julien, you investigated this UID situation before, right?

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final 
> store output. It looks like the store has some kind of side effect. We used 
> 'local' mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The actual output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Updated] (PIG-3347) Store invocation brings side effect

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-3347:
---

Priority: Critical  (was: Major)



[jira] [Commented] (PIG-3299) Provide support for LazyOutputFormat to avoid creating empty files

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887109#comment-13887109
 ] 

Dmitriy V. Ryaboy commented on PIG-3299:


[~daijy] shall we commit this?

> Provide support for LazyOutputFormat to avoid creating empty files
> --
>
> Key: PIG-3299
> URL: https://issues.apache.org/jira/browse/PIG-3299
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Lorand Bendig
> Attachments: PIG-3299.patch
>
>
> LazyOutputFormat (HADOOP-4927) in Hadoop is a wrapper that avoids creating part 
> files when no records are output. It would be good to support it 
> through a Pig configuration option that wraps storeFunc.getOutputFormat() with 
> LazyOutputFormat. 
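
The idea behind a lazy output wrapper can be illustrated without Hadoop. The following is a rough sketch in plain Java, assuming nothing about the real LazyOutputFormat internals: LazyWriterSketch and createsFile are hypothetical names, and the file name is only illustrative. The underlying file is opened on the first write, so a task that emits no records leaves no file behind.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyWriterSketch implements AutoCloseable {
    private final Path path;
    private BufferedWriter out; // stays null until the first record arrives

    public LazyWriterSketch(Path path) {
        this.path = path;
    }

    public void write(String record) throws IOException {
        if (out == null) {
            out = Files.newBufferedWriter(path); // file is created here, not in the constructor
        }
        out.write(record);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    // Hypothetical helper: writes the given records and reports whether a file was created.
    static boolean createsFile(String... records) {
        try {
            Path dir = Files.createTempDirectory("lazy-sketch");
            Path part = dir.resolve("part-file");
            try (LazyWriterSketch writer = new LazyWriterSketch(part)) {
                for (String record : records) {
                    writer.write(record);
                }
            }
            return Files.exists(part);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no records  -> file created: " + createsFile());
        System.out.println("one record  -> file created: " + createsFile("a"));
    }
}
```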





[jira] [Updated] (PIG-3672) pig should not hardcode "hdfs://" path in code, should be configurable to other file system implementations

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-3672:
---

Status: Open  (was: Patch Available)

Cancelling Patch Available status given Rohini's comments. Please make the patch 
available again once a new patch is submitted.

> pig should not hardcode "hdfs://" path in code, should be configurable to 
> other file system implementations
> ---
>
> Key: PIG-3672
> URL: https://issues.apache.org/jira/browse/PIG-3672
> Project: Pig
>  Issue Type: Bug
>  Components: data, parser
>Affects Versions: 0.11.1, 0.12.0, 0.10.0
>Reporter: Suhas Satish
>Assignee: Suhas Satish
> Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672.patch
>
>
> QueryParserUtils.java has the code:
> result.add("hdfs://"+thisHost+":"+uri.getPort());
> I propose to make it generic, like:
> result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
> Similarly, JobControlCompiler.java has:
> if (!outputPathString.contains("://") || 
> outputPathString.startsWith("hdfs://")) {
> I have a patch that passes the unit tests and will upload it shortly.
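
The proposed change can be tried in isolation with java.net.URI. This is a minimal sketch, not the actual patch: SchemePrefix and prefixFor are hypothetical names, the host names and the "otherfs" scheme are made up, and the sketch assumes an absolute URI (getScheme() returns null otherwise).

```java
import java.net.URI;

public class SchemePrefix {
    // Hypothetical helper, not from the Pig source: builds "scheme://host:port"
    // from the URI's own scheme instead of a hardcoded "hdfs://".
    static String prefixFor(URI uri, String thisHost) {
        return uri.getScheme() + "://" + thisHost + ":" + uri.getPort();
    }

    public static void main(String[] args) {
        // Host names and the "otherfs" scheme are illustrative only.
        System.out.println(prefixFor(URI.create("hdfs://nn1:8020/user/pig"), "nn1"));
        System.out.println(prefixFor(URI.create("otherfs://fs1:7222/data"), "fs1"));
    }
}
```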





[jira] [Commented] (PIG-3456) Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887098#comment-13887098
 ] 

Dmitriy V. Ryaboy commented on PIG-3456:


Could you post a patch without the whitespace changes (for ease of review) and 
some microbenchmark results?

I had some microbenchmark code in PIG-3325, that might help bootstrap you here.

> Reduce threadlocal conf access in backend for each record
> -
>
> Key: PIG-3456
> URL: https://issues.apache.org/jira/browse/PIG-3456
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3456-1.patch
>
>
> Noticed a few things while browsing the code:
> 1) DefaultTuple has a protected boolean isNull = false; which is never used. 
> Removing it gives a ~3-5% improvement for big jobs.
> 2) Config checking with the ThreadLocal conf is repeated for each record, 
> for example in createDataBag in POCombinerPackage. In other places, such as 
> POPackage and POJoinPackage, it is initialized only once, on first use.




[jira] [Updated] (PIG-3456) Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3456:


Fix Version/s: 0.13.0
Affects Version/s: 0.11.1
   Status: Patch Available  (was: Open)



[jira] [Updated] (PIG-3456) Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3456:


Attachment: PIG-3456-1.patch

> Reduce threadlocal conf access in backend for each record
> -
>
> Key: PIG-3456
> URL: https://issues.apache.org/jira/browse/PIG-3456
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Attachments: PIG-3456-1.patch
>
>
> Noticed a few things while browsing the code:
> 1) DefaultTuple has a protected boolean isNull = false; which is never used. 
> Removing it gives a ~3-5% improvement for big jobs.
> 2) Config checking with the ThreadLocal conf is repeated for each record, 
> for example in createDataBag in POCombinerPackage. In other places, such as 
> POPackage and POJoinPackage, it is initialized only once, on first use.





Re: Review Request 17557: [PIG-3456] Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17557/
---

(Updated Jan. 30, 2014, 8:37 p.m.)


Review request for pig.


Bugs: PIG-3456
https://issues.apache.org/jira/browse/PIG-3456


Repository: pig


Description
---

1) DefaultTuple has a protected boolean isNull = false; which is never used. 
Removed that.
2) Config checking with the ThreadLocal conf is repeated for each record, 
for example in createDataBag in POCombinerPackage. In other places, such as 
POPackage and POJoinPackage, it is initialized only once, on first use. PIG-3730 
was one case which showed that config access was causing performance degradation. 
So all occurrences were replaced with config access at initialization time.


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODistinct.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLoad.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/Distinct.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DefaultAbstractBag.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DefaultTuple.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTuple.java
 1562947 

Diff: https://reviews.apache.org/r/17557/diff/


Testing
---

Ran full suite of unit tests


Thanks,

Rohini Palaniswamy



Review Request 17557: [PIG-3456] Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17557/
---

Review request for pig.


Bugs: PIG-3456
https://issues.apache.org/jira/browse/PIG-3456


Repository: pig


Description
---

1) DefaultTuple has a protected boolean isNull = false; which is never used. 
Removed that.
2) Config checking with the ThreadLocal conf is repeated for each record, 
for example in createDataBag in POCombinerPackage. In other places, such as 
POPackage and POJoinPackage, it is initialized only once, on first use. PIG-3730 
was one case which showed that config access was causing performance degradation. 
So all occurrences were replaced with config access at initialization time.


Diffs
-

  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POCombinerPackage.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PODistinct.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLoad.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POSort.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/Distinct.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DefaultAbstractBag.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DefaultTuple.java
 1562947 
  
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTuple.java
 1562947 

Diff: https://reviews.apache.org/r/17557/diff/


Testing
---

Ran full suite of unit tests


Thanks,

Rohini Palaniswamy



[jira] [Updated] (PIG-3456) Reduce threadlocal conf access in backend for each record

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3456:


Summary: Reduce threadlocal conf access in backend for each record  (was: 
Some minor performance improvements)

> Reduce threadlocal conf access in backend for each record
> -
>
> Key: PIG-3456
> URL: https://issues.apache.org/jira/browse/PIG-3456
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
>
> Noticed a few things while browsing the code:
> 1) DefaultTuple has a protected boolean isNull = false; which is never used. 
> Removing it gives a ~3-5% improvement for big jobs.
> 2) Config checking with the ThreadLocal conf is repeated for each record, 
> for example in createDataBag in POCombinerPackage. In other places, such as 
> POPackage and POJoinPackage, it is initialized only once, on first use.





[jira] [Updated] (PIG-3722) Udf deserialization for registered classes fails in local_mode

2014-01-30 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3722:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Udf deserialization for registered classes fails in local_mode
> --
>
> Key: PIG-3722
> URL: https://issues.apache.org/jira/browse/PIG-3722
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.13.0
>
> Attachments: PIG-3722.patch
>
>
> Similar to https://issues.apache.org/jira/browse/PIG-2532, registered classes 
> are not available if jobs are converted to local_mode.





[jira] [Commented] (PIG-3722) Udf deserialization for registered classes fails in local_mode

2014-01-30 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887009#comment-13887009
 ] 

Aniket Mokashi commented on PIG-3722:
-

Committed to trunk. Thanks [~dvryaboy] for the review!



[jira] [Commented] (PIG-3722) Udf deserialization for registered classes fails in local_mode

2014-01-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886991#comment-13886991
 ] 

Dmitriy V. Ryaboy commented on PIG-3722:


+1



Re: Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/#review33243
---



http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java


What's the reason for this change? lr should be already constructed, right?


- Daniel Dai


On Jan. 30, 2014, 8:28 a.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17529/
> ---
> 
> (Updated Jan. 30, 2014, 8:28 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3732
> https://issues.apache.org/jira/browse/PIG-3732
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Order-by has 4 vertices, and the changes are as follows.
> 
> Load Vertex -> Partitioner Vertex 
>  - Previously this used RoundRobinPartitioner with a sorted shuffle, and the 
> parallelism of the Partitioner Vertex was the same as the reducer vertex (i.e. 
> the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge between the Load 
> Vertex and the Partitioner Vertex, with the Partitioner Vertex having the same 
> parallelism as the Load Vertex. Will get the performance numbers for both cases 
> by Friday.
> Load Vertex -> Sampler Vertex  
> Sampler Vertex -> Partitioner Vertex (Broadcast edge)
>  - The POPackage->POForeach->POLocalRearrange in Partitioner Vertex has 
> been replaced by POIdentityInOutTez
> Partitioner Vertex -> Reducer Vertex
> 
> Need to attempt this for Skewed Join as well.
> 
> 
> This patch also sets credential on DAG which is required after TEZ-395
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
>  1562426 
>   
> http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
>  1562426 
> 
> Diff: https://reviews.apache.org/r/17529/diff/
> 
> 
> Testing
> ---
> 
> test-tez and tez.conf e2e tests pass
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



[jira] [Commented] (PIG-3733) Pig fails to concatenate semi-colon in generate statement

2014-01-30 Thread sudhir mallem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886899#comment-13886899
 ] 

sudhir mallem commented on PIG-3733:


Yes. If the semi-colon is part of a REGEX statement, it works, whereas in a 
function like CONCAT it does not. I'm not sure whether other functions that 
take a ";" have the same problem. 
For example, if there is a variable substitution and the variable contains a 
semi-colon, it fails.
Here is an example:
{code}
[smallem@eat1-hcl4014 ~]$ vi testvar.pig

%declare testVar 'hello;';

a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
sid:chararray);
b = foreach a generate uid, sid, $testVar as newcol;
dump b
{code}

> Pig fails to concatenate semi-colon in generate statement
> -
>
> Key: PIG-3733
> URL: https://issues.apache.org/jira/browse/PIG-3733
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: sudhir mallem
>
> Pig fails to concatenate a semi-colon to a column in a generate statement. I've 
> tried multiple ways, including the unicode escape (\\u003B), but it still fails.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
> sid:chararray);
> grunt> b = foreach a generate uid as uid, CONCAT('v=1;',sid) as sids;
>   mismatched character '' expecting '''
> 2014-01-30 08:51:51,759 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched character '' expecting '''
> Details at logfile: /export/home/smallem/pig_1391071809426.log
> {code}
> However, the same expression works when used inside a nested foreach block.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
> sid:chararray);
> grunt> b = foreach a {
> 
> >> x = CONCAT('v=1;',sid);
> >> generate uid as memberuid, x as sids ;
> >> };
> grunt>
> {code}





[jira] [Updated] (PIG-3641) Split "otherwise" producing incorrect output when combined with ColumnPruning

2014-01-30 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3641:
--

   Resolution: Fixed
Fix Version/s: 0.12.1
   Status: Resolved  (was: Patch Available)

Thanks Rohini for the review!

Committed to 0.12.1 and trunk.

> Split "otherwise" producing incorrect output when combined with ColumnPruning
> -
>
> Key: PIG-3641
> URL: https://issues.apache.org/jira/browse/PIG-3641
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.12.0, 0.11.1, 0.13.0
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Fix For: 0.12.1
>
> Attachments: pig-3641_v01.patch, pig-3641_v02_withe2etest.patch
>
>
> Our user was observing incorrect output depending on whether the query had 
> intermediate output or not.  Below is a simplified test case I came up with.
> {noformat}
> knoguchi pig> cat test.txt
> 9,1,ignored
> 9,1,ignored
> 9,1,ignored
> knoguchi pig> cat bz-6590644/test.pig
> A = load 'test.txt' using PigStorage(',') as (a1:int, a2:int, a3:chararray);
> B = foreach A generate a1,a2;
> SPLIT B into C1 if a2 == 1, D1 otherwise;
> C2 = foreach C1 generate a2;
> store C2 into '/tmp/testC';
> store D1 into '/tmp/testD';
> knoguchi@nameother-lm pig>
> {noformat}
> Incorrect output shown below.  /tmp/testD should be empty but somehow has 
> data in it.
> {noformat}
> knoguchi@nameother-lm pig> cat /tmp/testC/part-m-0
> 1
> 1
> 1
> knoguchi pig> cat /tmp/testD/part-m-0
> 9   1
> 9   1
> 9   1
> knoguchi pig>
> {noformat}
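
The intended semantics of the SPLIT in the test case can be modelled in a few lines of plain Java. This is a sketch only, with hypothetical names (SplitOtherwiseSketch, split); it ignores nulls and Pig's optimizer entirely and simply routes each (a1, a2) row to exactly one branch.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOtherwiseSketch {
    final List<Integer> c2 = new ArrayList<>(); // C2 = foreach C1 generate a2;
    final List<int[]> d1 = new ArrayList<>();   // D1 rows as (a1, a2)

    // Hypothetical model of: SPLIT B into C1 if a2 == 1, D1 otherwise;
    static SplitOtherwiseSketch split(int[][] b) {
        SplitOtherwiseSketch out = new SplitOtherwiseSketch();
        for (int[] row : b) {
            if (row[1] == 1) {
                out.c2.add(row[1]); // row satisfies the condition: goes to C1, projected to a2
            } else {
                out.d1.add(row);    // everything else goes to the "otherwise" branch
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // B = foreach A generate a1,a2; for the three rows of test.txt
        int[][] b = { {9, 1}, {9, 1}, {9, 1} };
        SplitOtherwiseSketch out = split(b);
        System.out.println("C2 rows: " + out.c2.size() + ", D1 rows: " + out.d1.size());
    }
}
```

Every row in the test data has a2 == 1, so under this model D1 receives nothing, which matches the reporter's statement that /tmp/testD should be empty.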





[jira] [Commented] (PIG-3641) Split "otherwise" producing incorrect output when combined with ColumnPruning

2014-01-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886776#comment-13886776
 ] 

Rohini Palaniswamy commented on PIG-3641:
-

TestScriptLanguage.runParallelTest2 also failed. That is also unrelated. Will 
file a jira to fix both.

> Split "otherwise" producing incorrect output when combined with ColumnPruning
> -
>
> Key: PIG-3641
> URL: https://issues.apache.org/jira/browse/PIG-3641
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.12.0, 0.11.1, 0.13.0
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
> Attachments: pig-3641_v01.patch, pig-3641_v02_withe2etest.patch
>
>
> Our user was observing incorrect output depending on whether the query had 
> intermediate output or not.  Below is a simplified test case I came up with.
> {noformat}
> knoguchi pig> cat test.txt
> 9,1,ignored
> 9,1,ignored
> 9,1,ignored
> knoguchi pig> cat bz-6590644/test.pig
> A = load 'test.txt' using PigStorage(',') as (a1:int, a2:int, a3:chararray);
> B = foreach A generate a1,a2;
> SPLIT B into C1 if a2 == 1, D1 otherwise;
> C2 = foreach C1 generate a2;
> store C2 into '/tmp/testC';
> store D1 into '/tmp/testD';
> knoguchi@nameother-lm pig>
> {noformat}
> Incorrect output shown below.  /tmp/testD should be empty but somehow has 
> data in it.
> {noformat}
> knoguchi@nameother-lm pig> cat /tmp/testC/part-m-0
> 1
> 1
> 1
> knoguchi pig> cat /tmp/testD/part-m-0
> 9   1
> 9   1
> 9   1
> knoguchi pig>
> {noformat}





[jira] [Commented] (PIG-3641) Split "otherwise" producing incorrect output when combined with ColumnPruning

2014-01-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886759#comment-13886759
 ] 

Rohini Palaniswamy commented on PIG-3641:
-

+1. I ran the full suite of unit tests and they are fine. TestAutoLocalMode 
fails, but that is unrelated.



[jira] [Resolved] (PIG-3730) Performance issue in SelfSpillBag

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-3730.
-

   Resolution: Fixed
Fix Version/s: 0.13.0
 Assignee: Rajesh Balamohan

Committed to trunk. Thanks Rajesh.

> Performance issue in SelfSpillBag
> -
>
> Key: PIG-3730
> URL: https://issues.apache.org/jira/browse/PIG-3730
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
> Environment: Pig 0.11 with MR-V1
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 0.13.0
>
> Attachments: PIG-3730-trunk-v1.patch, PIG-3730-trunk-v2.patch
>
>
> We have a bunch of joins in our pig scripts (joining 5 to 15 datasets 
> together).  Pig creates a bunch of REPLICATED and HASH_JOIN joins, and we 
> observed heavy performance degradation in one of the launched M/R jobs, 
> specifically on the reducer side.  Taking multiple thread dumps revealed the 
> following:
> "main" prio=10 tid=0x7fbaa801c000 nid=0x1464 runnable [0x7fbaaee76000]
>    java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1781)
>   - locked <0xb5316370> (a org.apache.hadoop.mapred.JobConf)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
>   at org.apache.pig.data.SelfSpillBag$MemoryLimits.init(SelfSpillBag.java:73)
>   at org.apache.pig.data.SelfSpillBag$MemoryLimits.<init>(SelfSpillBag.java:65)
>   at org.apache.pig.data.SelfSpillBag.<init>(SelfSpillBag.java:39)
>   at org.apache.pig.data.InternalCachedBag.<init>(InternalCachedBag.java:63)
>   at org.apache.pig.data.InternalCachedBag.<init>(InternalCachedBag.java:59)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POJoinPackage.getNext(POJoinPackage.java:146)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:422)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:405)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
>   at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1781)
>   - locked <0xb5316388> (a org.apache.hadoop.mapred.JobConf)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
>   at org.apache.pig.data.SelfSpillBag$MemoryLimits.init(SelfSpillBag.java:73)
>   at org.apache.pig.data.SelfSpillBag$MemoryLimits.<init>(SelfSpillBag.java:65)
>   at org.apache.pig.data.SelfSpillBag.<init>(SelfSpillBag.java:39)
>   at org.apache.pig.data.InternalCachedBag.<init>(InternalCachedBag.java:63)
>   at org.apache.pig.data.InternalCachedBag.<init>(InternalCachedBag.java:59)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POJoinPackage.getNext(POJoinPackage.java:146)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:422)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:405)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:262)
> In certain corner cases (where pig.cachedbag.type is not "default"), 
> InternalCachedBag is initialized in POJoinPackage.  
> InternalCachedBag in

[jira] [Commented] (PIG-3729) Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in hadoop 2.2 0

2014-01-30 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886753#comment-13886753
 ] 

Nezih Yigitbasi commented on PIG-3729:
--

Can you use -secretDebugCmd to see what gets into the classpath (pig 
-secretDebugCmd)?



> Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in 
> hadoop 2.2 0 
> --
>
> Key: PIG-3729
> URL: https://issues.apache.org/jira/browse/PIG-3729
> Project: Pig
>  Issue Type: Bug
>  Components: build, internal-udfs
>Affects Versions: 0.12.0
> Environment: debian single node pseudo distributed hadoop 2.2 cluster
>Reporter: Nigel Savage
>Priority: Blocker
>  Labels: Hadoop, Iteator, Pig, PigServer
>
> Unable to get Pig 0.12.0 to run on a single-node pseudo-distributed Hadoop 2.2 
> cluster using the jar from mvnrepository.com. 
> This is the error:
> "org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias"
> If I download the release from
> http://apache.claz.org/pig/pig-0.12.0/pig-0.12.0.tar.gz  
> and recompile the source with "ant clean jar -Dhadoopversion=23", then 
> install the recompiled jar in .m2, everything works.
> If the Apache Hadoop 2.2.0 jars are backward compatible, could someone 
> comment on what the issue could be?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (PIG-3729) Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in hadoop 2.2 0

2014-01-30 Thread Ashlee Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886748#comment-13886748
 ] 

Ashlee Lee commented on PIG-3729:
-

BTW, I tried Hadoop 1.2.1 + Pig 0.12.0 for my case, and it works.



[jira] [Commented] (PIG-3729) Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in hadoop 2.2 0

2014-01-30 Thread Ashlee Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886745#comment-13886745
 ] 

Ashlee Lee commented on PIG-3729:
-

Thanks for your reply! Sorry for mistyping PIG_CLASS_PATH. I do export 
PIG_CLASSPATH and HADOOP_CONF_DIR (which is $HADOOP_HOME/etc/hadoop in 
Hadoop 2.2.0), and I tried adding $HADOOP_HOME/etc/hadoop (the directory is no 
longer /conf in Hadoop 2.2.0) to the classpath, but it is still not working...



[jira] [Commented] (PIG-3729) Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in hadoop 2.2 0

2014-01-30 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886724#comment-13886724
 ] 

Nezih Yigitbasi commented on PIG-3729:
--

Ashlee, it seems your classpath doesn't contain $HADOOP_HOME/conf, so Pig 
can't find the Hadoop configuration files. Can you try exporting PIG_CLASSPATH 
(no underscore between CLASS and PATH) with $HADOOP_HOME/conf and try again 
(assuming HADOOP_HOME is set properly)? 
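
A quick way to sanity-check the classpath before retrying is sketched below. It is not Pig's own lookup code; it only mimics what the ERROR 4010 message in this thread describes (neither hadoop-site.xml nor core-site.xml found on the classpath). The function name find_hadoop_config is hypothetical, and the sketch assumes only directory entries matter (jar entries are ignored).

```python
import os

# File names taken from the ERROR 4010 message quoted in this thread.
CONFIG_NAMES = ("hadoop-site.xml", "core-site.xml")


def find_hadoop_config(classpath):
    """Return the first classpath directory that holds a Hadoop config file.

    Hypothetical helper: `classpath` is a string in the platform's usual
    format (entries separated by os.pathsep). Returns None if no directory
    entry contains one of CONFIG_NAMES.
    """
    for entry in classpath.split(os.pathsep):
        # Skip empty entries and anything that is not a directory (e.g. jars).
        if not entry or not os.path.isdir(entry):
            continue
        for name in CONFIG_NAMES:
            if os.path.isfile(os.path.join(entry, name)):
                return entry
    return None
```

For example, find_hadoop_config(os.environ.get("PIG_CLASSPATH", "")) returning None would suggest that the exported value does not point at the directory that actually holds core-site.xml (conf/ on Hadoop 1.x, etc/hadoop on Hadoop 2.2.0, as noted elsewhere in this thread).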



[jira] [Commented] (PIG-3733) Pig fails to concatenate semi-colon in generate statement

2014-01-30 Thread Lorand Bendig (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886525#comment-13886525
 ] 

Lorand Bendig commented on PIG-3733:


I suspect that this issue has been fixed in PIG-2507.

> Pig fails to concatenate semi-colon in generate statement
> -
>
> Key: PIG-3733
> URL: https://issues.apache.org/jira/browse/PIG-3733
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: sudhir mallem
>
> Pig fails to concatenate a semicolon to a column in a generate statement. I've 
> tried multiple ways, including the Unicode escape (\\u003B), but it still fails.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
> sid:chararray);
> grunt> b = foreach a generate uid as uid, CONCAT('v=1;',sid) as sids;
>   mismatched character '<EOF>' expecting '''
> 2014-01-30 08:51:51,759 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1200:   mismatched character '<EOF>' expecting '''
> Details at logfile: /export/home/smallem/pig_1391071809426.log
> {code}
> However, the same CONCAT works when used inside a nested foreach block.
> {code}
> grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
> sid:chararray);
> grunt> b = foreach a {
> 
> >> x = CONCAT('v=1;',sid);
> >> generate uid as memberuid, x as sids ;
> >> };
> grunt>
> {code}





[jira] [Commented] (PIG-3729) Need to rebuild pig 0.12.0 jar from mvnrepository.com to resove error in hadoop 2.2 0

2014-01-30 Thread Ashlee Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886423#comment-13886423
 ] 

Ashlee Lee commented on PIG-3729:
-

My situation is different. I tried both the pig-0.12.0.jar from mvnrepository 
and my recompiled Pig jar, but when I run my jar, which uses PigServer to run 
Pig scripts, it fails with ERROR 4010: Cannot find hadoop configurations in 
classpath (neither hadoop-site.xml nor core-site.xml was found in the 
classpath). I have PIG_CLASS_PATH and HADOOP_CONF_DIR set. 

Then I tried the h2 version of the dependency mentioned above, and got the 
error java.lang.NoClassDefFoundError: org/apache/pig/PigServer.

My jar works fine with Pig 0.11.0 and Hadoop 1.2.0, but now I want to run it 
on Pig 0.12.0 and Hadoop 2.2.0. I need help!



[jira] [Created] (PIG-3733) Pig fails to concatenate semi-colon in generate statement

2014-01-30 Thread sudhir mallem (JIRA)
sudhir mallem created PIG-3733:
--

 Summary: Pig fails to concatenate semi-colon in generate statement
 Key: PIG-3733
 URL: https://issues.apache.org/jira/browse/PIG-3733
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: sudhir mallem


Pig fails to concatenate a semicolon to a column in a generate statement. I've 
tried multiple ways, including the Unicode escape (\\u003B), but it still fails.

{code}
grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
sid:chararray);
grunt> b = foreach a generate uid as uid, CONCAT('v=1;',sid) as sids;
  mismatched character '<EOF>' expecting '''
2014-01-30 08:51:51,759 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1200:   mismatched character '<EOF>' expecting '''
Details at logfile: /export/home/smallem/pig_1391071809426.log
{code}

However, the same CONCAT works when used inside a nested foreach block.

{code}
grunt> a = load '/user/smallem/mem.csv' using PigStorage('|') as (uid:int, 
sid:chararray);
grunt> b = foreach a {  
  
>> x = CONCAT('v=1;',sid);
>> generate uid as memberuid, x as sids ;
>> };
grunt>
{code}
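
The error above looks like what happens when a statement is cut at a semicolon that sits inside a quoted string, leaving an unterminated literal. The report does not confirm that this is the root cause, and the sketch below is not Grunt's or Pig's parser; it is only a minimal illustration of that class of bug. Both function names (split_naive, split_quote_aware) are made up for the example.

```python
def split_naive(script):
    """Split on every ';', ignoring quoting (the suspected failure mode)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def split_quote_aware(script):
    """Split on ';' only when it appears outside a single-quoted literal."""
    statements, current, in_quote = [], [], False
    i = 0
    while i < len(script):
        ch = script[i]
        # Keep backslash escapes inside a literal together with the next char.
        if in_quote and ch == "\\" and i + 1 < len(script):
            current.append(ch)
            current.append(script[i + 1])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

With the failing line from the report, split_naive yields two fragments, the first ending in the unterminated literal 'v=1, while split_quote_aware keeps the whole foreach statement in one piece.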






[jira] [Updated] (PIG-3732) Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3732:


Status: Patch Available  (was: Open)

> Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex
> 
>
> Key: PIG-3732
> URL: https://issues.apache.org/jira/browse/PIG-3732
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: tez-branch
>
> Attachments: PIG-3732-1.patch
>
>
>   From the first vertex to the intermediate vertex that does the partitioning 
> of the keys based on the WeightedRangePartitioner, use ONE_TO_ONE Tez edge 
> and unsorted output and input instead of using a shuffle edge. Also replace 
> the POPackage->POForEach->POLocalRearrange in intermediate vertex with 
> POIdentityInOutTez.





[jira] [Updated] (PIG-3732) Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3732:


Attachment: PIG-3732-1.patch

RB link:  https://reviews.apache.org/r/17529/ 
 - The patch on Review Board contains some stray whitespace. That is fixed in 
the uploaded patch.



Review Request 17529: [PIG-3732] Use ONE_TO_ONE edge and IdentityInOut in orderby intermediate vertex

2014-01-30 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17529/
---

Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-3732
https://issues.apache.org/jira/browse/PIG-3732


Repository: pig


Description
---

Orderby has 4 vertices. The edges between them and the changes made are as below.

Load Vertex -> Partitioner Vertex 
 - Previously this used RoundRobinPartitioner with a sorted shuffle, and the 
parallelism of Partitioner Vertex was the same as that of Reducer Vertex (i.e. 
the PARALLEL clause). Now there is a ONE_TO_ONE unsorted edge between Load 
Vertex and Partitioner Vertex, with Partitioner Vertex having the same 
parallelism as Load Vertex. Will get the performance numbers for both cases 
by Friday.
Load Vertex -> Sampler Vertex  
Sampler Vertex -> Partitioner Vertex (Broadcast edge)
 - The POPackage->POForeach->POLocalRearrange in Partitioner Vertex has 
been replaced by POIdentityInOutTez.
Partitioner Vertex -> Reducer Vertex

Need to attempt this for Skewed Join as well.
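
For readers less familiar with the orderby plan, here is a rough sketch of the idea behind the Sampler -> Partitioner -> Reducer flow: boundaries are picked from a sample, and each key is routed to the partition whose range contains it, so that concatenating the sorted partitions gives a globally sorted result. This is a simplified illustration only, not the code in this patch or in WeightedRangePartitioner; in particular it ignores the weighting of repeated keys that the real partitioner's name refers to. The names choose_boundaries and partition_for are hypothetical.

```python
from bisect import bisect_left


def choose_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 split points from a sample of the keys."""
    ordered = sorted(sample)
    n = len(ordered)
    # Evenly spaced quantiles of the sample serve as range boundaries.
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Return the index of the range partition that `key` belongs to.

    Partition p receives keys k with boundaries[p-1] < k <= boundaries[p],
    so every key in partition p sorts before every key in partition p + 1.
    """
    return bisect_left(boundaries, key)
```

In this picture the sampler only has to ship the small boundaries list to every partitioner task, which is why a broadcast edge fits that step, while the rows themselves can flow from the load step to the partitioner one-to-one without being sorted first.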


This patch also sets credentials on the DAG, which is required after TEZ-395.


Diffs
-

  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/partitioners/WeightedRangePartitioner.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POPartitionRearrangeTez.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDAG.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/WeightedRangePartitionerTez.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/NullablePartitionWritable.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/impl/io/PigNullableWritable.java
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld
 1562426 
  
http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld
 1562426 

Diff: https://reviews.apache.org/r/17529/diff/


Testing
---

test-tez and tez.conf e2e tests pass


Thanks,

Rohini Palaniswamy