[jira] [Updated] (PIG-2913) org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2913:
---

Fix Version/s: 0.11

> org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks 
> up previous minicluster configuration file
> ---
>
> Key: PIG-2913
> URL: https://issues.apache.org/jira/browse/PIG-2913
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2913-2.patch, PIG-2913.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482936#comment-13482936
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Makes sense. Comparable only guarantees <0 ==0 and >0

> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build with PIG-2975. I see a couple of tests failing in 
> BinInterSedesTupleRawComparator.
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482934#comment-13482934
 ] 

Koji Noguchi commented on PIG-2999:
---

bq. less than tuple with greater value expected:<-1> but was:<-2>

DataByteArray.compare returns -1, 0, +1.
WritableComparator.compareBytes returns -n, 0, +n.

It doesn't normalize the value back to -1 or +1.
In the test, can we wrap the result with Math.signum like we do in 
TestPigTupleRawComparator.java?

{noformat}
377 assertEquals("less than tuple with greater value", -1, t1.compareTo(t2));
to
377 assertEquals("less than tuple with greater value", Math.signum(-1), Math.signum(t1.compareTo(t2)));
{noformat}
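To make the contract point concrete, here is a minimal standalone sketch (SignumDemo is a hypothetical class name, not part of the Pig code base):

```java
public class SignumDemo {
    public static void main(String[] args) {
        // Comparable#compareTo guarantees only the sign of the result, not its
        // magnitude: String.compareTo, for instance, returns the raw character
        // difference of the first differing chars.
        int raw = "a".compareTo("c");
        System.out.println(raw);               // prints -2

        // Normalizing with Math.signum keeps a test assertion within the
        // Comparable contract.
        System.out.println(Math.signum(raw));  // prints -1.0
    }
}
```

Comparing signums rather than exact values is what the TestPigTupleRawComparator assertions already do.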


> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build with PIG-2975. I see a couple of tests failing in 
> BinInterSedesTupleRawComparator.
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}



[jira] [Updated] (PIG-2885) TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-2885:
-

   Resolution: Fixed
Fix Version/s: 0.12
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3
> -
>
> Key: PIG-2885
> URL: https://issues.apache.org/jira/browse/PIG-2885
> Project: Pig
>  Issue Type: Bug
> Environment: Hadoop 1.0.3, CentOS 6.3 64 bit
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>Priority: Minor
>  Labels: hbase
> Fix For: 0.12
>
> Attachments: PIG-2885-2.patch, PIG-2885-3.patch, PIG-2885-4.patch, 
> PIG-2885.patch
>
>
> I ran into two unit test failures (TestJobSubmission and TestHBaseStorage) by 
> bumping the version of HBase and ZK to 0.94 and 3.4.3 respectively in hadoop 
> 1.0.3. I am opening a jira to capture what I found for future reference.
> - Two dependency libraries of HBase 0.94 are missing in ivy.xml - 
> high-scale-lib and protobuf-java.
> - The HTable constructor in HBase 0.94 changed:
> {code}
> -HTable table = new HTable(TESTTABLE_2);
> +HTable table = new HTable(conf, TESTTABLE_2);
> {code}
> - The default client port of MiniZooKeeperCluster in HBase 0.94 is no longer 
> 21818. Since it is chosen randomly at runtime, it has to be set in PigContext.
> {code}
> @@ -541,7 +543,7 @@ public class TestJobSubmission {
>  // use the estimation
>  Configuration conf = cluster.getConfiguration();
>  HBaseTestingUtility util = new HBaseTestingUtility(conf);
> -util.startMiniZKCluster();
> +int clientPort = util.startMiniZKCluster().getClientPort();
>  util.startMiniHBaseCluster(1, 1); 
>  
>  String query = "a = load '/passwd';" + 
> @@ -553,6 +555,7 @@ public class TestJobSubmission {
>  
>  pc.getConf().setProperty("pig.exec.reducers.bytes.per.reducer", 
> "100");
>  pc.getConf().setProperty("pig.exec.reducers.max", "10");
> +pc.getConf().setProperty(HConstants.ZOOKEEPER_CLIENT_PORT, 
> Integer.toString(clientPort));
>  ConfigurationValidator.validatePigProperties(pc.getProperties());
>  conf = ConfigurationUtil.toConfiguration(pc.getProperties());
>  JobControlCompiler jcc = new JobControlCompiler(pc, conf);
> {code}
> With the attached patch, both tests pass with hadoop 1.0.3. Please note that 
> TestHBaseStorage fails in hadoop 0.23, and I haven't investigated that.



[jira] [Commented] (PIG-2885) TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482930#comment-13482930
 ] 

Santhosh Srinivasan commented on PIG-2885:
--

Patch has been committed. Thanks Cheolsoo!

> TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3
> -
>
> Key: PIG-2885
> URL: https://issues.apache.org/jira/browse/PIG-2885
> Project: Pig
>  Issue Type: Bug
> Environment: Hadoop 1.0.3, CentOS 6.3 64 bit
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>Priority: Minor
>  Labels: hbase
> Attachments: PIG-2885-2.patch, PIG-2885-3.patch, PIG-2885-4.patch, 
> PIG-2885.patch
>
>
> I ran into two unit test failures (TestJobSubmission and TestHBaseStorage) by 
> bumping the version of HBase and ZK to 0.94 and 3.4.3 respectively in hadoop 
> 1.0.3. I am opening a jira to capture what I found for future reference.
> - Two dependency libraries of HBase 0.94 are missing in ivy.xml - 
> high-scale-lib and protobuf-java.
> - The HTable constructor in HBase 0.94 changed:
> {code}
> -HTable table = new HTable(TESTTABLE_2);
> +HTable table = new HTable(conf, TESTTABLE_2);
> {code}
> - The default client port of MiniZooKeeperCluster in HBase 0.94 is no longer 
> 21818. Since it is chosen randomly at runtime, it has to be set in PigContext.
> {code}
> @@ -541,7 +543,7 @@ public class TestJobSubmission {
>  // use the estimation
>  Configuration conf = cluster.getConfiguration();
>  HBaseTestingUtility util = new HBaseTestingUtility(conf);
> -util.startMiniZKCluster();
> +int clientPort = util.startMiniZKCluster().getClientPort();
>  util.startMiniHBaseCluster(1, 1); 
>  
>  String query = "a = load '/passwd';" + 
> @@ -553,6 +555,7 @@ public class TestJobSubmission {
>  
>  pc.getConf().setProperty("pig.exec.reducers.bytes.per.reducer", 
> "100");
>  pc.getConf().setProperty("pig.exec.reducers.max", "10");
> +pc.getConf().setProperty(HConstants.ZOOKEEPER_CLIENT_PORT, 
> Integer.toString(clientPort));
>  ConfigurationValidator.validatePigProperties(pc.getProperties());
>  conf = ConfigurationUtil.toConfiguration(pc.getProperties());
>  JobControlCompiler jcc = new JobControlCompiler(pc, conf);
> {code}
> With the attached patch, both tests pass with hadoop 1.0.3. Please note that 
> TestHBaseStorage fails in hadoop 0.23, and I haven't investigated that.



[jira] [Updated] (PIG-2979) Pig.jar doesn't work with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2979:
---

Description: 
To reproduce, please do the following:

1) ensure that no hadoop is installed and therefore no hadoop classes are 
present in classpath.
2) ant clean jar -Dhadoopversion=23
3) ./bin/pig -x local
4) fail with the following error (you may not see this if slf4j is available in 
the classpath):
{code}
cheolsoo@localhost:~/workspace/pig-trunk $cat  
/home/cheolsoo/workspace/pig-trunk/pig_1350687456711.log
Error before Pig is launched

ERROR 2998: Unhandled internal error. org/slf4j/LoggerFactory

java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
...
{code}
5) cp ./build/ivy/lib/Pig/slf4j-api-1.6.1.jar lib
6) ./bin/pig -x local
7) fail with the following error:
{code}
cheolsoo@localhost:~/workspace/pig-trunk $cat 
/home/cheolsoo/workspace/pig-trunk/pig_1350687052995.log
Error before Pig is launched

ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:58)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:204)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:117)
at org.apache.pig.impl.PigContext.connect(PigContext.java:229)
at org.apache.pig.PigServer.(PigServer.java:213)
at org.apache.pig.PigServer.(PigServer.java:198)
at org.apache.pig.tools.grunt.Grunt.(Grunt.java:47)
at org.apache.pig.Main.run(Main.java:535)
at org.apache.pig.Main.main(Main.java:154)
Caused by: java.io.IOException: No FileSystem for scheme: file
at 
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2130)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2137)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2176)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2158)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:158)
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more

{code}
8) cp ./build/ivy/lib/Pig/hadoop-common-2.0.0-alpha.jar lib
9) ./bin/pig -x local
10) works fine!

In fact, this is also an issue with running the e2e tests in local mode:
{code}
ant clean
ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
-Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
-Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
{code}

The ant test-e2e-local fails with the following error:
{code}
java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at 
org.apache.hadoop.security.authentication.util.KerberosName.(KerberosName.java:42)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
{code}

  was:
To reproduce, please run on a machine where no Hadoop is installed:
{code}
ant clean
ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
-Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
-Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
{code}

The ant test-e2e-local fails with the following error:
{code}
java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at 
org.apache.hadoop.security.authentication.util.KerberosName.(KerberosName.java:42)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
at 
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
{code}

In fact, this is also an issue with running Pig in local mode with the fat jar 
where no hadoop is installed.

[jira] [Updated] (PIG-2913) org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2913:
---

Status: Patch Available  (was: Open)

> org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks 
> up previous minicluster configuration file
> ---
>
> Key: PIG-2913
> URL: https://issues.apache.org/jira/browse/PIG-2913
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2913-2.patch, PIG-2913.patch
>
>




[jira] [Updated] (PIG-2913) org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2913:
---

Attachment: PIG-2913-2.patch

I don't know if Julien is working on this, but here is a patch that does what 
Rohini is asking for. I am doing it since I am fixing tests now anyway.

I re-factored the code so that the "remote" test cases use a Minicluster while 
the other test cases use local file system.

One question I had: why is testRegisterRemoteScript in this test suite although 
it's not a macro test case? I didn't move it, but please let me know if you 
think it should be moved somewhere else.

Thanks!


> org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks 
> up previous minicluster configuration file
> ---
>
> Key: PIG-2913
> URL: https://issues.apache.org/jira/browse/PIG-2913
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2913-2.patch, PIG-2913.patch
>
>




[jira] Subscription: PIG patch available

2012-10-23 Thread jira
Issue Subscription
Filter: PIG patch available (37 issues)

Subscriber: pigdaily

Key Summary
PIG-2998Fix TestScriptLangunage
https://issues.apache.org/jira/browse/PIG-2998
PIG-2990the -secretDebugCmd shouldn't be a secret and should just be...a 
command
https://issues.apache.org/jira/browse/PIG-2990
PIG-2979ant test-e2e-local fails due to missing run-time dependencies in 
classpath with hadoop-2.0.x
https://issues.apache.org/jira/browse/PIG-2979
PIG-2978TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
https://issues.apache.org/jira/browse/PIG-2978
PIG-2973TestStreaming test times out
https://issues.apache.org/jira/browse/PIG-2973
PIG-2968ColumnMapKeyPrune fails to prune a subtree inside foreach
https://issues.apache.org/jira/browse/PIG-2968
PIG-2960Increase the timeout for unit test
https://issues.apache.org/jira/browse/PIG-2960
PIG-2959Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2957TetsScriptUDF fail due to volume prefix in jar
https://issues.apache.org/jira/browse/PIG-2957
PIG-2956Invalid cache specification for some streaming statement
https://issues.apache.org/jira/browse/PIG-2956
PIG-2955 Fix bunch of Pig e2e tests on Windows 
https://issues.apache.org/jira/browse/PIG-2955
PIG-2954 TestParamSubPreproc still depends on "bash" to run 
https://issues.apache.org/jira/browse/PIG-2954
PIG-2953"which" utility does not exist on Windows
https://issues.apache.org/jira/browse/PIG-2953
PIG-2942DevTests, TestLoad has a false failure on Windows
https://issues.apache.org/jira/browse/PIG-2942
PIG-2904Scripting UDFs should allow DEFINE statements to pass parameters to 
the UDF's constructor
https://issues.apache.org/jira/browse/PIG-2904
PIG-2898Parallel execution of e2e tests
https://issues.apache.org/jira/browse/PIG-2898
PIG-2885TestJobSumission and TestHBaseStorage don't work with HBase 0.94 
and ZK 3.4.3
https://issues.apache.org/jira/browse/PIG-2885
PIG-2881Add SUBTRACT eval function
https://issues.apache.org/jira/browse/PIG-2881
PIG-2873Converting bin/pig shell script to python
https://issues.apache.org/jira/browse/PIG-2873
PIG-2834MultiStorage requires unused constructor argument
https://issues.apache.org/jira/browse/PIG-2834
PIG-2824Pushing checking number of fields into LoadFunc
https://issues.apache.org/jira/browse/PIG-2824
PIG-2801grunt "sh" command should invoke the shell implicitly instead of 
calling exec directly with the command tokens
https://issues.apache.org/jira/browse/PIG-2801
PIG-2799Update pig streaming interface to run correctly on Windows without 
Cygwin
https://issues.apache.org/jira/browse/PIG-2799
PIG-2798pig streaming tests assume interpreters are auto-resolved
https://issues.apache.org/jira/browse/PIG-2798
PIG-2796Local temporary paths are not always valid HDFS path names.
https://issues.apache.org/jira/browse/PIG-2796
PIG-2795Fix test cases that generate pig scripts with "load " + pathStr to 
encode "\" in the path
https://issues.apache.org/jira/browse/PIG-2795
PIG-2661Pig uses an extra job for loading data in Pigmix L9
https://issues.apache.org/jira/browse/PIG-2661
PIG-2657Print warning if using wrong jython version
https://issues.apache.org/jira/browse/PIG-2657
PIG-2495Using merge JOIN from a HBaseStorage produces an error
https://issues.apache.org/jira/browse/PIG-2495
PIG-2433Jython import module not working if module path is in classpath
https://issues.apache.org/jira/browse/PIG-2433
PIG-2417Streaming UDFs -  allow users to easily write UDFs in scripting 
languages with no JVM implementation.
https://issues.apache.org/jira/browse/PIG-2417
PIG-2405svn tags/release-0.9.1: some unit test case failed with open JDK
https://issues.apache.org/jira/browse/PIG-2405
PIG-2362Rework Ant build.xml to use macrodef instead of antcall
https://issues.apache.org/jira/browse/PIG-2362
PIG-2312NPE when relation and column share the same name and used in Nested 
Foreach 
https://issues.apache.org/jira/browse/PIG-2312
PIG-1942script UDF (jython) should utilize the intended output schema to 
more directly convert Py objects to Pig objects
https://issues.apache.org/jira/browse/PIG-1942
PIG-1431Current DateTime UDFs: ISONOW(), UNIXNOW()
https://issues.apache.org/jira/browse/PIG-1431
PIG-1237Piggybank MutliStorage - specify field to write in output
https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/s

[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{code}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{code}

The Eval function UPPER is called twice for each record.

This should be optimized so that UPPER is called only once for each record.
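Until such an optimization exists, one possible workaround (a sketch assuming standard Pig Latin semantics, not taken from this issue) is to materialize the expression in a preceding foreach so that UPPER runs once per record:

{code}
A  = load 'data' as (a:chararray);
A1 = foreach A generate UPPER(a) as c;
B  = foreach A1 generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0);
{code}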

  was:
In this Pig script:

{code}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{code}


> Optimize nested foreach
> ---
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
> ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that UPPER is called only once for each record.



[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{code}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{code}

> Optimize nested foreach
> ---
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
> ? 1 : 0); }
> {code}



[jira] [Created] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)
Richard Ding created PIG-3000:
-

 Summary: Optimize nested foreach
 Key: PIG-3000
 URL: https://issues.apache.org/jira/browse/PIG-3000
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding






[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482839#comment-13482839
 ] 

Cheolsoo Park commented on PIG-2999:


Hi, I see that org.apache.pig.test.TestDataModel.testMultiFieldTupleCompareTo 
fails:
{code}
Testcase: testMultiFieldTupleCompareTo took 0.002 sec 
FAILED
less than tuple with greater value expected:<-1> but was:<-2>
junit.framework.AssertionFailedError: less than tuple with greater value 
expected:<-1> but was:<-2>
at 
org.apache.pig.test.TestDataModel.testMultiFieldTupleCompareTo(TestDataModel.java:377)
{code}
Does anyone else see this failure? I don't see any other failures.

> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build with PIG-2975. I see a couple of tests failing in 
> BinInterSedesTupleRawComparator.
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482837#comment-13482837
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Not yet. I am running the full test suite to be sure we don't break other 
things, but it takes a while.

> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build with PIG-2975. I see a couple of tests failing in 
> BinInterSedesTupleRawComparator.
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}
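The comparator contract at issue here guarantees only the sign of the result (negative, zero, positive), not exact -1/0/1 values. A minimal, hypothetical Java sketch (not Pig's comparator code) of a caller that stays within that contract:

```java
import java.util.Comparator;

public class SignContract {
    // Normalize a comparator result with Integer.signum so callers can
    // safely compare against -1/0/1 even though the Comparator contract
    // only promises negative/zero/positive.
    static int normalized(Comparator<Long> c, long a, long b) {
        return Integer.signum(c.compare(a, b));
    }

    public static void main(String[] args) {
        Comparator<Long> c = Long::compare;
        System.out.println(normalized(c, 5L, 9L)); // -1
        System.out.println(normalized(c, 9L, 9L)); //  0
        System.out.println(normalized(c, 9L, 5L)); //  1
    }
}
```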



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482830#comment-13482830
 ] 

Jonathan Coveney commented on PIG-2999:
---

Has this been committed?

> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build from PIG-2975. I see a couple of tests failing at 
> BinInterSedesTupleRawComparator. 
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}



Re: PROPOSAL: how to handle release documentation going forward

2012-10-23 Thread Julien Le Dem
Sounds good to me.
+1

On Tue, Oct 23, 2012 at 10:20 AM, Gianmarco De Francisci Morales <
g...@apache.org> wrote:

> I guess this is the only way to ensure documented code.
>
> +1
>
> We need to put this rule somewhere, maybe in the Wiki?
>
> Cheers,
> --
> Gianmarco
>
>
> On Tue, Oct 23, 2012 at 12:37 AM, Santhosh M S
>  wrote:
> > +1
> >
> >
> > 
> >  From: Jonathan Coveney 
> > To: dev@pig.apache.org; Olga Natkovich 
> > Sent: Monday, October 22, 2012 5:09 PM
> > Subject: Re: PROPOSAL: how to handle release documentation going forward
> >
> > As someone who chronically under-documents, I think that this is a good
> > idea. +1
> >
> > 2012/10/22 Olga Natkovich 
> >
> >> Hi,
> >>
> >> Since we lost the dedicated document writer for Pig, would it make sense
> >> to require that, going forward (0.12 and beyond), documentation
> >> updates are included in the patch together with code changes
> >> and tests. I think that should work for most features/updates except
> >> perhaps big items that might require more than one JIRA to be completed
> >> before documentation changes make sense.
> >>
> >> Comments?
> >>
> >> Olga
> >>
>


[jira] [Updated] (PIG-2907) Publish pig 0.23 jars to maven

2012-10-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2907:


Fix Version/s: 0.11

> Publish pig 0.23 jars to maven
> --
>
> Key: PIG-2907
> URL: https://issues.apache.org/jira/browse/PIG-2907
> Project: Pig
>  Issue Type: New Feature
>Reporter: Francis Liu
> Fix For: 0.11
>
>
> HCatalog would like to get our unit tests to run against 0.23. Part of this 
> would require pulling the pig 0.23 dependency from maven.



[jira] [Updated] (PIG-2885) TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2885:
---

Release Note: hbase.jar and zookeeper.jar are no longer embedded in pig.jar; 
thus, to use HBaseStorage with pig.jar, hbase.jar and zookeeper.jar must be 
present in the classpath. To add them to the classpath, the user can set either 
PIG_CLASSPATH=<paths to the hbase and zookeeper jars>, or 
HBASE_HOME=<hbase install dir> and ZOOKEEPER_HOME=<zookeeper install dir>.

> TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3
> -
>
> Key: PIG-2885
> URL: https://issues.apache.org/jira/browse/PIG-2885
> Project: Pig
>  Issue Type: Bug
> Environment: Hadoop 1.0.3, CentOS 6.3 64 bit
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>Priority: Minor
>  Labels: hbase
> Attachments: PIG-2885-2.patch, PIG-2885-3.patch, PIG-2885-4.patch, 
> PIG-2885.patch
>
>
> I ran into two unit test failures (TestJobSubmission and TestHBaseStorage) by 
> bumping the version of HBase and ZK to 0.94 and 3.4.3 respectively in hadoop 
> 1.0.3. I am opening a jira to capture what I found for future reference.
> - Two dependency libraries of HBase 0.94 are missing in ivy.xml - 
> high-scale-lib and protobuf-java.
> - The HTable constructor in HBase 0.94 changed:
> {code}
> -HTable table = new HTable(TESTTABLE_2);
> +HTable table = new HTable(conf, TESTTABLE_2);
> {code}
> - The default client port of MiniZooKeeperCluster in HBase 0.94 is no longer 
> 21818. Since it is chosen randomly at runtime, it has to be set in PigContext.
> {code}
> @@ -541,7 +543,7 @@ public class TestJobSubmission {
>  // use the estimation
>  Configuration conf = cluster.getConfiguration();
>  HBaseTestingUtility util = new HBaseTestingUtility(conf);
> -util.startMiniZKCluster();
> +int clientPort = util.startMiniZKCluster().getClientPort();
>  util.startMiniHBaseCluster(1, 1); 
>  
>  String query = "a = load '/passwd';" + 
> @@ -553,6 +555,7 @@ public class TestJobSubmission {
>  
>  pc.getConf().setProperty("pig.exec.reducers.bytes.per.reducer", 
> "100");
>  pc.getConf().setProperty("pig.exec.reducers.max", "10");
> +pc.getConf().setProperty(HConstants.ZOOKEEPER_CLIENT_PORT, 
> Integer.toString(clientPort));
>  ConfigurationValidator.validatePigProperties(pc.getProperties());
>  conf = ConfigurationUtil.toConfiguration(pc.getProperties());
>  JobControlCompiler jcc = new JobControlCompiler(pc, conf);
> {code}
> With the attached patch, both tests pass with hadoop 1.0.3. Please note that 
> TestHBaseStorage fails in hadoop 0.23, and I haven't investigated that.



[jira] [Updated] (PIG-2979) ant test-e2e-local fails due to missing run-time dependencies in classpath with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2979:
---

Attachment: PIG-2979.patch

The problem is that the provider-configuration file of hadoop-hdfs.jar 
overwrites that of hadoop-common.jar since both files have the same name.

Attached is a patch that includes two changes:
1) bundle slf4j\*.jar in pig.jar
2) exclude hadoop-hdfs.jar from pig.jar. Since pig.jar is meant to be used in 
local mode only, I don't think that hadoop-hdfs.jar is needed at all. Please 
correct me if you think otherwise.

With the patch, "ant test-e2e-local" as well as "./bin/pig -x local" runs fine.

Thanks!

> ant test-e2e-local fails due to missing run-time dependencies in classpath 
> with hadoop-2.0.x
> 
>
> Key: PIG-2979
> URL: https://issues.apache.org/jira/browse/PIG-2979
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.11
>
> Attachments: PIG-2979.patch
>
>
> To reproduce, please run on machine where no Hadoop is installed:
> {code}
> ant clean
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
> {code}
> The ant test-e2e-local fails with the following error:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:42)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
> {code}
> In fact, this is also an issue with running Pig in local mode with the fat 
> jar where no hadoop dependencies are available in classpath. For example, the 
> following command also fails with the same error:
> {code}
> ant clean jar -Dhadoopversion=23
> ./bin/pig -x local
> {code}



[jira] [Work started] (PIG-2979) ant test-e2e-local fails due to missing run-time dependencies in classpath with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-2979 started by Cheolsoo Park.

> ant test-e2e-local fails due to missing run-time dependencies in classpath 
> with hadoop-2.0.x
> 
>
> Key: PIG-2979
> URL: https://issues.apache.org/jira/browse/PIG-2979
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.11
>
> Attachments: PIG-2979.patch
>
>
> To reproduce, please run on machine where no Hadoop is installed:
> {code}
> ant clean
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
> {code}
> The ant test-e2e-local fails with the following error:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:42)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
> {code}
> In fact, this is also an issue with running Pig in local mode with the fat 
> jar where no hadoop dependencies are available in classpath. For example, the 
> following command also fails with the same error:
> {code}
> ant clean jar -Dhadoopversion=23
> ./bin/pig -x local
> {code}



[jira] [Updated] (PIG-2979) ant test-e2e-local fails due to missing run-time dependencies in classpath with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2979:
---

Status: Patch Available  (was: In Progress)

> ant test-e2e-local fails due to missing run-time dependencies in classpath 
> with hadoop-2.0.x
> 
>
> Key: PIG-2979
> URL: https://issues.apache.org/jira/browse/PIG-2979
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.11
>
> Attachments: PIG-2979.patch
>
>
> To reproduce, please run on machine where no Hadoop is installed:
> {code}
> ant clean
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
> {code}
> The ant test-e2e-local fails with the following error:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:42)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
> {code}
> In fact, this is also an issue with running Pig in local mode with the fat 
> jar where no hadoop dependencies are available in classpath. For example, the 
> following command also fails with the same error:
> {code}
> ant clean jar -Dhadoopversion=23
> ./bin/pig -x local
> {code}



[jira] [Assigned] (PIG-2507) Semicolon in parameters for UDF results in parsing error

2012-10-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned PIG-2507:
-

Assignee: Timothy Chen

> Semicolon in parameters for UDF results in parsing error
> -
>
> Key: PIG-2507
> URL: https://issues.apache.org/jira/browse/PIG-2507
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.9.1, 0.10.0
>Reporter: Vivek Padmanabhan
>Assignee: Timothy Chen
>
> If I have a semicolon in the parameter passed to a udf, the script execution 
> will fail with a parsing error.
> a = load 'i1' as (f1:chararray);
> c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);
> dump c;
> The above script fails with the below error 
> [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 3, column 0> mismatched character '<EOF>' expecting '''
> Even replacing the semicolon with Unicode \u003B results in the same error.
> c = foreach a generate REGEX_EXTRACT(f1, '.\u003B',1);
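A semicolon is an ordinary character in a Java regex, which a quick standalone check confirms (plain Java, with a hypothetical capture group added for illustration); the failure above therefore lies in Pig's query parser rather than in the regex engine:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonRegex {
    // Return the first "any char followed by ';'" match, or null if none.
    static String firstMatch(String s) {
        Matcher m = Pattern.compile("(.;)").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a;b")); // a;
    }
}
```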



[jira] [Commented] (PIG-2979) ant test-e2e-local fails due to missing run-time dependencies in classpath with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482657#comment-13482657
 ] 

Cheolsoo Park commented on PIG-2979:


There are 2 issues:

1) slf4j has to be bundled in pig.jar. Easy to fix.

2) java.io.IOException: No FileSystem for scheme: file

This is a regression from HADOOP-7549.

HADOOP-7549 made FileSystem implementations be discovered via ServiceLoader 
instead of via configuration files. As part of these changes, the "fs.file.impl" 
property was removed from core-default.xml.

Now, to map the scheme "file://" to the org.apache.hadoop.fs.LocalFileSystem 
class, the fully qualified name of LocalFileSystem must be listed in the 
provider-configuration file (META-INF/services/org.apache.hadoop.fs.FileSystem) 
in pig.jar. However, it isn't.

Here is the diff of META-INF/services/org.apache.hadoop.fs.FileSystem of 
hadoop-common.jar and pig.jar:
{code:title=hadoop-common.jar}
org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.viewfs.ViewFileSystem
org.apache.hadoop.fs.s3.S3FileSystem
org.apache.hadoop.fs.s3native.NativeS3FileSystem
org.apache.hadoop.fs.kfs.KosmosFileSystem
org.apache.hadoop.fs.ftp.FTPFileSystem
org.apache.hadoop.fs.HarFileSystem
{code}
{code:title=pig.jar}
org.apache.hadoop.hdfs.DistributedFileSystem
org.apache.hadoop.hdfs.HftpFileSystem
org.apache.hadoop.hdfs.HsftpFileSystem
org.apache.hadoop.hdfs.web.WebHdfsFileSystem
{code}
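Since ServiceLoader keys the provider-configuration file on the interface's fully qualified name, two jars that both ship META-INF/services/org.apache.hadoop.fs.FileSystem collide when flattened into one fat jar. One generic remedy, sketched below with hypothetical contents (the patch on this issue instead excludes hadoop-hdfs.jar), is to concatenate and dedupe the files' lines at build time:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeServiceFiles {
    // Union the implementation lists of two provider-configuration files
    // so that every implementation survives in the merged fat jar.
    static List<String> merge(List<String> a, List<String> b) {
        return Stream.concat(a.stream(), b.stream())
                     .distinct()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> common = List.of("org.apache.hadoop.fs.LocalFileSystem",
                                      "org.apache.hadoop.fs.HarFileSystem");
        List<String> hdfs = List.of("org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println(merge(common, hdfs).size()); // 3
    }
}
```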

> ant test-e2e-local fails due to missing run-time dependencies in classpath 
> with hadoop-2.0.x
> 
>
> Key: PIG-2979
> URL: https://issues.apache.org/jira/browse/PIG-2979
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.11
>
>
> To reproduce, please run on machine where no Hadoop is installed:
> {code}
> ant clean
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
> {code}
> The ant test-e2e-local fails with the following error:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:42)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
> {code}
> In fact, this is also an issue with running Pig in local mode with the fat 
> jar where no hadoop dependencies are available in classpath. For example, the 
> following command also fails with the same error:
> {code}
> ant clean jar -Dhadoopversion=23
> ./bin/pig -x local
> {code}



[jira] [Assigned] (PIG-2979) ant test-e2e-local fails due to missing run-time dependencies in classpath with hadoop-2.0.x

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-2979:
--

Assignee: Cheolsoo Park

> ant test-e2e-local fails due to missing run-time dependencies in classpath 
> with hadoop-2.0.x
> 
>
> Key: PIG-2979
> URL: https://issues.apache.org/jira/browse/PIG-2979
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.11
>
>
> To reproduce, please run on machine where no Hadoop is installed:
> {code}
> ant clean
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-deploy-local -Dhadoopversion=23
> ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir 
> -Dharness.cluster.bin=hadoop_script test-e2e-local -Dhadoopversion=23
> {code}
> The ant test-e2e-local fails with the following error:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
> at 
> org.apache.hadoop.security.authentication.util.KerberosName.<init>(KerberosName.java:42)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:211)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:274)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:531)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:512)
> {code}
> In fact, this is also an issue with running Pig in local mode with the fat 
> jar where no hadoop dependencies are available in classpath. For example, the 
> following command also fails with the same error:
> {code}
> ant clean jar -Dhadoopversion=23
> ./bin/pig -x local
> {code}



Re: Review Request: TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Bill Graham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/#review12694
---

Ship it!


Ship It!

- Bill Graham


On Oct. 23, 2012, 4:48 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7676/
> ---
> 
> (Updated Oct. 23, 2012, 4:48 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> The changes include:
> 
> 1. Stop bundling hbase.jar and zookeeper.jar with pig.jar. There should 
> no longer be incompatibility issues when using pig.jar with different versions 
> of hbase.jar. But to use HBaseStorage, HBASE_HOME and ZOOKEEPER_HOME must be 
> set by the user. Note that I am adding protobuf-java.jar to pig.jar because 
> otherwise it has to be explicitly added to PIG_CLASSPATH to use HBaseStorage, 
> which is not very intuitive.
> 
> 2. Bump hbase and zk to 0.94.1 and 3.4.3 respectively. Since we no longer 
> bundle them in pig.jar, which versions we use doesn't matter. These jar files 
> will be used for unit test only.
> 
> 3. Make the unit test cases work with newer versions of hbase and zk.
> 
> 4. Add hbase runtime dependencies to ivy.xml.
> 
> 
> This addresses bug PIG-2885.
> https://issues.apache.org/jira/browse/PIG-2885
> 
> 
> Diffs
> -
> 
>   build.xml 6b04f8a 
>   ivy.xml 6e0a2e5 
>   ivy/libraries.properties 55da6c6 
>   test/org/apache/pig/test/TestHBaseStorage.java cc1efef 
>   test/org/apache/pig/test/TestJobSubmission.java 021662f 
> 
> Diff: https://reviews.apache.org/r/7676/diff/
> 
> 
> Testing
> ---
> 
> ant clean test-commit -Dhadoopversion=20
> ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20
> ant clean test -Dtestcase=TestJobSubmission -Dhadoopversion=20
> 
> I also manually tested pig.jar with hbase 0.90 and 0.94. Once HBASE_HOME and 
> ZOOKEEPER_HOME are set, HBaseStorage works fine with both versions.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



Re: Review Request: TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Cheolsoo Park


> On Oct. 23, 2012, 6:55 p.m., Bill Graham wrote:
> > Do both HBASE_HOME and ZOOKEEPER_HOME need to be set, or would it suffice 
> > to just have their respective jars + HBASE_CONF_DIR in the classpath? 
> > Ideally, only the classpath would need to be set.
> > 
> > Besides that, the code looks good.

Hi Bill,

Including hbase-*.jar and zookeeper-*.jar in PIG_CLASSPATH also works. All 
HBASE_HOME and ZOOKEEPER_HOME do is find hbase-*.jar and zookeeper-*.jar in the 
respective dirs and add them to the CLASSPATH.


So the following commands are basically equivalent:

PIG_CLASSPATH=~/workspace/hbase-0.94.1/hbase-0.94.1.jar:~/workspace/hbase-0.94.1/lib/zookeeper-3.4.3.jar
 ./bin/pig -x local

or

HBASE_HOME=~/workspace/hbase-0.94.1 ZOOKEEPER_HOME=~/workspace/hbase-0.94.1/lib 
./bin/pig -x local

Of course, if hbase-*.jar and zookeeper-*.jar are already present in your 
classpath, nothing has to be set.


I can definitely document both in the release note.

Thanks!


- Cheolsoo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/#review12689
---


On Oct. 23, 2012, 4:48 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7676/
> ---
> 
> (Updated Oct. 23, 2012, 4:48 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> The changes include:
> 
> 1. Stop bundling hbase.jar and zookeeper.jar with pig.jar. There should 
> no longer be incompatibility issues when using pig.jar with different versions 
> of hbase.jar. But to use HBaseStorage, HBASE_HOME and ZOOKEEPER_HOME must be 
> set by the user. Note that I am adding protobuf-java.jar to pig.jar because 
> otherwise it has to be explicitly added to PIG_CLASSPATH to use HBaseStorage, 
> which is not very intuitive.
> 
> 2. Bump hbase and zk to 0.94.1 and 3.4.3 respectively. Since we no longer 
> bundle them in pig.jar, which versions we use doesn't matter. These jar files 
> will be used for unit test only.
> 
> 3. Make the unit test cases work with newer versions of hbase and zk.
> 
> 4. Add hbase runtime dependencies to ivy.xml.
> 
> 
> This addresses bug PIG-2885.
> https://issues.apache.org/jira/browse/PIG-2885
> 
> 
> Diffs
> -
> 
>   build.xml 6b04f8a 
>   ivy.xml 6e0a2e5 
>   ivy/libraries.properties 55da6c6 
>   test/org/apache/pig/test/TestHBaseStorage.java cc1efef 
>   test/org/apache/pig/test/TestJobSubmission.java 021662f 
> 
> Diff: https://reviews.apache.org/r/7676/diff/
> 
> 
> Testing
> ---
> 
> ant clean test-commit -Dhadoopversion=20
> ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20
> ant clean test -Dtestcase=TestJobSubmission -Dhadoopversion=20
> 
> I also manually tested pig.jar with hbase 0.90 and 0.94. Once HBASE_HOME and 
> ZOOKEEPER_HOME are set, HBaseStorage works fine with both versions.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



Re: Review Request: TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Bill Graham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/#review12689
---


Do both HBASE_HOME and ZOOKEEPER_HOME need to be set, or would it suffice to 
just have their respective jars + HBASE_CONF_DIR in the classpath? Ideally, 
only the classpath would need to be set.

Besides that, the code looks good. 

- Bill Graham


On Oct. 23, 2012, 4:48 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7676/
> ---
> 
> (Updated Oct. 23, 2012, 4:48 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> The changes include:
> 
> 1. Stop bundling hbase.jar and zookeeper.jar with pig.jar. There should 
> no longer be incompatibility issues when using pig.jar with different versions 
> of hbase.jar. But to use HBaseStorage, HBASE_HOME and ZOOKEEPER_HOME must be 
> set by the user. Note that I am adding protobuf-java.jar to pig.jar because 
> otherwise it has to be explicitly added to PIG_CLASSPATH to use HBaseStorage, 
> which is not very intuitive.
> 
> 2. Bump hbase and zk to 0.94.1 and 3.4.3 respectively. Since we no longer 
> bundle them in pig.jar, which versions we use doesn't matter. These jar files 
> will be used for unit test only.
> 
> 3. Make the unit test cases work with newer versions of hbase and zk.
> 
> 4. Add hbase runtime dependencies to ivy.xml.
> 
> 
> This addresses bug PIG-2885.
> https://issues.apache.org/jira/browse/PIG-2885
> 
> 
> Diffs
> -
> 
>   build.xml 6b04f8a 
>   ivy.xml 6e0a2e5 
>   ivy/libraries.properties 55da6c6 
>   test/org/apache/pig/test/TestHBaseStorage.java cc1efef 
>   test/org/apache/pig/test/TestJobSubmission.java 021662f 
> 
> Diff: https://reviews.apache.org/r/7676/diff/
> 
> 
> Testing
> ---
> 
> ant clean test-commit -Dhadoopversion=20
> ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20
> ant clean test -Dtestcase=TestJobSubmission -Dhadoopversion=20
> 
> I also manually tested pig.jar with hbase 0.90 and 0.94. Once HBASE_HOME and 
> ZOOKEEPER_HOME are set, HBaseStorage works fine with both versions.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



[jira] [Updated] (PIG-2600) Better Map support

2012-10-23 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2600:
-

Tags: udf
Release Note: 
Pig 0.11+ includes the following UDFs for operating on Maps:

1. VALUESET
2. VALUELIST
3. KEYSET
4. INVERSEMAP

VALUESET

  This UDF takes a Map and returns a Bag containing the value set. 
  Note, this UDF returns only unique values. For all values, use 
  VALUELIST instead.

  
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUESET($0);
  grunt> dump b;
  ({(apache),(2)})
  ({(4),(hadoop)})
 
  

VALUELIST

 
  This UDF takes a Map and returns a Bag containing the values from the map. 
  Note that the output contains all values, not just unique ones.
  For obtaining unique values from a map, use VALUESET instead. 
 
  
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUELIST($0);
  grunt> dump b;
  ({(apache),(2),(2)})
  ({(4),(hadoop),(hadoop)})
  

KEYSET

  This UDF takes a Map and returns a Bag containing the keyset.

  
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate KEYSET($0);
  grunt> dump b;
  ({(open),(1),(11)})
  ({(3),(apache),(12)})
  

INVERSEMAP

  This UDF accepts a Map as input with values of any primitive data type. 
  The UDF swaps keys with values and returns the new inverse Map. 
  Note that in case the original values are non-unique, the resulting Map 
  contains String Key -> DataBag of values, where the bag of values is 
  composed of the original keys having the same value. 
 
  Note: 1. The UDF accepts a Map with values of a primitive data type
   2. The UDF returns a Map
  
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate INVERSEMAP($0);
 
  grunt> dump b;
  ([2#{(1),(11)},apache#{(open)}])
  ([hadoop#{(apache),(12)},4#{(3)}])
  

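For reference, the semantics of the four UDFs can be sketched in plain Python (illustrative stand-ins only; the shipped UDFs are implemented in Java and operate on Pig types):

```python
# Plain-Python sketch of the four map UDFs described above.
# Function names mirror the UDFs; these are not the Java implementations.

def valueset(m):
    """Unique values of a map (like VALUESET; ordering is not guaranteed)."""
    return set(m.values())

def valuelist(m):
    """All values of a map, duplicates included (like VALUELIST)."""
    return list(m.values())

def keyset(m):
    """All keys of a map (like KEYSET)."""
    return set(m.keys())

def inversemap(m):
    """Swap keys and values; non-unique values map to the bag of
    original keys that shared that value (like INVERSEMAP)."""
    inv = {}
    for k, v in m.items():
        inv.setdefault(v, set()).add(k)
    return inv

row = {"open": "apache", "1": "2", "11": "2"}
print(valueset(row))    # unique values only
print(valuelist(row))   # all values, duplicates kept
print(inversemap(row))  # value -> set of original keys
```

Running this on the first sample row above shows the same grouping as the INVERSEMAP example: the duplicated value "2" collects both of its keys.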
Olga, adding release notes. Let me know if you need more info.

> Better Map support
> --
>
> Key: PIG-2600
> URL: https://issues.apache.org/jira/browse/PIG-2600
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Prashant Kommireddi
> Fix For: 0.11
>
> Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, 
> PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, 
> PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add 
> a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to 
> that Key
> This would all be pretty easy. A more substantial piece of work would be to 
> make Pig support non-String keys (this is especially an issue since UDFs and 
> whatnot probably assume that they are all Integers). Not sure if it is worth 
> it.
> I'd love to hear other things that would be useful for people!



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482538#comment-13482538
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Sure, I will take care of it.

> Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
> failing
> -
>
> Key: PIG-2999
> URL: https://issues.apache.org/jira/browse/PIG-2999
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Koji Noguchi
>Assignee: Jonathan Coveney
> Attachments: pig-2999-v1.txt, pig-2999-v2.txt
>
>
> I think I broke the build from PIG-2975.  I see couple of tests failing at 
> BinInterSedesTupleRawComparator. 
> {noformat}
> 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
>   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
>   at 
> org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
>   at 
> org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>   at 
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>   at 
> org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> {noformat}



[jira] [Updated] (PIG-2433) Jython import module not working if module path is in classpath

2012-10-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:


Fix Version/s: 0.12
 Assignee: Rohini Palaniswamy
   Status: Patch Available  (was: Open)

> Jython import module not working if module path is in classpath
> ---
>
> Key: PIG-2433
> URL: https://issues.apache.org/jira/browse/PIG-2433
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Daniel Dai
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-2433.patch
>
>
> This is a hole left by PIG-1824. If the path of a python module is in the 
> classpath, the job dies with the message "could not instantiate 
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the python module is in the classpath, the fileEntry we get in 
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the 
> script itself. Thus we cannot locate the script, and it is skipped in 
> job.xml. 
> For example:
> {code}
> register 'scriptB.py' using 
> org.apache.pig.scripting.jython.JythonScriptEngine as pig
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
> scriptB.py:
> #!/usr/bin/python
> import scriptA
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the jython library to figure out the 
> dependent modules scriptB relies on, in this case scriptA. However, if the 
> current directory is in the classpath, instead of scriptA.py we get 
> __pyclasspath__/scriptA.class. When we then try to put 
> __pyclasspath__/script$py.class into job.jar, Pig complains that 
> __pyclasspath__/script$py.class does not exist. 
> This is exactly what TestScriptUDF.testPythonNestedImport does. In hadoop 
> 20.x, the test still succeeds because MiniCluster picks up the local 
> classpath, so it can still find scriptA.py even if it is not in job.jar. 
> However, the script will fail on a real cluster and on MiniMRYarnCluster of 
> hadoop 23.
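The failure mode described above can be sketched in plain Python (the helper name `shippable` and the paths are illustrative; Pig's actual logic lives in JythonScriptEngine):

```python
# Hypothetical sketch of the resolution problem described in this issue:
# when the module path is on the classpath, Jython reports a dependency
# as a "__pyclasspath__/..." pseudo-entry rather than a real file path,
# so it cannot be located on disk and shipped in job.jar.

def shippable(file_entries):
    """Split dependency entries into real files (shippable) and
    __pyclasspath__ pseudo-entries (not locatable on disk)."""
    real, unresolved = [], []
    for entry in file_entries:
        if entry.startswith("__pyclasspath__/"):
            unresolved.append(entry)   # e.g. __pyclasspath__/scriptA$py.class
        else:
            real.append(entry)         # e.g. /home/user/scriptA.py
    return real, unresolved

deps = ["/home/user/scriptA.py", "__pyclasspath__/scriptA$py.class"]
real, unresolved = shippable(deps)
print(real)        # only the real file can go into job.jar
print(unresolved)  # the pseudo-entry triggers the "does not exist" error
```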



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482533#comment-13482533
 ] 

Jonathan Coveney commented on PIG-2999:
---

+1. Gianmarco, you want to commit once the tests pass?




[jira] [Updated] (PIG-2433) Jython import module not working if module path is in classpath

2012-10-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:


Attachment: PIG-2433.patch

Fixed the issue and added unit tests that import os and re. 

Note: if jython-standalone.jar is in the Pig classpath, I found that on a real 
cluster I had to add -Dmapred.child.env="JYTHONPATH=job.jar/Lib" to pick up the 
builtin modules, since the jar gets extracted on the datanode and Lib is not in 
the classpath. This might apply to use with oozie too. I could not reproduce 
the error in the unit test environment even after removing the jython jar from 
mr-apps-classpath. If the extracted Lib directory is in the classpath instead 
of the standalone jar when launching Pig, the env setting is not required. 




[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

2012-10-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482495#comment-13482495
 ] 

Cheolsoo Park commented on PIG-2904:


Thank you very much for reviewing it, Julien!

I agree with your comments and will address them.

> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's 
> constructor
> -
>
> Key: PIG-2904
> URL: https://issues.apache.org/jira/browse/PIG-2904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Julien Le Dem
>Assignee: Cheolsoo Park
> Attachments: PIG-2904.patch
>
>




[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482489#comment-13482489
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

The patch looks good.
Running tests to make sure everything works.




[jira] [Commented] (PIG-2904) Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

2012-10-23 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482485#comment-13482485
 ] 

Julien Le Dem commented on PIG-2904:


That's a pretty good start, Cheolsoo.
My main comment is that we should avoid testing specifically for 
JythonFunction in the LogicalPlanGenerator; instead, we should look into 
generalizing how UDFs are resolved.
Thanks for improving the scripting extension!
See: https://reviews.apache.org/r/7217/

> Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's 
> constructor
> -
>
> Key: PIG-2904
> URL: https://issues.apache.org/jira/browse/PIG-2904
> Project: Pig
>  Issue Type: New Feature
>Reporter: Julien Le Dem
>Assignee: Cheolsoo Park
> Attachments: PIG-2904.patch
>
>




Re: Review Request: PIG-2904 Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor

2012-10-23 Thread Julien Le Dem

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7217/#review12686
---


That's a pretty good start, Cheolsoo.
My main comment is that we should avoid testing specifically for 
JythonFunction in the LogicalPlanGenerator; instead, we should look into 
generalizing how UDFs are resolved.
Thanks for improving the scripting extension!


src/org/apache/pig/parser/LogicalPlanGenerator.g


I think this logic should live in the upper layer. The ScriptEngine should 
deal with this; the LogicalPlanGenerator should not have to special-case 
JythonFunction.
Perhaps we can change how the LogicalPlanGenerator looks up functions 
to make it more generic?



src/org/apache/pig/scripting/Pig.java


Make sure you escape correctly in case args contain quotes, for example.



src/org/apache/pig/scripting/Pig.java


sb.append(";\n").toString()


- Julien Le Dem


On Sept. 25, 2012, 1:11 a.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7217/
> ---
> 
> (Updated Sept. 25, 2012, 1:11 a.m.)
> 
> 
> Review request for pig and Julien Le Dem.
> 
> 
> Description
> ---
> 
> Currently, it is not possible to pass arguments to a scripting UDF's 
> constructor via DEFINE statements. However, this would be useful for 
> supporting closures in scripting UDFs. Take the following UDF for example:
> 
> //-
> 
> # udf that returns a closure
> import math
> 
> @outputSchema("log:double")
> def logn(base):
>     def log(x):
>         return math.log(x, float(base))
>     return log
> 
> //-
> 
> With this patch, now we can do:
> 
> //-
> 
> REGISTER 'scriptingudf.py' using jython as myfuncs;
> 
> DEFINE log2 myfuncs.logn('2');
> DEFINE log10 myfuncs.logn('10');
> 
> out2 = FOREACH in GENERATE log2($0);
> out10 = FOREACH in GENERATE log10($0);
> 
> //-
> 
> 
> Please note that constructor arguments specified in define statements are 
> interpreted as closure parameters, so it is not valid to pass constructor 
> arguments to a udf that does not return a closure. For example, the following 
> script will throw an exception:
> 
> //-
> 
> # udf that doesn't return a closure
> @outputSchema("log:double")
> def log2(x):
>     return math.log(x, 2)
> 
> //-
> 
> DEFINE log2 myfuncs.log2('2');
> 
> //-
> 
> 
> The changes include:
> - Update LogicalPlanGenerator grammar so that jython functions can be 
> re-registered with constructor parameters in define statement.
> - Change JythonFunction's constructor so that it can take a list of string 
> parameters.
> - Change JythonFunction's exec method so that it can obtain a closure when 
> constructor parameters are present.
> - Update Pig.java so that all this also works with embedded scripts.
> 
> Note that this patch only includes work for Jython. Work for other supported 
> scripting languages is yet to be added.
> 
> 
> This addresses bug PIG-2904.
> https://issues.apache.org/jira/browse/PIG-2904
> 
> 
> Diffs
> -
> 
>   src/org/apache/pig/parser/LogicalPlanGenerator.g 9b9c099 
>   src/org/apache/pig/scripting/Pig.java 1e4065f 
>   src/org/apache/pig/scripting/jython/JythonFunction.java 15f936d 
>   test/e2e/pig/tests/nightly.conf ebcd66e 
>   test/e2e/pig/tests/turing_jython.conf d800d78 
>   test/e2e/pig/udfs/python/scriptingudf.py 54129b6 
> 
> Diff: https://reviews.apache.org/r/7217/diff/
> 
> 
> Testing
> ---
> 
> Added 3 test cases to e2e test suite:
> 1) Test for a UDF that returns a closure, with closure parameters in define 
> statements. (positive test)
> 2) Test for a UDF that doesn't return a closure, with closure parameters in 
> define statements. (negative test)
> 3) Test for define statements in embedded scripts.
> 
> Ran ant test-commit.
> Ran relevant e2e tests: Scripting_*, Jython_*.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>
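Stripped of the Pig plumbing, the closure pattern described in the review request above is just this (plain Python, illustrative only):

```python
import math

# The "constructor arguments" of the DEFINE statement become the
# arguments of the outer function; the returned closure is the UDF
# that actually runs per record.
def logn(base):
    """Returns a closure over 'base'; mirrors the scripting UDF above."""
    def log(x):
        return math.log(x, float(base))
    return log

# DEFINE log2 myfuncs.logn('2');  DEFINE log10 myfuncs.logn('10');
# roughly correspond to:
log2 = logn('2')
log10 = logn('10')

print(log2(8.0))     # ~ 3.0
print(log10(100.0))  # ~ 2.0
```

Calling a non-closure UDF like `log2(x)` with a constructor argument has no outer function to bind it to, which is why the negative example above raises an exception.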



Re: PROPOSAL: how to handle release documentation going forward

2012-10-23 Thread Gianmarco De Francisci Morales
I guess this is the only way to ensure documented code.

+1

We need to put this rule somewhere, maybe in the Wiki?

Cheers,
--
Gianmarco


On Tue, Oct 23, 2012 at 12:37 AM, Santhosh M S
 wrote:
> +1
>
>
> 
>  From: Jonathan Coveney 
> To: dev@pig.apache.org; Olga Natkovich 
> Sent: Monday, October 22, 2012 5:09 PM
> Subject: Re: PROPOSAL: how to handle release documentation going forward
>
> As someone who chronically under-documents, I think that this is a good
> idea. +1
>
> 2012/10/22 Olga Natkovich 
>
>> Hi,
>>
>> Since we lost the dedicated documentation writer for Pig, would it make
>> sense to require that, going forward (0.12 and beyond), documentation
>> updates are included in the patch together with code changes and tests?
>> I think that should work for most features/updates, except perhaps big
>> items that might require more than one JIRA to be completed before
>> documentation changes make sense.
>>
>> Comments?
>>
>> Olga
>>


[jira] [Updated] (PIG-2885) TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2885:
---

Attachment: PIG-2885-4.patch

Bumping ZK to 3.4.4 since ZK 3.4.3 has known issues. (Please see Santhosh's 
comment for details.)

> TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3
> -
>
> Key: PIG-2885
> URL: https://issues.apache.org/jira/browse/PIG-2885
> Project: Pig
>  Issue Type: Bug
> Environment: Hadoop 1.0.3, CentOS 6.3 64 bit
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
>Priority: Minor
>  Labels: hbase
> Attachments: PIG-2885-2.patch, PIG-2885-3.patch, PIG-2885-4.patch, 
> PIG-2885.patch
>
>
> I ran into two unit test failures (TestJobSubmission and TestHBaseStorage) by 
> bumping the version of HBase and ZK to 0.94 and 3.4.3 respectively in hadoop 
> 1.0.3. I am opening a jira to capture what I found for future reference.
> - Two dependency libraries of HBase 0.94 are missing in ivy.xml - 
> high-scale-lib and protobuf-java.
> - The HTable constructor in HBase 0.94 changed:
> {code}
> -HTable table = new HTable(TESTTABLE_2);
> +HTable table = new HTable(conf, TESTTABLE_2);
> {code}
> - The default client port of MiniZooKeeperCluster in HBase 0.94 is no longer 
> 21818. Since it is chosen randomly at runtime, it has to be set in PigContext.
> {code}
> @@ -541,7 +543,7 @@ public class TestJobSubmission {
>  // use the estimation
>  Configuration conf = cluster.getConfiguration();
>  HBaseTestingUtility util = new HBaseTestingUtility(conf);
> -util.startMiniZKCluster();
> +int clientPort = util.startMiniZKCluster().getClientPort();
>  util.startMiniHBaseCluster(1, 1); 
>  
>  String query = "a = load '/passwd';" + 
> @@ -553,6 +555,7 @@ public class TestJobSubmission {
>  
>  pc.getConf().setProperty("pig.exec.reducers.bytes.per.reducer", 
> "100");
>  pc.getConf().setProperty("pig.exec.reducers.max", "10");
> +pc.getConf().setProperty(HConstants.ZOOKEEPER_CLIENT_PORT, 
> Integer.toString(clientPort));
>  ConfigurationValidator.validatePigProperties(pc.getProperties());
>  conf = ConfigurationUtil.toConfiguration(pc.getProperties());
>  JobControlCompiler jcc = new JobControlCompiler(pc, conf);
> {code}
> With the attached patch, both tests pass with hadoop 1.0.3. Please note that 
> TestHBaseStorage fails in hadoop 0.23, and I haven't investigated that.



Re: Review Request: TestJobSumission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/
---

(Updated Oct. 23, 2012, 4:48 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
---

Updating the version ZK to 3.4.4 as per Santhosh's suggestion.


Description
---

The changes include:

1. Stop bundling hbase.jar and zookeeper.jar with pig.jar, so there should no 
longer be incompatibility issues when using pig.jar with different versions of 
hbase.jar. To use HBaseStorage, however, HBASE_HOME and ZOOKEEPER_HOME must be 
set by the user. Note that I am adding protobuf-java.jar to pig.jar because 
otherwise it has to be explicitly added to PIG_CLASSPATH to use HBaseStorage, 
which is not very intuitive.

2. Bump hbase and zk to 0.94.1 and 3.4.4 respectively. Since we no longer 
bundle them in pig.jar, the versions we use don't matter; these jar files are 
used for unit tests only.

3. Make the unit test cases work with newer versions of hbase and zk.

4. Add hbase runtime dependencies to ivy.xml.


This addresses bug PIG-2885.
https://issues.apache.org/jira/browse/PIG-2885


Diffs (updated)
-

  build.xml 6b04f8a 
  ivy.xml 6e0a2e5 
  ivy/libraries.properties 55da6c6 
  test/org/apache/pig/test/TestHBaseStorage.java cc1efef 
  test/org/apache/pig/test/TestJobSubmission.java 021662f 

Diff: https://reviews.apache.org/r/7676/diff/


Testing
---

ant clean test-commit -Dhadoopversion=20
ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20
ant clean test -Dtestcase=TestJobSubmission -Dhadoopversion=20

I also manually tested pig.jar with hbase 0.90 and 0.94. Once HBASE_HOME and 
ZOOKEEPER_HOME are set, HBaseStorage works fine with both versions.


Thanks,

Cheolsoo Park



[jira] [Commented] (PIG-2328) Add builtin UDFs for building and using bloom filters

2012-10-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482451#comment-13482451
 ] 

Olga Natkovich commented on PIG-2328:
-

These are in builtins, at least according to the patch, so they need to be in 
the docs. I will create a doc patch; I just was not sure whether it belonged in 
a different place.

> Add builtin UDFs for building and using bloom filters
> -
>
> Key: PIG-2328
> URL: https://issues.apache.org/jira/browse/PIG-2328
> Project: Pig
>  Issue Type: New Feature
>  Components: internal-udfs
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.10.0, 0.11
>
> Attachments: PIG-bloom-2.patch, PIG-bloom-3.patch, PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before 
> moving data for a join or other heavyweight operation.  Pig should add UDFs 
> to support building and using bloom filters.
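As a rough illustration of the technique this issue adds (plain Python; not the UDFs from the patch, which wrap Hadoop's bloom-filter classes), a minimal bloom filter looks like:

```python
import hashlib

# Minimal bloom-filter sketch: a fixed-size bit array plus k hash
# functions. Membership tests never produce false negatives; false
# positives are possible but rare for a sparsely filled filter.
class Bloom:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, key):
        # Derive k deterministic bit positions from the key.
        for i in range(self.k):
            h = hashlib.md5(("%d:%s" % (i, key)).encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = Bloom()
for user in ("alice", "bob"):
    bf.add(user)
print(bf.might_contain("alice"))  # True: no false negatives
```

In the join use case described above, one side's keys are loaded into the filter and the other side is pre-filtered with it before any data moves.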



[jira] [Updated] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-2999:
--

Attachment: pig-2999-v2.txt

WritableComparator.compareBytes converts each byte to an unsigned int and then 
compares:

{noformat}
public static int compareBytes(byte[] b1, int s1, int l1,
                               byte[] b2, int s2, int l2) {
  int end1 = s1 + l1;
  int end2 = s2 + l2;
  for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
    int a = (b1[i] & 0xff);
    int b = (b2[j] & 0xff);
    if (a != b) {
      return a - b;
    }
  }
  return l1 - l2;
}
{noformat}

whereas DataByteArray compares directly as (signed) bytes.  To make it 
consistent with the other shortcuts we added in PIG-2975, this patch makes 
DataByteArray.compare simply call WritableComparator.compareBytes.  This fixes 
the TestPigTupleRawComparator.testRandomTuples failure.
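The signed-vs-unsigned discrepancy can be demonstrated with a short sketch (plain Python standing in for the Java byte semantics; the function names are illustrative):

```python
# Java bytes are signed, so comparing them directly disagrees with
# WritableComparator's unsigned (b & 0xff) comparison whenever a byte
# has its high bit set.

def to_signed(b):
    """Interpret an int in 0..255 as a signed Java byte (-128..127)."""
    return b - 256 if b >= 128 else b

def cmp_unsigned(b1, b2):
    """WritableComparator.compareBytes semantics: unsigned, lexicographic."""
    for a, b in zip(b1, b2):
        if a != b:
            return (a & 0xff) - (b & 0xff)
    return len(b1) - len(b2)

def cmp_signed(b1, b2):
    """Naive signed-byte comparison (the old DataByteArray behaviour)."""
    for a, b in zip(b1, b2):
        if a != b:
            return to_signed(a) - to_signed(b)
    return len(b1) - len(b2)

x, y = b"\x80", b"\x01"          # 0x80 is -128 as a signed byte
print(cmp_unsigned(x, y) > 0)    # True: 0x80 (128) sorts after 0x01
print(cmp_signed(x, y) > 0)      # False: -128 sorts before 1
```

The two comparators order such pairs oppositely, which is exactly the kind of inconsistency that made the raw-bytes shortcut disagree with the deserialized comparison.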




[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482381#comment-13482381
 ] 

Koji Noguchi commented on PIG-2999:
---

After applying the patch 'pig-2999-v1.txt', the previously failing tests pass. 
However, I still see

{noformat}
junit.framework.AssertionFailedError: expected:<1.0> but was:<-1.0>
at 
org.apache.pig.test.TestPigTupleRawComparator.testRandomTuples(TestPigTupleRawComparator.java:399)
{noformat}

This seems to be because DataByteArray.compare(byte[] b1, byte[] b2)'s byte 
comparison differs from WritableComparator.compareBytes. Checking.
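The signed-vs-unsigned distinction behind that difference can be shown with a small standalone sketch (hypothetical code, not Pig's actual implementation): Java's byte is signed, while WritableComparator.compareBytes compares bytes as unsigned, so the two orderings disagree whenever a byte has its high bit set.

```java
// Hypothetical illustration (not Pig's actual code): lexicographic byte-array
// comparison gives different orders depending on whether bytes are treated as
// signed (Java's byte type) or unsigned (WritableComparator.compareBytes-style).
public class ByteCompareDemo {
    // Signed comparison, as a naive compare over raw Java bytes would do.
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return a[i] - b[i];   // signed byte values
        }
        return a.length - b.length;
    }

    // Unsigned comparison, in the style of WritableComparator.compareBytes.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int ua = a[i] & 0xff, ub = b[i] & 0xff; // unsigned byte values
            if (ua != ub) return ua - ub;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] x = { (byte) 0x80 };  // -128 signed, 128 unsigned
        byte[] y = { (byte) 0x01 };  //    1 signed,   1 unsigned
        // Signed says x < y; unsigned says x > y -- the comparators disagree.
        System.out.println(compareSigned(x, y) < 0);   // true
        System.out.println(compareUnsigned(x, y) > 0); // true
    }
}
```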




Build failed in Jenkins: Pig-trunk #1344

2012-10-23 Thread Apache Jenkins Server
See 

Changes:

[gdfm] PIG-2941: Ivy resolvers in pig don't have consistent chaining and don't 
have a kitchen sink option for novices (jgordon via azaroth)

[jcoveney] PIG-2975: TestTypedMap.testOrderBy failing with incorrect result 
(knoguchi via jcoveney)

--
[...truncated 6631 lines...]
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile$Reader$Scanner$Entry
 [findbugs]   org.apache.hadoop.fs.FSDataInputStream
 [findbugs]   org.python.core.PyObject
 [findbugs]   jline.History
 [findbugs]   org.jruby.embed.internal.LocalContextProvider
 [findbugs]   org.apache.hadoop.io.BooleanWritable
 [findbugs]   org.apache.log4j.Logger
 [findbugs]   org.apache.hadoop.hbase.filter.FamilyFilter
 [findbugs]   org.codehaus.jackson.annotate.JsonPropertyOrder
 [findbugs]   groovy.lang.Tuple
 [findbugs]   org.antlr.runtime.IntStream
 [findbugs]   org.apache.hadoop.util.ReflectionUtils
 [findbugs]   org.apache.hadoop.fs.ContentSummary
 [findbugs]   org.jruby.runtime.builtin.IRubyObject
 [findbugs]   org.jruby.RubyInteger
 [findbugs]   org.python.core.PyTuple
 [findbugs]   org.mortbay.log.Log
 [findbugs]   org.apache.hadoop.conf.Configuration
 [findbugs]   com.google.common.base.Joiner
 [findbugs]   org.apache.hadoop.mapreduce.lib.input.FileSplit
 [findbugs]   org.apache.hadoop.mapred.Counters$Counter
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs]   org.apache.hadoop.mapred.JobPriority
 [findbugs]   org.apache.commons.cli.Options
 [findbugs]   org.apache.hadoop.mapred.JobID
 [findbugs]   org.apache.hadoop.util.bloom.BloomFilter
 [findbugs]   org.python.core.PyFrame
 [findbugs]   org.apache.hadoop.hbase.filter.CompareFilter
 [findbugs]   org.apache.hadoop.util.VersionInfo
 [findbugs]   org.python.core.PyString
 [findbugs]   org.apache.hadoop.io.Text$Comparator
 [findbugs]   org.jruby.runtime.Block
 [findbugs]   org.antlr.runtime.MismatchedSetException
 [findbugs]   org.apache.hadoop.io.BytesWritable
 [findbugs]   org.apache.hadoop.fs.FsShell
 [findbugs]   org.joda.time.Months
 [findbugs]   org.mozilla.javascript.ImporterTopLevel
 [findbugs]   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
 [findbugs]   org.apache.hadoop.mapred.TaskReport
 [findbugs]   org.apache.hadoop.security.UserGroupInformation
 [findbugs]   org.antlr.runtime.tree.RewriteRuleSubtreeStream
 [findbugs]   org.apache.commons.cli.HelpFormatter
 [findbugs]   com.google.common.collect.Maps
 [findbugs]   org.joda.time.ReadableInstant
 [findbugs]   org.mozilla.javascript.NativeObject
 [findbugs]   org.apache.hadoop.hbase.HConstants
 [findbugs]   org.apache.hadoop.io.serializer.Deserializer
 [findbugs]   org.antlr.runtime.FailedPredicateException
 [findbugs]   org.apache.hadoop.io.compress.CompressionCodec
 [findbugs]   org.jruby.RubyNil
 [findbugs]   org.apache.hadoop.fs.FileStatus
 [findbugs]   org.apache.hadoop.hbase.client.Result
 [findbugs]   org.apache.hadoop.mapreduce.JobContext
 [findbugs]   org.codehaus.jackson.JsonGenerator
 [findbugs]   org.apache.hadoop.mapreduce.TaskAttemptContext
 [findbugs]   org.apache.hadoop.io.LongWritable$Comparator
 [findbugs]   org.codehaus.jackson.map.util.LRUMap
 [findbugs]   org.apache.hadoop.hbase.util.Bytes
 [findbugs]   org.antlr.runtime.MismatchedTokenException
 [findbugs]   org.codehaus.jackson.JsonParser
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   org.python.core.PyException
 [findbugs]   org.apache.commons.cli.ParseException
 [findbugs]   org.apache.hadoop.io.compress.CompressionOutputStream
 [findbugs]   org.apache.hadoop.hbase.filter.WritableByteArrayComparable
 [findbugs]   org.antlr.runtime.tree.CommonTreeNodeStream
 [findbugs]   org.apache.log4j.Level
 [findbugs]   org.apache.hadoop.hbase.client.Scan
 [findbugs]   org.jruby.anno.JRubyMethod
 [findbugs]   org.apache.hadoop.mapreduce.Job
 [findbugs]   com.google.common.util.concurrent.Futures
 [findbugs]   org.apache.commons.logging.LogFactory
 [findbugs]   org.apache.commons.collections.IteratorUtils
 [findbugs]   org.apache.commons.codec.binary.Base64
 [findbugs]   org.codehaus.jackson.map.ObjectMapper
 [findbugs]   org.apache.hadoop.fs.FileSystem
 [findbugs]   org.jruby.embed.LocalContextScope
 [findbugs]   org.apache.hadoop.hbase.filter.FilterList$Operator
 [findbugs]   org.jruby.RubySymbol
 [findbugs]   org.apache.hadoop.hbase.io.ImmutableBytesWritable
 [findbugs]   org.apache.hadoop.io.serializer.SerializationFactory
 [findbugs]   org.antlr.runtime.tree.TreeAdaptor
 [findbugs]   org.apache.hadoop.mapred.RunningJob
 [findbugs]   org.antlr.runtime.CommonTokenStream
 [findbugs]   org.apache.hadoop.io.DataInputBuffer
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile
 [findbugs]   org.apache.commons.cli.GnuParser
 [findbugs]   org.mozilla.javascript.Context
 [findbugs]   org.apache.hadoop.io.FloatWritable
 [findbugs]   org.antlr.runtime.tree.RewriteEarlyExitException
 [findbugs]   org.apache.hadoop.hbase.HBaseConfiguration
 [findbugs]   org.codehaus.jac

[jira] [Updated] (PIG-2832) org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext

2012-10-23 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2832:
-

Patch Info: Patch Available

> org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of 
> PigContext
> --
>
> Key: PIG-2832
> URL: https://issues.apache.org/jira/browse/PIG-2832
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
> Environment: pig-0.10.0, Hadoop 2.0.0-cdh4.0.1 on Kubuntu 12.04 64Bit.
>Reporter: Johannes Schwenk
>Assignee: Prashant Kommireddi
> Attachments: PIG-2832.patch
>
>
> PigServer does not initialize udf.import.list. 
> So, if you have a Pig script that uses UDFs and want to pass 
> udf.import.list via a property file, you can do so using the -propertyFile 
> command-line option to pig. But you should also be able to do it using PigUnit's 
> PigServer class, which already has the corresponding constructor, e.g. by doing 
> something similar to:
> {code}
> Properties props = new Properties();
> props.load(new FileInputStream("./testdata/test.properties"));
> pig = new PigServer(ExecType.LOCAL, props);
> String[] params = {"data_dir=testdata"};
> test = new PigTest("test.pig", params, pig, cluster);
> test.assertSortedOutput("aggregated", new File("./testdata/expected.out"));
> {code}
> Here udf.import.list is defined in test.properties, and test.pig uses names 
> of UDFs that should be resolved using that list.
> This does not work!
> I'd say the org.apache.pig.PigServer class is the problem. It should 
> initialize the import list of the PigContext. 
> {code}
> if(properties.get("udf.import.list") != null) {
> 
> PigContext.initializeImportList((String)properties.get("udf.import.list"));
> }{code}
> Right now this is done in org.apache.pig.Main.
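A minimal sketch of the fix direction described above — the class below is illustrative only, not the actual patch; it merely mimics the idea of moving the udf.import.list initialization from org.apache.pig.Main into a PigContext-style constructor so every entry point (Main/grunt, PigServer, PigUnit) picks it up:

```java
// Hypothetical sketch (not the actual patch): perform the udf.import.list
// initialization inside a PigContext-style class, so any code path that
// constructs it with Properties gets the import list, not just Main.
import java.util.Properties;

public class PigContextSketch {
    static String importList = ""; // stands in for PigContext's import list

    // Mirrors the idea of PigContext.initializeImportList(String).
    static void initializeImportList(String list) {
        importList = list;
    }

    // Constructor hook: the check that previously lived only in Main.
    PigContextSketch(Properties properties) {
        String list = properties.getProperty("udf.import.list");
        if (list != null) {
            initializeImportList(list);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("udf.import.list", "com.example.udfs.");
        new PigContextSketch(props);
        System.out.println(PigContextSketch.importList); // com.example.udfs.
    }
}
```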



[jira] [Assigned] (PIG-2832) org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext

2012-10-23 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-2832:


Assignee: Prashant Kommireddi




[jira] [Updated] (PIG-2832) org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext

2012-10-23 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2832:
-

Attachment: PIG-2832.patch




[jira] [Commented] (PIG-2832) org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext

2012-10-23 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482198#comment-13482198
 ] 

Prashant Kommireddi commented on PIG-2832:
--

Thanks Johannes for reporting this. It seems like there is no reason for this 
initialization to live in Main (grunt) or in PigServer. This is really 
PigContext behavior. 

Attaching a patch that moves the necessary initialization to PigContext and 
removes it from Main. PigServer benefits from this directly.




Re: Review Request: TestJobSubmission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Santhosh Srinivasan


> On Oct. 23, 2012, 5:07 a.m., Santhosh Srinivasan wrote:
> > A few comments. Its probably a good idea to have someone with more 
> > knowledge of HBaseStorage to take a second look.
> 
> Cheolsoo Park wrote:
> Thank you very much for your feedback! I added answers below. Please let 
> me know if you disagree with me.

Bill Graham would be a good choice to take a second look.


- Santhosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/#review12676
---


On Oct. 22, 2012, 6:50 a.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7676/
> ---
> 
> (Updated Oct. 22, 2012, 6:50 a.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> The changes include:
> 
> 1. Stop bundling hbase.jar and zookeeper.jar with pig.jar, so there should 
> no longer be incompatibility issues when using pig.jar with different versions 
> of hbase.jar. But to use HBaseStorage, HBASE_HOME and ZOOKEEPER_HOME must be 
> set by the user. Note that I am adding protobuf-java.jar to pig.jar because 
> otherwise it has to be explicitly added to PIG_CLASSPATH to use HBaseStorage, 
> which is not very intuitive.
> 
> 2. Bump hbase and zk to 0.94.1 and 3.4.3 respectively. Since we no longer 
> bundle them in pig.jar, which versions we use doesn't matter. These jar files 
> will be used for unit tests only.
> 
> 3. Make the unit test cases work with newer versions of hbase and zk.
> 
> 4. Add hbase runtime dependencies to ivy.xml.
> 
> 
> This addresses bug PIG-2885.
> https://issues.apache.org/jira/browse/PIG-2885
> 
> 
> Diffs
> -
> 
>   build.xml 6b04f8a 
>   ivy.xml 6e0a2e5 
>   ivy/libraries.properties 55da6c6 
>   test/org/apache/pig/test/TestHBaseStorage.java cc1efef 
>   test/org/apache/pig/test/TestJobSubmission.java 021662f 
> 
> Diff: https://reviews.apache.org/r/7676/diff/
> 
> 
> Testing
> ---
> 
> ant clean test-commit -Dhadoopversion=20
> ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=20
> ant clean test -Dtestcase=TestJobSubmission -Dhadoopversion=20
> 
> I also manually tested pig.jar with hbase 0.90 and 0.94. Once HBASE_HOME and 
> ZOOKEEPER_HOME are set, HBaseStorage works fine with both versions.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>
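For reference, the HBASE_HOME/ZOOKEEPER_HOME setup the description calls for might look like the following — the paths here are illustrative assumptions, not requirements:

```shell
# Illustrative setup (paths are assumptions): with HBase and ZooKeeper jars
# no longer bundled in pig.jar, point Pig at local installs before running
# a script that uses HBaseStorage.
export HBASE_HOME=/usr/local/hbase          # directory containing hbase-*.jar
export ZOOKEEPER_HOME=/usr/local/zookeeper  # directory containing zookeeper-*.jar
# Then launch as usual, e.g.:
#   pig -x mapreduce my_hbase_script.pig
```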



Re: Review Request: TestJobSubmission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3

2012-10-23 Thread Santhosh Srinivasan


> On Oct. 23, 2012, 5:07 a.m., Santhosh Srinivasan wrote:
> > ivy/libraries.properties, line 74
> > 
> >
> > Zookeeper-3.4.4 has been out but has a known issue with SASL and Java 
> > 1.7.  Is 3.3.3 required for Hbase 0.94.1 ?
> 
> Cheolsoo Park wrote:
> You're asking whether ZK 3.4.3 (not 3.3.3) is required by hbase 0.94.1, 
> right?
> 
> The answer is yes. In particular, HBaseTestingUtility depends on the 
> following ZK class, which doesn't seem to exist in ZK 3.3.3:
> 
> java.lang.NoClassDefFoundError: 
> org/apache/zookeeper/server/NIOServerCnxnFactory
> 
> In fact, I don't think that we should worry about those ZK known issues 
> because the versions of HBase and ZK that I am updating only matter for unit 
> tests. As far as I can tell, HBaseStorage itself is fully compatible with all 
> of HBase 0.90, 0.92, and 0.94 and won't be affected by this change at all.

Actually, I was asking if we can pick up ZK-3.4.4 instead of ZK-3.4.3. Sorry 
if I was not clear in my previous comment.


> On Oct. 23, 2012, 5:07 a.m., Santhosh Srinivasan wrote:
> > test/org/apache/pig/test/TestJobSubmission.java, line 431
> > 
> >
> > Can the commented out code be removed?
> 
> Cheolsoo Park wrote:
> To be honest, I do not know why we keep that block of code. Nevertheless, 
> I am hesitant to remove it, since someone might have commented it out only 
> temporarily.

That should be fine. Can you create a separate JIRA to remove the commented-out 
code? We use version control for a reason; if the author is interested, 
(s)he can retrieve it from an earlier revision.


- Santhosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7676/#review12676
---





Re: PROPOSAL: how to handle release documentation going forward

2012-10-23 Thread Santhosh M S
+1



 From: Jonathan Coveney 
To: dev@pig.apache.org; Olga Natkovich  
Sent: Monday, October 22, 2012 5:09 PM
Subject: Re: PROPOSAL: how to handle release documentation going forward
 
As someone who chronically under-documents, I think that this is a good
idea. +1

2012/10/22 Olga Natkovich 

> Hi,
>
> Since we lost the dedicated documentation writer for Pig, would it make sense
> to require, going forward (0.12 and beyond), that
> documentation updates be included in the patch together with code changes
> and tests? I think that should work for most features/updates, except
> perhaps big items that might require more than one JIRA to be completed
> before documentation changes make sense.
>
> Comments?
>
> Olga
>