[jira] [Updated] (PIG-3899) Fix memory leak with PigTezLogger

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3899:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to tez branch. Thanks Cheolsoo for the review.

 Fix memory leak with PigTezLogger
 -

 Key: PIG-3899
 URL: https://issues.apache.org/jira/browse/PIG-3899
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: tez-branch

 Attachments: PIG-3899-1.patch


 PigTezLogger references TezProcessorContext through TezStatusReporter.
 PigTezLogger is held in a static variable in DefaultAbstractBag and can also
 be held in static variables by user UDFs. TezProcessorContext holds
 references to the Input and its sort buffers, so the static reference causes
 a large memory leak.
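
 The leak pattern described above can be sketched as follows. This is only an
 illustration, not the Pig or Tez code: TaskContext, TaskLogger and BagLike are
 hypothetical stand-ins for TezProcessorContext, PigTezLogger and
 DefaultAbstractBag, and clearing the static field at task end is an assumed
 remedy, not necessarily what PIG-3899-1.patch does.

```java
// Hypothetical stand-in for TezProcessorContext: it pins a large buffer.
class TaskContext {
    final byte[] sortBuffer = new byte[1024 * 1024];
}

// Hypothetical stand-in for PigTezLogger: it keeps the context reachable.
class TaskLogger {
    private final TaskContext context;

    TaskLogger(TaskContext context) {
        this.context = context;
    }

    TaskContext context() {
        return context;
    }
}

// Hypothetical stand-in for a class that caches the logger in a static field.
class BagLike {
    private static TaskLogger logger;

    static void setLogger(TaskLogger l) {
        logger = l;
    }

    static TaskLogger getLogger() {
        return logger;
    }

    // Dropping the static reference makes the context and its buffer collectable.
    static void clearLogger() {
        logger = null;
    }
}

class StaticLeakSketch {
    // Simulates one task; returns true if the static field still
    // holds the logger (and therefore the context) after the task ends.
    static boolean runTask(boolean cleanUp) {
        BagLike.setLogger(new TaskLogger(new TaskContext()));
        if (cleanUp) {
            BagLike.clearLogger();
        }
        return BagLike.getLogger() != null;
    }

    public static void main(String[] args) {
        System.out.println("still referenced without cleanup: " + runTask(false));
        System.out.println("still referenced with cleanup: " + runTask(true));
    }
}
```

 As long as the static field is set, everything reachable from the context
 stays alive across tasks in a reused JVM, which is why the buffers add up.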



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (PIG-3855) Turn on UnionOptimizer by default and add new e2e tests for union

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-3855.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to tez branch. Thanks Daniel and Cheolsoo for the review.

 Turn on UnionOptimizer by default and add new e2e tests for union
 -

 Key: PIG-3855
 URL: https://issues.apache.org/jira/browse/PIG-3855
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: tez-branch

 Attachments: PIG-3855-1.patch, PIG-3855-3.patch


 We don't have e2e tests for cases like union followed by group by, join
 (replicate, skewed, hash), order by, limit, etc. PIG-3835 adds optimizations
 for those cases, and we should have e2e tests for them.





[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3672:


Attachment: PIG-3672-1.patch

This patch:

   - handles the recent change in Hadoop (HADOOP-7549) to how filesystem
implementations are obtained.
   - configures mapreduce.job.hdfs-servers correctly for other schemes
like webhdfs, viewfs, etc.
   - fixes PIG-3796.

 pig should not hardcode hdfs:// path in code, should be configurable to 
 other file system implementations
 ---

 Key: PIG-3672
 URL: https://issues.apache.org/jira/browse/PIG-3672
 Project: Pig
  Issue Type: Bug
  Components: data, parser
Affects Versions: 0.10.0, 0.12.0, 0.11.1
Reporter: Suhas Satish
Assignee: Rohini Palaniswamy
 Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672.patch


 QueryParserUtils.java has the code -
 result.add("hdfs://"+thisHost+":"+uri.getPort());
 I propose to make it generic like -
 result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
 Similarly JobControlCompiler.java has -
 if (!outputPathString.contains("://") ||
 outputPathString.startsWith("hdfs://")) {
 I have a patch that passes the unit tests and will upload it shortly.
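
 A minimal, self-contained sketch of the scheme-generic idea proposed above,
 using only java.net.URI. The class and method names (SchemeAwarePaths,
 serverFor, hasNoScheme) and the example host are hypothetical, and taking the
 host from the URI is an assumption; the actual patch may resolve thisHost
 differently.

```java
import java.net.URI;

class SchemeAwarePaths {
    // Builds "scheme://host[:port]" from the URI itself instead of assuming hdfs.
    // URI.getPort() returns -1 when no port is given, so the port is then omitted.
    static String serverFor(URI uri) {
        String server = uri.getScheme() + "://" + uri.getHost();
        return uri.getPort() == -1 ? server : server + ":" + uri.getPort();
    }

    // True when the path carries no scheme at all.
    static boolean hasNoScheme(String path) {
        return !path.contains("://");
    }

    public static void main(String[] args) {
        URI out = URI.create("webhdfs://nn.example.com:50070/user/pig/out");
        System.out.println(serverFor(out));
        System.out.println(hasNoScheme("/user/pig/out"));
    }
}
```

 With this shape, webhdfs, viewfs and other schemes need no special casing.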





[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3672:


Attachment: (was: PIG-3672-1.patch)

 pig should not hardcode hdfs:// path in code, should be configurable to 
 other file system implementations
 ---

 Key: PIG-3672
 URL: https://issues.apache.org/jira/browse/PIG-3672
 Project: Pig
  Issue Type: Bug
  Components: data, parser
Affects Versions: 0.10.0, 0.12.0, 0.11.1
Reporter: Suhas Satish
Assignee: Rohini Palaniswamy
 Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672.patch


 QueryParserUtils.java has the code -
 result.add("hdfs://"+thisHost+":"+uri.getPort());
 I propose to make it generic like -
 result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
 Similarly JobControlCompiler.java has -
 if (!outputPathString.contains("://") ||
 outputPathString.startsWith("hdfs://")) {
 I have a patch that passes the unit tests and will upload it shortly.





[jira] [Updated] (PIG-3672) Pig should not check for hardcoded file system implementations

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3672:


Summary: Pig should not check for hardcoded file system implementations  
(was: pig should not hardcode hdfs:// path in code, should be configurable to 
other file system implementations)

 Pig should not check for hardcoded file system implementations
 --

 Key: PIG-3672
 URL: https://issues.apache.org/jira/browse/PIG-3672
 Project: Pig
  Issue Type: Bug
  Components: data, parser
Affects Versions: 0.10.0, 0.12.0, 0.11.1
Reporter: Suhas Satish
Assignee: Rohini Palaniswamy
 Fix For: 0.13.0

 Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, 
 PIG-3672.patch


 QueryParserUtils.java has the code -
 result.add("hdfs://"+thisHost+":"+uri.getPort());
 I propose to make it generic like -
 result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
 Similarly JobControlCompiler.java has -
 if (!outputPathString.contains("://") ||
 outputPathString.startsWith("hdfs://")) {
 I have a patch that passes the unit tests and will upload it shortly.





[jira] [Updated] (PIG-3672) Pig should not check for hardcoded file system implementations

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3672:


Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

 Pig should not check for hardcoded file system implementations
 --

 Key: PIG-3672
 URL: https://issues.apache.org/jira/browse/PIG-3672
 Project: Pig
  Issue Type: Bug
  Components: data, parser
Affects Versions: 0.11.1, 0.12.0, 0.10.0
Reporter: Suhas Satish
Assignee: Rohini Palaniswamy
 Fix For: 0.13.0

 Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, 
 PIG-3672.patch


 QueryParserUtils.java has the code -
 result.add("hdfs://"+thisHost+":"+uri.getPort());
 I propose to make it generic like -
 result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
 Similarly JobControlCompiler.java has -
 if (!outputPathString.contains("://") ||
 outputPathString.startsWith("hdfs://")) {
 I have a patch that passes the unit tests and will upload it shortly.





[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3672:


Attachment: PIG-3672-3.patch

 pig should not hardcode hdfs:// path in code, should be configurable to 
 other file system implementations
 ---

 Key: PIG-3672
 URL: https://issues.apache.org/jira/browse/PIG-3672
 Project: Pig
  Issue Type: Bug
  Components: data, parser
Affects Versions: 0.10.0, 0.12.0, 0.11.1
Reporter: Suhas Satish
Assignee: Rohini Palaniswamy
 Fix For: 0.13.0

 Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, 
 PIG-3672.patch


 QueryParserUtils.java has the code -
 result.add("hdfs://"+thisHost+":"+uri.getPort());
 I propose to make it generic like -
 result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
 Similarly JobControlCompiler.java has -
 if (!outputPathString.contains("://") ||
 outputPathString.startsWith("hdfs://")) {
 I have a patch that passes the unit tests and will upload it shortly.





[jira] [Commented] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977111#comment-13977111
 ] 

Alan Gates commented on PIG-3613:
-

[~rekhajoshm], thanks for the update.  You need to add a unit test so we can 
confirm this works as we make changes to Pig going forward.

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Updated] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3613:


Status: Open  (was: Patch Available)

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Created] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored

2014-04-22 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-3908:
---

 Summary: Fix UnionOptimizer bug with expressions and MR 
compressions settings not honored
 Key: PIG-3908
 URL: https://issues.apache.org/jira/browse/PIG-3908
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: tez-branch








[jira] [Commented] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.

2014-04-22 Thread David Medinets (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977498#comment-13977498
 ] 

David Medinets commented on PIG-3880:
-

I tried to add the commons-io to my classpath. I got the same error. Here is 
the dry run showing the jar file in the path.

$ pig
dry run:
HADOOP_CLASSPATH: /home/566453/pig/conf:/usr/java/jdk1.7.0_09/lib/tools.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/home/566453/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/home/566453/pig/build/ivy/lib/Pig/jython-standalone-2.5.3.jar:/home/566453/pig/build/ivy/lib/Pig/jruby-complete-1.6.7.jar:/home/566453/pig/pig-withouthadoop.jar:
HADOOP_OPTS: -Xmx1000m -Dpig.log.dir=/home/566453/pig/logs -Dpig.log.file=pig.log -Dpig.home.dir=/home/566453/pig
/opt/hadoop/bin/hadoop jar /home/566453/pig/pig-withouthadoop.jar

I tried both commons-io 1.4 and 2.1. I checked that the class is in the jar:

$ jar tf .m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar | grep ClassLoaderObjectInputStream
org/apache/commons/io/input/ClassLoaderObjectInputStream.class

Anything else I can try?

 After compiling trunk, I am seeing ClassLoaderObjectInputStream 
 ClassNotFoundException.
 ---

 Key: PIG-3880
 URL: https://issues.apache.org/jira/browse/PIG-3880
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.13.0
Reporter: David Medinets

 I pulled trunk from subversion using the following commands:
 mkdir pig
 cd pig
 svn co http://svn.apache.org/repos/asf/pig/trunk
 cd trunk
 ant
 export PATH=$PATH:$HOME/pig/trunk/bin
 export ACCUMULO_HOME=/opt/accumulo
 export HADOOP_HOME=/opt/hadoop
 export PIG_HOME=$HOME/pig/trunk
 export PIG_CLASSPATH=$HOME/pig/trunk/build/ivy/lib/Pig/*
 export PIG_CLASSPATH=$ACCUMULO_HOME/lib/*:$PIG_CLASSPATH
 cd ~
 pig
 Then I ran into this error:
 java.lang.NoClassDefFoundError: org/apache/commons/io/input/ClassLoaderObjectInputStream
   at org.apache.pig.Main.run(Main.java:399)
 When I changed PIG_JAR to use the fat jar, I was able to run the pig command
 without getting the exception.
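
 One way to narrow this kind of error down is to ask the JVM directly whether
 it can see the class and from which jar. The sketch below is a generic
 diagnostic, not part of Pig; ClasspathProbe and locate are hypothetical names.
 It is only informative when run with the same classpath as the failing
 process, which is an assumption the reader has to arrange.

```java
import java.security.CodeSource;

class ClasspathProbe {
    // Returns the location the class was loaded from,
    // or null if the class cannot be found or linked.
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.commons.io.input.ClassLoaderObjectInputStream";
        String where = locate(name);
        System.out.println(where == null
                ? name + " not found"
                : name + " loaded from " + where);
    }
}
```

 If the class is reported as not found here while the jar is listed in
 HADOOP_CLASSPATH, the classpath that reaches the JVM probably differs from the
 one printed by the launcher script.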





[jira] [Updated] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3908:


Attachment: PIG-3908-1.patch

 Fix UnionOptimizer bug with expressions and MR compressions settings not 
 honored
 

 Key: PIG-3908
 URL: https://issues.apache.org/jira/browse/PIG-3908
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: tez-branch

 Attachments: PIG-3908-1.patch








[jira] [Updated] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored

2014-04-22 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3908:


Status: Patch Available  (was: Open)

 Fix UnionOptimizer bug with expressions and MR compressions settings not 
 honored
 

 Key: PIG-3908
 URL: https://issues.apache.org/jira/browse/PIG-3908
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: tez-branch

 Attachments: PIG-3908-1.patch








[jira] Subscription: PIG patch available

2014-04-22 Thread jira
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key         Summary
PIG-3908    Fix UnionOptimizer bug with expressions and MR compressions settings not honored
            https://issues.apache.org/jira/browse/PIG-3908
PIG-3901    Organize the Pig properties file and document all properties
            https://issues.apache.org/jira/browse/PIG-3901
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3874    FileLocalizer temp path can sometimes be non-unique
            https://issues.apache.org/jira/browse/PIG-3874
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3867    Added hadoop home to build classpath for build pig with unit test on windows
            https://issues.apache.org/jira/browse/PIG-3867
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3865    Remodel the XMLLoader to work to be faster and more maintainable
            https://issues.apache.org/jira/browse/PIG-3865
PIG-3861    duplicate jars get added to distributed cache
            https://issues.apache.org/jira/browse/PIG-3861
PIG-3825    Stats collection needs to be changed for hadoop2 (with auto local mode)
            https://issues.apache.org/jira/browse/PIG-3825
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3672    Pig should not check for hardcoded file system implementations
            https://issues.apache.org/jira/browse/PIG-3672
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories

2014-04-22 Thread Mona Chitnis (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977698#comment-13977698
 ] 

Mona Chitnis commented on PIG-3891:
---

Linking the original JIRA that introduced this change. The issue is probably
in reporting the counters as a whole, as I'm getting the following output for
a sample Pig test (in map-reduce mode, of course), even though it succeeded
and produced output.

{quote}
Input(s):
Successfully read 0 records from: /user/pig/tests/data/pigmix/page_views

Output(s):
Successfully stored 0 records in: /user/chitnis//L1out

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
{quote}

 FileBasedOutputSizeReader does not calculate size of files in sub-directories
 -

 Key: PIG-3891
 URL: https://issues.apache.org/jira/browse/PIG-3891
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Rohini Palaniswamy

 FileBasedOutputSizeReader only includes files in the top-level output
 directory. So if files are stored under subdirectories (e.g., with
 MultiStorage), it does not report the bytes written correctly.
 0.11 shows the correct number of total bytes written, so this is a
 regression. A quick look at the code shows that
 JobStats.addOneOutputStats() in 0.11 also does not iterate recursively, and
 its code is the same as FileBasedOutputSizeReader. We need to investigate
 where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
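
 The difference between a top-level and a recursive size count can be sketched
 as follows. This uses java.nio.file on the local filesystem rather than the
 Hadoop FileSystem API, so it illustrates the behaviour only and is not the
 Pig code; OutputSizeSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class OutputSizeSketch {
    // Sums only the regular files directly inside dir (the behaviour in the report).
    static long topLevelBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return sum(entries);
        }
    }

    // Sums regular files at any depth below dir.
    static long recursiveBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return sum(entries);
        }
    }

    private static long sum(Stream<Path> paths) {
        return paths.filter(Files::isRegularFile)
                    .mapToLong(p -> p.toFile().length())
                    .sum();
    }

    // Builds a tiny output tree with one 128-byte part file in a subdirectory
    // and returns {top-level bytes, recursive bytes}.
    static long[] demo() {
        try {
            Path out = Files.createTempDirectory("pig-out");
            Path sub = Files.createDirectory(out.resolve("key1"));
            Files.write(out.resolve("_SUCCESS"), new byte[0]);
            Files.write(sub.resolve("part-00000"), new byte[128]);
            return new long[] { topLevelBytes(out), recursiveBytes(out) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("top level: " + sizes[0] + " bytes");
        System.out.println("recursive: " + sizes[1] + " bytes");
    }
}
```

 In the demo layout, all data sits one level down, so the top-level count
 misses it entirely while the recursive walk picks it up.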





[jira] [Assigned] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories

2014-04-22 Thread Mona Chitnis (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mona Chitnis reassigned PIG-3891:
-

Assignee: Mona Chitnis

 FileBasedOutputSizeReader does not calculate size of files in sub-directories
 -

 Key: PIG-3891
 URL: https://issues.apache.org/jira/browse/PIG-3891
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Rohini Palaniswamy
Assignee: Mona Chitnis

 FileBasedOutputSizeReader only includes files in the top-level output
 directory. So if files are stored under subdirectories (e.g., with
 MultiStorage), it does not report the bytes written correctly.
 0.11 shows the correct number of total bytes written, so this is a
 regression. A quick look at the code shows that
 JobStats.addOneOutputStats() in 0.11 also does not iterate recursively, and
 its code is the same as FileBasedOutputSizeReader. We need to investigate
 where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.





[jira] [Commented] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.

2014-04-22 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977807#comment-13977807
 ] 

Josh Elser commented on PIG-3880:
-

I'm a bit confused as to what you're showing here. Where is this "dry run:"
output coming from? Can you verify whether the following works:

{{PIG_CLASSPATH=/home/566453/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar pig -x mapreduce my_script.pig}}

 After compiling trunk, I am seeing ClassLoaderObjectInputStream 
 ClassNotFoundException.
 ---

 Key: PIG-3880
 URL: https://issues.apache.org/jira/browse/PIG-3880
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.13.0
Reporter: David Medinets

 I pulled trunk from subversion using the following commands:
 mkdir pig
 cd pig
 svn co http://svn.apache.org/repos/asf/pig/trunk
 cd trunk
 ant
 export PATH=$PATH:$HOME/pig/trunk/bin
 export ACCUMULO_HOME=/opt/accumulo
 export HADOOP_HOME=/opt/hadoop
 export PIG_HOME=$HOME/pig/trunk
 export PIG_CLASSPATH=$HOME/pig/trunk/build/ivy/lib/Pig/*
 export PIG_CLASSPATH=$ACCUMULO_HOME/lib/*:$PIG_CLASSPATH
 cd ~
 pig
 Then I ran into this error:
 java.lang.NoClassDefFoundError: org/apache/commons/io/input/ClassLoaderObjectInputStream
   at org.apache.pig.Main.run(Main.java:399)
 When I changed PIG_JAR to use the fat jar, I was able to run the pig command
 without getting the exception.





[jira] [Updated] (PIG-3904) Pig support windows i18n

2014-04-22 Thread Lizhao.Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lizhao.Du updated PIG-3904:
---

Attachment: PIG-3904.patch

 Pig support windows i18n
 

 Key: PIG-3904
 URL: https://issues.apache.org/jira/browse/PIG-3904
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.2, 0.9.3
 Environment: Windows 7(de_DE/fr_FR/zh_CN)
Reporter: Lizhao.Du
 Fix For: 0.9.3

 Attachments: PIG-3904.patch


 Running a Pig script with Pig on Windows (de_DE) fails.
 The error message is "Input path does not exist:
 hdfs://10.141.73.10:8020/tmp/测试/pwInput", but /tmp/测试/pwInput does in
 fact exist. Hadoop uses UTF-8 encoding. When the encoding of the client OS
 that Pig runs on differs from it, Hadoop does not recognize these characters.
 The log message is below:
 ==
 ERROR Spring Shell org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message during job submission
 org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
   at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
   at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
   at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
   at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
   at java.lang.Thread.run(Unknown Source)
 Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput
   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
   at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
   ... 14 more
 I have attached a patch, PIG-3904.patch, that fixes it.
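
 The encoding mismatch described above can be reproduced in isolation. The
 sketch below is a hedged illustration of the general mechanism only: it
 encodes a path with one charset and decodes it as UTF-8. Whether Pig fails at
 exactly this step is an assumption, ISO-8859-1 merely stands in for a
 non-UTF-8 client default, and PathEncodingSketch and roundTrip are
 hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PathEncodingSketch {
    // Encodes the path with the client's charset and decodes the bytes as UTF-8,
    // as would happen when client and server disagree on the encoding.
    static String roundTrip(String path, Charset clientCharset) {
        byte[] wire = path.getBytes(clientCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same path as in the report, written with Unicode escapes so the
        // source compiles identically under any file encoding.
        String path = "/tmp/\u6d4b\u8bd5/pwInput";

        boolean utf8Client = roundTrip(path, StandardCharsets.UTF_8).equals(path);
        boolean latin1Client = roundTrip(path, StandardCharsets.ISO_8859_1).equals(path);

        System.out.println("UTF-8 client preserves path: " + utf8Client);
        System.out.println("ISO-8859-1 client preserves path: " + latin1Client);
    }
}
```

 The path survives only when the client encodes it as UTF-8; with the other
 charset the non-ASCII characters are lost before the server ever sees them,
 which would be consistent with the "Input path does not exist" error.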


