[jira] [Updated] (PIG-3899) Fix memory leak with PigTezLogger
[ https://issues.apache.org/jira/browse/PIG-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3899:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to tez branch. Thanks Cheolsoo for the review.

Fix memory leak with PigTezLogger
    Key: PIG-3899
    URL: https://issues.apache.org/jira/browse/PIG-3899
    Project: Pig
    Issue Type: Sub-task
    Components: tez
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: tez-branch
    Attachments: PIG-3899-1.patch

PigTezLogger references TezProcessorContext through TezStatusReporter. PigTezLogger is held in a static variable in DefaultAbstractBag and can also be held in static variables by user UDFs. TezProcessorContext holds references to the Input and its sort buffers, so a large amount of memory is leaked.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PIG-3855) Turn on UnionOptimizer by default and add new e2e tests for union
[ https://issues.apache.org/jira/browse/PIG-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-3855.
    Resolution: Fixed
    Hadoop Flags: Reviewed

Committed to tez branch. Thanks Daniel and Cheolsoo for the review.

Turn on UnionOptimizer by default and add new e2e tests for union
    Key: PIG-3855
    URL: https://issues.apache.org/jira/browse/PIG-3855
    Project: Pig
    Issue Type: Sub-task
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: tez-branch
    Attachments: PIG-3855-1.patch, PIG-3855-3.patch

We don't have e2e tests for cases like union followed by group by, join (replicate, skewed, hash), orderby, limit, etc. PIG-3835 adds optimization to those cases and we should have e2e tests for that.
[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
[ https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3672:
    Attachment: PIG-3672-1.patch

This patch
- handles the recent change in Hadoop (HADOOP-7549) w.r.t. getting filesystem implementations.
- handles configuring mapreduce.job.hdfs-servers correctly for other schemes like webhdfs, viewfs, etc.
- fixes PIG-3796.

pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
    Key: PIG-3672
    URL: https://issues.apache.org/jira/browse/PIG-3672
    Project: Pig
    Issue Type: Bug
    Components: data, parser
    Affects Versions: 0.10.0, 0.12.0, 0.11.1
    Reporter: Suhas Satish
    Assignee: Rohini Palaniswamy
    Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672.patch

QueryParserUtils.java has the code -
    result.add("hdfs://"+thisHost+":"+uri.getPort());
I propose to make it generic like -
    result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
Similarly JobControlCompiler.java has -
    if (!outputPathString.contains("://") || outputPathString.startsWith("hdfs://")) {
I have a patch version on which the unit tests pass. I will upload it shortly.
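The generic form proposed in the PIG-3672 description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in any of the attached patches: the class `AuthorityUtil` and the method `toAuthority` are hypothetical names, the host is taken from the URI itself rather than from a separate `thisHost` variable, and the handling of a missing port is an addition that the description does not mention.

```java
import java.net.URI;

// Hypothetical helper (not part of Pig): builds "scheme://host[:port]" from a URI
// instead of assuming that the scheme is always "hdfs".
final class AuthorityUtil {
    private AuthorityUtil() {}

    static String toAuthority(URI uri) {
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            throw new IllegalArgumentException("URI needs a scheme and a host: " + uri);
        }
        // URI.getPort() returns -1 when no explicit port is present.
        int port = uri.getPort();
        String base = scheme + "://" + host;
        return port == -1 ? base : base + ":" + port;
    }
}
```

With this shape, a `webhdfs://` or `viewfs://` location keeps its own scheme instead of being rewritten to `hdfs://`.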
[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
[ https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3672:
    Attachment: (was: PIG-3672-1.patch)

pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
    Key: PIG-3672
    URL: https://issues.apache.org/jira/browse/PIG-3672
    Project: Pig
    Issue Type: Bug
    Components: data, parser
    Affects Versions: 0.10.0, 0.12.0, 0.11.1
    Reporter: Suhas Satish
    Assignee: Rohini Palaniswamy
    Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672.patch

QueryParserUtils.java has the code -
    result.add("hdfs://"+thisHost+":"+uri.getPort());
I propose to make it generic like -
    result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
Similarly JobControlCompiler.java has -
    if (!outputPathString.contains("://") || outputPathString.startsWith("hdfs://")) {
I have a patch version on which the unit tests pass. I will upload it shortly.
[jira] [Updated] (PIG-3672) Pig should not check for hardcoded file system implementations
[ https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3672:
    Summary: Pig should not check for hardcoded file system implementations (was: pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations)

Pig should not check for hardcoded file system implementations
    Key: PIG-3672
    URL: https://issues.apache.org/jira/browse/PIG-3672
    Project: Pig
    Issue Type: Bug
    Components: data, parser
    Affects Versions: 0.10.0, 0.12.0, 0.11.1
    Reporter: Suhas Satish
    Assignee: Rohini Palaniswamy
    Fix For: 0.13.0
    Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, PIG-3672.patch

QueryParserUtils.java has the code -
    result.add("hdfs://"+thisHost+":"+uri.getPort());
I propose to make it generic like -
    result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
Similarly JobControlCompiler.java has -
    if (!outputPathString.contains("://") || outputPathString.startsWith("hdfs://")) {
I have a patch version on which the unit tests pass. I will upload it shortly.
[jira] [Updated] (PIG-3672) Pig should not check for hardcoded file system implementations
[ https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3672:
    Fix Version/s: 0.13.0
    Status: Patch Available (was: Open)

Pig should not check for hardcoded file system implementations
    Key: PIG-3672
    URL: https://issues.apache.org/jira/browse/PIG-3672
    Project: Pig
    Issue Type: Bug
    Components: data, parser
    Affects Versions: 0.11.1, 0.12.0, 0.10.0
    Reporter: Suhas Satish
    Assignee: Rohini Palaniswamy
    Fix For: 0.13.0
    Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, PIG-3672.patch

QueryParserUtils.java has the code -
    result.add("hdfs://"+thisHost+":"+uri.getPort());
I propose to make it generic like -
    result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
Similarly JobControlCompiler.java has -
    if (!outputPathString.contains("://") || outputPathString.startsWith("hdfs://")) {
I have a patch version on which the unit tests pass. I will upload it shortly.
[jira] [Updated] (PIG-3672) pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
[ https://issues.apache.org/jira/browse/PIG-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3672:
    Attachment: PIG-3672-3.patch

pig should not hardcode hdfs:// path in code, should be configurable to other file system implementations
    Key: PIG-3672
    URL: https://issues.apache.org/jira/browse/PIG-3672
    Project: Pig
    Issue Type: Bug
    Components: data, parser
    Affects Versions: 0.10.0, 0.12.0, 0.11.1
    Reporter: Suhas Satish
    Assignee: Rohini Palaniswamy
    Fix For: 0.13.0
    Attachments: PIG-3672-1.patch, PIG-3672-2.patch, PIG-3672-3.patch, PIG-3672.patch

QueryParserUtils.java has the code -
    result.add("hdfs://"+thisHost+":"+uri.getPort());
I propose to make it generic like -
    result.add(uri.getScheme() + "://"+thisHost+":"+uri.getPort());
Similarly JobControlCompiler.java has -
    if (!outputPathString.contains("://") || outputPathString.startsWith("hdfs://")) {
I have a patch version on which the unit tests pass. I will upload it shortly.
[jira] [Commented] (PIG-3613) UDF for SimilarityMatching between strings with matching scores
[ https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977111#comment-13977111 ]

Alan Gates commented on PIG-3613:

[~rekhajoshm], thanks for the update. You need to add a unit test so we can confirm this works as we make changes to Pig going forward.

UDF for SimilarityMatching between strings with matching scores
    Key: PIG-3613
    URL: https://issues.apache.org/jira/browse/PIG-3613
    Project: Pig
    Issue Type: Task
    Components: piggybank
    Affects Versions: 0.10.1
    Reporter: Rekha Joshi
    Assignee: Rekha Joshi
    Labels: piggybank
    Fix For: 0.10.1
    Attachments: PIG-3613.0.patch, PIG-3613.1.patch

It would be great if we could do similarity matching between strings on big data using a Pig UDF. The proposed UDF works on a tuple of strings and gives a matching score.
[jira] [Updated] (PIG-3613) UDF for SimilarityMatching between strings with matching scores
[ https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3613:
    Status: Open (was: Patch Available)

UDF for SimilarityMatching between strings with matching scores
    Key: PIG-3613
    URL: https://issues.apache.org/jira/browse/PIG-3613
    Project: Pig
    Issue Type: Task
    Components: piggybank
    Affects Versions: 0.10.1
    Reporter: Rekha Joshi
    Assignee: Rekha Joshi
    Labels: piggybank
    Fix For: 0.10.1
    Attachments: PIG-3613.0.patch, PIG-3613.1.patch

It would be great if we could do similarity matching between strings on big data using a Pig UDF. The proposed UDF works on a tuple of strings and gives a matching score.
[jira] [Created] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored
Rohini Palaniswamy created PIG-3908:
    Summary: Fix UnionOptimizer bug with expressions and MR compressions settings not honored
    Key: PIG-3908
    URL: https://issues.apache.org/jira/browse/PIG-3908
    Project: Pig
    Issue Type: Sub-task
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: tez-branch
[jira] [Commented] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
[ https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977498#comment-13977498 ]

David Medinets commented on PIG-3880:

I tried to add commons-io to my classpath. I got the same error. Here is the dry run showing the jar file in the path.

    $ pig
    dry run:
    HADOOP_CLASSPATH: /home/566453/pig/conf:/usr/java/jdk1.7.0_09/lib/tools.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/home/566453/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/home/566453/pig/build/ivy/lib/Pig/jython-standalone-2.5.3.jar:/home/566453/pig/build/ivy/lib/Pig/jruby-complete-1.6.7.jar:/home/566453/pig/pig-withouthadoop.jar:
    HADOOP_OPTS: -Xmx1000m -Dpig.log.dir=/home/566453/pig/logs -Dpig.log.file=pig.log -Dpig.home.dir=/home/566453/pig
    /opt/hadoop/bin/hadoop jar /home/566453/pig/pig-withouthadoop.jar

I tried both commons-io 1.4 and 2.1. I checked that the class is in the jar:

    $ jar tf .m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar | grep ClassLoaderObjectInputStream
    org/apache/commons/io/input/ClassLoaderObjectInputStream.class

Anything else I can try?

After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
    Key: PIG-3880
    URL: https://issues.apache.org/jira/browse/PIG-3880
    Project: Pig
    Issue Type: Bug
    Components: grunt
    Affects Versions: 0.13.0
    Reporter: David Medinets

I pulled trunk from subversion using the following commands:

    mkdir pig
    cd pig
    svn co http://svn.apache.org/repos/asf/pig/trunk
    cd trunk
    ant
    export PATH=$PATH:$HOME/pig/trunk/bin
    export ACCUMULO_HOME=/opt/accumulo
    export HADOOP_HOME=/opt/hadoop
    export PIG_HOME=$HOME/pig/trunk
    export PIG_CLASSPATH=$HOME/pig/trunk/build/ivy/lib/Pig/*
    export PIG_CLASSPATH=$ACCUMULO_HOME/lib/*:$PIG_CLASSPATH
    cd ~
    pig

Then I ran into this error:

    java.lang.NoClassDefFoundError: org/apache/commons/io/input/ClassLoaderObjectInputStream
        at org.apache.pig.Main.run(Main.java:399)

When I change PIG_JAR to use the fat jar, I was able to run the pig command without getting the exception.
[jira] [Updated] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored
[ https://issues.apache.org/jira/browse/PIG-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3908:
    Attachment: PIG-3908-1.patch

Fix UnionOptimizer bug with expressions and MR compressions settings not honored
    Key: PIG-3908
    URL: https://issues.apache.org/jira/browse/PIG-3908
    Project: Pig
    Issue Type: Sub-task
    Components: tez
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: tez-branch
    Attachments: PIG-3908-1.patch
[jira] [Updated] (PIG-3908) Fix UnionOptimizer bug with expressions and MR compressions settings not honored
[ https://issues.apache.org/jira/browse/PIG-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3908:
    Status: Patch Available (was: Open)

Fix UnionOptimizer bug with expressions and MR compressions settings not honored
    Key: PIG-3908
    URL: https://issues.apache.org/jira/browse/PIG-3908
    Project: Pig
    Issue Type: Sub-task
    Components: tez
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: tez-branch
    Attachments: PIG-3908-1.patch
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (18 issues)
Subscriber: pigdaily

Key         Summary
PIG-3908    Fix UnionOptimizer bug with expressions and MR compressions settings not honored
            https://issues.apache.org/jira/browse/PIG-3908
PIG-3901    Organize the Pig properties file and document all properties
            https://issues.apache.org/jira/browse/PIG-3901
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3874    FileLocalizer temp path can sometimes be non-unique
            https://issues.apache.org/jira/browse/PIG-3874
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3867    Added hadoop home to build classpath for build pig with unit test on windows
            https://issues.apache.org/jira/browse/PIG-3867
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3865    Remodel the XMLLoader to work to be faster and more maintainable
            https://issues.apache.org/jira/browse/PIG-3865
PIG-3861    duplicate jars get added to distributed cache
            https://issues.apache.org/jira/browse/PIG-3861
PIG-3825    Stats collection needs to be changed for hadoop2 (with auto local mode)
            https://issues.apache.org/jira/browse/PIG-3825
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3672    Pig should not check for hardcoded file system implementations
            https://issues.apache.org/jira/browse/PIG-3672
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories
[ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977698#comment-13977698 ]

Mona Chitnis commented on PIG-3891:

Linking the original JIRA introducing this change. The issue is probably in reporting the counters as a whole, as I'm getting the following output for a sample pig test (map-reduce mode of course), even though it is successful and produced its output.

{quote}
Input(s):
Successfully read 0 records from: /user/pig/tests/data/pigmix/page_views
Output(s):
Successfully stored 0 records in: /user/chitnis//L1out
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
{quote}

FileBasedOutputSizeReader does not calculate size of files in sub-directories
    Key: PIG-3891
    URL: https://issues.apache.org/jira/browse/PIG-3891
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.12.0
    Reporter: Rohini Palaniswamy

FileBasedOutputSizeReader only includes files in the top-level output directory. So if files are stored under subdirectories (e.g. MultiStorage), it does not report the bytes written correctly. 0.11 shows the correct number of total bytes written, so this is a regression. A quick look at the code shows that JobStats.addOneOutputStats() in 0.11 does not iterate recursively either, and its code is the same as FileBasedOutputSizeReader. Need to investigate where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
[jira] [Assigned] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories
[ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis reassigned PIG-3891:
    Assignee: Mona Chitnis

FileBasedOutputSizeReader does not calculate size of files in sub-directories
    Key: PIG-3891
    URL: https://issues.apache.org/jira/browse/PIG-3891
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.12.0
    Reporter: Rohini Palaniswamy
    Assignee: Mona Chitnis

FileBasedOutputSizeReader only includes files in the top-level output directory. So if files are stored under subdirectories (e.g. MultiStorage), it does not report the bytes written correctly. 0.11 shows the correct number of total bytes written, so this is a regression. A quick look at the code shows that JobStats.addOneOutputStats() in 0.11 does not iterate recursively either, and its code is the same as FileBasedOutputSizeReader. Need to investigate where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
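The difference described in PIG-3891, summing only the files directly under the output directory versus also descending into sub-directories, can be illustrated with a small sketch. It uses java.nio.file on a local directory purely as a stand-in for Hadoop's FileSystem API, and `OutputSizeSketch`, `topLevelBytes` and `recursiveBytes` are hypothetical names. It is not the code of FileBasedOutputSizeReader, nor of any fix for this issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration (not Pig code): two ways of sizing an output directory.
final class OutputSizeSketch {
    private OutputSizeSketch() {}

    // Sums only the regular files directly inside dir; sub-directories are ignored.
    static long topLevelBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::size)
                          .sum();
        }
    }

    // Sums regular files at any depth below dir.
    static long recursiveBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSizeSketch::size)
                          .sum();
        }
    }

    private static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For an output layout that writes everything into sub-directories, as the description says MultiStorage does, the first method would report 0 bytes while the second reports the full total.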
[jira] [Commented] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
[ https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977807#comment-13977807 ]

Josh Elser commented on PIG-3880:

I'm a bit confused as to what you're showing here. Where is this "dry run:" output coming from? Can you verify that the following does (not) work:

{{PIG_CLASSPATH=/home/566453/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar pig -x mapreduce my_script.pig}}

After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
    Key: PIG-3880
    URL: https://issues.apache.org/jira/browse/PIG-3880
    Project: Pig
    Issue Type: Bug
    Components: grunt
    Affects Versions: 0.13.0
    Reporter: David Medinets

I pulled trunk from subversion using the following commands:

    mkdir pig
    cd pig
    svn co http://svn.apache.org/repos/asf/pig/trunk
    cd trunk
    ant
    export PATH=$PATH:$HOME/pig/trunk/bin
    export ACCUMULO_HOME=/opt/accumulo
    export HADOOP_HOME=/opt/hadoop
    export PIG_HOME=$HOME/pig/trunk
    export PIG_CLASSPATH=$HOME/pig/trunk/build/ivy/lib/Pig/*
    export PIG_CLASSPATH=$ACCUMULO_HOME/lib/*:$PIG_CLASSPATH
    cd ~
    pig

Then I ran into this error:

    java.lang.NoClassDefFoundError: org/apache/commons/io/input/ClassLoaderObjectInputStream
        at org.apache.pig.Main.run(Main.java:399)

When I change PIG_JAR to use the fat jar, I was able to run the pig command without getting the exception.
[jira] [Updated] (PIG-3904) Pig support windows i18n
[ https://issues.apache.org/jira/browse/PIG-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lizhao.Du updated PIG-3904:
    Attachment: PIG-3904.patch

Pig support windows i18n
    Key: PIG-3904
    URL: https://issues.apache.org/jira/browse/PIG-3904
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.9.2, 0.9.3
    Environment: Windows 7 (de_DE/fr_FR/zh_CN)
    Reporter: Lizhao.Du
    Fix For: 0.9.3
    Attachments: PIG-3904.patch

Running a Pig script with Pig on Windows (de_DE) fails. The error message says "Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput", but /tmp/测试/pwInput does in fact exist. The cause is that Hadoop uses UTF-8 encoding. When the encoding of the client OS on which Pig runs differs from it, Hadoop does not recognize these characters. Log message as below:

ERROR Spring Shell org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message during job submission
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://10.141.73.10:8020/tmp/测试/pwInput
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
    ... 14 more

I have added a patch, PIG-3904.patch, to fix it. It works.
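The encoding mismatch that PIG-3904 describes can be reproduced in isolation with a short sketch. It assumes the path is turned into bytes with the client's default charset and later read back as UTF-8. ISO-8859-1 is used here only as a stand-in for whatever non-UTF-8 default the Windows client has, and `PathEncodingSketch` and `readAsUtf8` are hypothetical names. This is not the content of PIG-3904.patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Pig code): a path survives only if both sides
// agree on the charset used to turn it into bytes and back.
final class PathEncodingSketch {
    private PathEncodingSketch() {}

    // Encodes the path with the client's charset, then decodes the bytes as UTF-8,
    // mimicking a reader that assumes UTF-8.
    static String readAsUtf8(String path, Charset clientCharset) {
        byte[] bytes = path.getBytes(clientCharset);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The path from the error message, with the non-ASCII directory name
        // written as Unicode escapes so the source file stays pure ASCII.
        String path = "/tmp/\u6d4b\u8bd5/pwInput";
        System.out.println(path.equals(readAsUtf8(path, StandardCharsets.UTF_8)));      // true
        System.out.println(path.equals(readAsUtf8(path, StandardCharsets.ISO_8859_1))); // false
    }
}
```

In the second case the two characters cannot be represented in the client charset and are replaced during encoding, so the name that arrives no longer matches the directory that exists.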