[ https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749905#action_12749905 ]
Mridul Muralidharan commented on PIG-940:
-----------------------------------------

Is this supported in Hadoop? That is, can you specify the input to be on a different HDFS and still get a map-reduce job to work? IIRC no, but I could be missing something. If not, I am not sure Pig can support it without an intermediate distcp ...

> Cross site HDFS access using the default.fs.name not possible in Pig
> --------------------------------------------------------------------
>
>                 Key: PIG-940
>                 URL: https://issues.apache.org/jira/browse/PIG-940
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>        Environment: Hadoop 20
>           Reporter: Viraj Bhat
>             Fix For: 0.3.0
>
>
> I have a script which does the following: it accesses data from a remote HDFS location (via an HDFS installed at hdfs://remotemachine1.company.com/), as I do not want to copy this huge amount of data between HDFS locations. However, I want my Pig script to write data to the HDFS running on localmachine.company.com.
> Currently Pig does not support that behavior and complains that:
> "hdfs://localmachine.company.com/user/viraj/A1.txt does not exist"
> {code}
> A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b);
> B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d);
> C = JOIN A by a, B by c;
> store C into 'output' using PigStorage();
> {code}
> =======================================================================================================================================
> 2009-09-01 00:37:24,032 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localmachine.company.com:8020
> 2009-09-01 00:37:24,277 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localmachine.company.com:50300
> 2009-09-01 00:37:24,567 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
> 2009-09-01 00:37:24,573 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2009-09-01 00:37:24,573 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2009-09-01 00:37:26,197 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2009-09-01 00:37:26,746 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2009-09-01 00:37:26,746 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2009-09-01 00:37:26,747 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 2009-09-01 00:37:26,756 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480"
> 2009-09-01 00:37:26,756 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
> Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
> =======================================================================================================================================
> The error file in Pig contains:
> =======================================================================================================================================
> ERROR 2998: Unhandled internal error.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
>         at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>         at java.lang.Thread.run(Thread.java:619)
> java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
>         at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>         at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>         at java.lang.Thread.run(Thread.java:619)
> =======================================================================================================================================

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
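
For reference, the intermediate distcp that the comment mentions could look roughly like the following. This is only a sketch of that workaround, not anything from the issue itself: the exact distcp invocation, the target directory, and the assumption that both clusters run compatible Hadoop versions (so plain hdfs:// works on the source side; a cross-version copy would need hftp:// instead) are all hypothetical.

{code}
# Step 1 (sketch): copy the remote inputs onto the local cluster first.
hadoop distcp \
    hdfs://remotemachine1.company.com/user/viraj/A1.txt \
    hdfs://remotemachine1.company.com/user/viraj/B1.txt \
    hdfs://localmachine.company.com/user/viraj/

# Step 2 (sketch): run the same script against purely local paths,
# so both the inputs and the output live on the default fs.
pig -e "
A = LOAD '/user/viraj/A1.txt' as (a, b);
B = LOAD '/user/viraj/B1.txt' as (c, d);
C = JOIN A by a, B by c;
store C into 'output' using PigStorage();
"
{code}

The obvious downside, as the original description notes, is that this copies a huge amount of data between HDFS installations, which is exactly what the reporter wanted to avoid.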