Execute the jps command in a shell to check whether the NameNode and JobTracker are running correctly.
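For reference, on a healthy pseudo-distributed node all five daemons should show up in the listing. The output looks roughly like this (the PIDs are illustrative and will differ):

    $ jps
    4201 NameNode
    4257 DataNode
    4313 SecondaryNameNode
    4370 JobTracker
    4426 TaskTracker
    4480 Jps

If NameNode or JobTracker is missing from the list, check the matching log under $HADOOP_HOME/logs before debugging the Pig side.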
On Fri, Aug 27, 2010 at 9:49 AM, rahul <rmalv...@apple.com> wrote:
> Hi Jeff,
>
> I transferred the hadoop conf files to the pig/conf location, but I still
> get the same error.
>
> Is the issue with the configuration files or with the HDFS file system?
>
> Can I test the connection to hdfs (localhost/127.0.0.1:9001) in some way?
>
> Steps I did:
>
> 1. I initially formatted my local file system using the
>    ./hadoop namenode -format command. I believe this mounts the local
>    file system to HDFS.
> 2. Then I configured the hadoop conf files and started the ./start-all
>    script.
> 3. Started Pig with a custom pig script which should read hdfs, as I
>    passed HADOOP_CONF_DIR as a parameter. The command was:
>
>    java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig
>
> Please let me know if these steps miss something.
>
> Thanks,
> Rahul
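A quick way to test that connection (assuming the default ports in the configs quoted below: 9000 for the NameNode, 9001 for the JobTracker) is to check that the ports are open and that the hadoop client itself can reach HDFS. A minimal sketch, with $HADOOP_HOME standing in for your install directory:

    # raw TCP check: is anything listening on the RPC ports?
    nc -z localhost 9000 && echo "namenode port open"
    nc -z localhost 9001 && echo "jobtracker port open"

    # protocol-level check: can the hadoop client actually list HDFS?
    $HADOOP_HOME/bin/hadoop fs -ls /

If nc connects but hadoop fs fails, the daemons are up but the client and server are not speaking the same RPC version.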
> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>
>> Try putting the hadoop xml configuration files into the pig/conf folder.
>>
>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rmalv...@apple.com> wrote:
>>> Hi Jeff,
>>>
>>> I have set the hadoop conf on the classpath by setting the
>>> $HADOOP_CONF_DIR variable.
>>>
>>> But I have both Pig and hadoop running on the same machine, so
>>> localhost should not make a difference.
>>>
>>> So I have used all the default config settings for core-site.xml,
>>> hdfs-site.xml, and mapred-site.xml, as per the hadoop tutorial.
>>>
>>> Please let me know if my understanding is correct.
>>>
>>> I am attaching the conf files as well.
>>>
>>> hdfs-site.xml:
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>>   <property>
>>>     <name>fs.default.name</name>
>>>     <value>hdfs://localhost:9000</value>
>>>     <description>The name of the default file system. A URI whose
>>>     scheme and authority determine the FileSystem implementation. The
>>>     uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>     the FileSystem implementation class. The uri's authority is used to
>>>     determine the host, port, etc. for a filesystem.</description>
>>>   </property>
>>>
>>>   <property>
>>>     <name>dfs.replication</name>
>>>     <value>1</value>
>>>     <description>Default block replication. The actual number of
>>>     replications can be specified when the file is created. The default
>>>     is used if replication is not specified at create time.</description>
>>>   </property>
>>> </configuration>
>>>
>>> core-site.xml:
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>>   <property>
>>>     <name>hadoop.tmp.dir</name>
>>>     <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>     <description>A base for other temporary directories.</description>
>>>   </property>
>>> </configuration>
>>>
>>> mapred-site.xml:
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <!-- Put site-specific property overrides in this file. -->
>>>
>>> <configuration>
>>>   <property>
>>>     <name>mapred.job.tracker</name>
>>>     <value>localhost:9001</value>
>>>     <description>The host and port that the MapReduce job tracker runs
>>>     at. If "local", then jobs are run in-process as a single map and
>>>     reduce task.</description>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.tasks.maximum</name>
>>>     <value>8</value>
>>>     <description>The maximum number of tasks that will be run
>>>     simultaneously by a task tracker.</description>
>>>   </property>
>>> </configuration>
>>>
>>> Please let me know if there is an issue in my configuration. Any input
>>> is valuable to me.
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>
>>>> Did you put the hadoop conf on the classpath? It seems you are still
>>>> using the local file system but connecting to Hadoop's JobTracker.
>>>> Make sure you set the correct configuration in core-site.xml,
>>>> hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
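On the classpath point above: the entry has to be the conf directory itself, because Hadoop resolves core-site.xml, hdfs-site.xml, and mapred-site.xml by name as classpath resources; listing the individual files does not work. A minimal sketch, assuming /path/to/hadoop/conf is the directory holding the three files:

    # HADOOP_CONF_DIR must point at the directory, not at the xml files
    export HADOOP_CONF_DIR=/path/to/hadoop/conf
    java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main script1-hadoop.pig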
>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rmalv...@apple.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>
>>>>> I am able to run Pig in local mode and Hadoop with the streaming API
>>>>> perfectly.
>>>>>
>>>>> But when I try to run Pig with Hadoop I get the following error:
>>>>>
>>>>> Pig Stack Trace
>>>>> ---------------
>>>>> ERROR 2116: Unexpected error. Could not validate the output
>>>>> specification for:
>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>
>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An
>>>>> unexpected exception caused the validation to stop
>>>>>   at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>   at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>   at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>   at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>   at org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>   at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>   at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>   at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>   at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>   at org.apache.pig.Main.main(Main.java:391)
>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116:
>>>>> Unexpected error. Could not validate the output specification for:
>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>   at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>   at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>   at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>   at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>   ... 16 more
>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed
>>>>> on local exception: java.io.EOFException
>>>>>   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>   at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>   at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>   at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>   at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>   at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>   at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>   at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>   ... 24 more
>>>>> Caused by: java.io.EOFException
>>>>>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>   at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>> ================================================================================
>>>>>
>>>>> Has anyone gotten the same error? I think it is related to the
>>>>> connection between Pig and Hadoop.
>>>>>
>>>>> Can someone tell me how to connect Pig and Hadoop?
>>>>>
>>>>> Thanks.
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>
>> --
>> Best Regards
>>
>> Jeff Zhang

--
Best Regards

Jeff Zhang
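A note on the trace for anyone hitting the same error: an EOFException on the very first RPC call (getProtocolVersion against localhost/127.0.0.1:9001) is the classic symptom of a client/server version mismatch. Here that would plausibly be the Hadoop client classes bundled inside pig.jar not matching the hadoop-0.21.0 daemons the configs point at. One way to compare the two sides (paths are illustrative):

    # version the running daemons were built from
    $HADOOP_HOME/bin/hadoop version

    # which hadoop client classes pig.jar actually bundles
    unzip -l $PIGDIR/pig.jar | grep 'org/apache/hadoop/ipc' | head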