Hi, I tried the grunt shell as well, but that also does not connect to Hadoop. It throws a warning and runs the job in standalone mode. So I tried it using pig.jar.
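To rule out a basic networking problem first, here is a quick check (nothing Hadoop- or Pig-specific, just bash's /dev/tcp redirection) that something is actually listening on the namenode and jobtracker ports from my conf files:

```shell
# Probe the namenode (9000) and jobtracker (9001) ports from the conf
# files; "closed" for 9001 would explain the EOFException in the trace.
for port in 9000 9001; do
  if (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

If 9001 shows closed here, the problem would be with the jobtracker itself rather than with Pig.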
Do you have any further suggestions on that?

Rahul

On Aug 26, 2010, at 7:23 PM, Jeff Zhang wrote:

> Connecting to 9001 is right; this is the jobtracker's IPC port, while 50030
> is its HTTP server port.
> And have you ever tried to run the grunt shell?
>
> On Thu, Aug 26, 2010 at 7:12 PM, rahul <rmalv...@apple.com> wrote:
>> Hi Jeff,
>>
>> I can connect to the jobtracker web UI using the following URL:
>> http://localhost:50030/jobtracker.jsp
>>
>> And I can also see the jobs which I ran directly on hadoop using the
>> streaming api.
>>
>> I also see that it tries to connect to localhost/127.0.0.1:9001, which I
>> have specified in the hadoop conf file,
>> and I have also tried changing this location to localhost:50030, but still
>> the error remains the same.
>>
>> Can you suggest something further?
>>
>> Thanks,
>> Rahul
>>
>> On Aug 26, 2010, at 7:07 PM, Jeff Zhang wrote:
>>
>>> Can you look at the jobtracker log or access the jobtracker web UI?
>>> It seems you cannot connect to the jobtracker, according to your log:
>>>
>>> "Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>> failed on local exception: java.io.EOFException"
>>>
>>> On Fri, Aug 27, 2010 at 10:00 AM, rahul <rmalv...@apple.com> wrote:
>>>> Yes, they are running.
>>>>
>>>> On Aug 26, 2010, at 6:59 PM, Jeff Zhang wrote:
>>>>
>>>>> Execute the command jps in a shell to see whether the namenode and
>>>>> jobtracker are running correctly.
>>>>>
>>>>> On Fri, Aug 27, 2010 at 9:49 AM, rahul <rmalv...@apple.com> wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> I transferred the hadoop conf files to the pig/conf location, but I
>>>>>> still get the same error.
>>>>>>
>>>>>> Is the issue with the configuration files or with the hdfs file
>>>>>> system?
>>>>>>
>>>>>> Can I test the connection to hdfs (localhost/127.0.0.1:9001) in some way?
>>>>>>
>>>>>> Steps I did:
>>>>>>
>>>>>> 1. I initially formatted my local file system using the ./hadoop
>>>>>> namenode -format command.
>>>>>> I believe this mounts the local file system to HDFS.
>>>>>> 2. Then I configured the hadoop conf files and started the ./start-all
>>>>>> script.
>>>>>> 3. Started Pig with a custom pig script which should read HDFS, as I
>>>>>> passed HADOOP_CONF_DIR as a parameter.
>>>>>> The command was: java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR
>>>>>> org.apache.pig.Main script1-hadoop.pig
>>>>>>
>>>>>> Please let me know if these steps miss something.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>> On Aug 26, 2010, at 6:33 PM, Jeff Zhang wrote:
>>>>>>
>>>>>>> Try putting the hadoop xml configuration files into the pig/conf folder.
>>>>>>>
>>>>>>> On Thu, Aug 26, 2010 at 6:22 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>> Hi Jeff,
>>>>>>>>
>>>>>>>> I have set the hadoop conf on the class path by setting the
>>>>>>>> $HADOOP_CONF_DIR variable.
>>>>>>>>
>>>>>>>> But I have both Pig and hadoop running on the same machine, so
>>>>>>>> localhost should not make a difference.
>>>>>>>>
>>>>>>>> So I have used all the default config settings for core-site.xml,
>>>>>>>> hdfs-site.xml, and mapred-site.xml, as per the hadoop tutorial.
>>>>>>>>
>>>>>>>> Please let me know if my understanding is correct.
>>>>>>>>
>>>>>>>> I am attaching the conf files as well:
>>>>>>>>
>>>>>>>> hdfs-site.xml:
>>>>>>>>
>>>>>>>> <?xml version="1.0"?>
>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>
>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>
>>>>>>>> <configuration>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.default.name</name>
>>>>>>>>     <value>hdfs://localhost:9000</value>
>>>>>>>>     <description>The name of the default file system. A URI whose
>>>>>>>>     scheme and authority determine the FileSystem implementation. The
>>>>>>>>     uri's scheme determines the config property (fs.SCHEME.impl) naming
>>>>>>>>     the FileSystem implementation class. The uri's authority is used to
>>>>>>>>     determine the host, port, etc.
>>>>>>>>     for a filesystem.</description>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>dfs.replication</name>
>>>>>>>>     <value>1</value>
>>>>>>>>     <description>Default block replication.
>>>>>>>>     The actual number of replications can be specified when the file
>>>>>>>>     is created.
>>>>>>>>     The default is used if replication is not specified at create time.
>>>>>>>>     </description>
>>>>>>>>   </property>
>>>>>>>> </configuration>
>>>>>>>>
>>>>>>>> core-site.xml:
>>>>>>>>
>>>>>>>> <?xml version="1.0"?>
>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>
>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>
>>>>>>>> <configuration>
>>>>>>>>   <property>
>>>>>>>>     <name>hadoop.tmp.dir</name>
>>>>>>>>     <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
>>>>>>>>     <description>A base for other temporary directories.</description>
>>>>>>>>   </property>
>>>>>>>> </configuration>
>>>>>>>>
>>>>>>>> mapred-site.xml:
>>>>>>>>
>>>>>>>> <?xml version="1.0"?>
>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>
>>>>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>>>>>
>>>>>>>> <configuration>
>>>>>>>>   <property>
>>>>>>>>     <name>mapred.job.tracker</name>
>>>>>>>>     <value>localhost:9001</value>
>>>>>>>>     <description>The host and port that the MapReduce job tracker runs
>>>>>>>>     at. If "local", then jobs are run in-process as a single map
>>>>>>>>     and reduce task.
>>>>>>>>     </description>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>mapred.tasktracker.tasks.maximum</name>
>>>>>>>>     <value>8</value>
>>>>>>>>     <description>The maximum number of tasks that will be run
>>>>>>>>     simultaneously by a task tracker.
>>>>>>>>     </description>
>>>>>>>>   </property>
>>>>>>>> </configuration>
>>>>>>>>
>>>>>>>> Please let me know if there is an issue in my configurations? Any
>>>>>>>> input is valuable to me.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>>
>>>>>>>> On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:
>>>>>>>>
>>>>>>>>> Did you put the hadoop conf on the classpath? It seems you are still
>>>>>>>>> using the local file system but connecting to Hadoop's JobTracker.
>>>>>>>>> Make sure you set the correct configuration in core-site.xml,
>>>>>>>>> hdfs-site.xml, and mapred-site.xml, and put them on the classpath.
>>>>>>>>>
>>>>>>>>> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rmalv...@apple.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>>>>>>>>>
>>>>>>>>>> I am able to run Pig in local mode and Hadoop with the streaming api
>>>>>>>>>> perfectly.
>>>>>>>>>>
>>>>>>>>>> But when I try to run Pig with Hadoop I get the following error:
>>>>>>>>>>
>>>>>>>>>> Pig Stack Trace
>>>>>>>>>> ---------------
>>>>>>>>>> ERROR 2116: Unexpected error. Could not validate the output
>>>>>>>>>> specification for:
>>>>>>>>>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>>
>>>>>>>>>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An
>>>>>>>>>> unexpected exception caused the validation to stop
>>>>>>>>>> at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>>>>>>>>> at org.apache.pig.PigServer.validate(PigServer.java:930)
>>>>>>>>>> at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>>>>>>>>> at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>>>>>>>>> at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>>>>>>>>> at
>>>>>>>>>> org.apache.pig.PigServer.execute(PigServer.java:816)
>>>>>>>>>> at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>>>>>>>>> at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>>>>>>>>> at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>>>>>>>>> at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>>>>>>>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>>>>>>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>>>>>>>>> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>>>>>>>>> at org.apache.pig.Main.main(Main.java:391)
>>>>>>>>>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR
>>>>>>>>>> 2116: Unexpected error. Could not validate the output specification
>>>>>>>>>> for: file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>>>>>>>>> at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>>>>>>>>> at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>> at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>>>>>>>>> at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>>>>>>>>> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>>>>>>>>> at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>>>>>>>>> ...
>>>>>>>>>> 16 more
>>>>>>>>>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001
>>>>>>>>>> failed on local exception: java.io.EOFException
>>>>>>>>>> at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>>>>>>>>> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>>>>>>>>> at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>>>>>>>>> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>>>>>>>>> at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>>>>>>>>> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>>>>>>>>> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>>>>>>>>> at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>>>>>>>>> at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>>>>>>>>> ... 24 more
>>>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>>> at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>>> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>>>>>>>>> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>>>>>>>>>> ================================================================================
>>>>>>>>>>
>>>>>>>>>> Did anyone get the same error? I think it is related to the
>>>>>>>>>> connection between Pig and hadoop.
>>>>>>>>>>
>>>>>>>>>> Can someone tell me how to connect Pig and hadoop?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang