Thanks for confirming. So when I follow the instructions to run the hadoop consumer (https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer), I see my mapred job being submitted properly on hostB (the jobtracker), but it always fails with:
Console output on hostA:

    11/08/31 07:32:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/08/31 07:32:35 INFO mapred.FileInputFormat: Total input paths to process : 1
    Hadoop job id=job_201108291829_0041
    Exception in thread "main" java.lang.Exception: Hadoop ETL job failed! Please check status on http://localhost:9001/jobdetails.jsp?jobid=job_201108291829_0041
            at kafka.etl.impl.SimpleKafkaETLJob.execute(SimpleKafkaETLJob.java:82)
            at kafka.etl.impl.SimpleKafkaETLJob.main(SimpleKafkaETLJob.java:100)

In the mapred log, for each tasktracker host I see:

    Meta VERSION="1" .
    Job JOBID="job_201108291829_0041" JOBNAME="SimpleKafakETL" USER="root" SUBMIT_TIME="1314747158794" JOBCONF="maprfs://10\.18\.125\.176:7222/var/mapr/cluster/mapred/jobTracker/staging/root/\.staging/job_201108291829_0041/job\.xml" VIEW_JOB="*" MODIFY_JOB="*" JOB_QUEUE="default" .
    Job JOBID="job_201108291829_0041" JOB_PRIORITY="NORMAL" .
    Job JOBID="job_201108291829_0041" JOB_STATUS="RUNNING" .
    Job JOBID="job_201108291829_0041" LAUNCH_TIME="1314747158885" TOTAL_MAPS="1" TOTAL_REDUCES="0" JOB_STATUS="PREP" .
    Task TASKID="task_201108291829_0041_m_000000" TASK_TYPE="MAP" START_TIME="1314747160010" SPLITS="/default-rack/hadoop2,/default-rack/hadoop9,/default-rack/hadoop6" .
    MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000" TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0" START_TIME="1314747160121" TRACKER_NAME="tracker_hadoop9:localhost/127\.0\.0\.1:59411" HTTP_PORT="50060" .
    MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000" TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1314747164349" HOSTNAME="hadoop9" ERROR="java\.io\.IOException: java\.net\.ConnectException: Connection refused
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:155)
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:14)
            at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.moveToNext(MapTask\.java:210)
            at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.next(MapTask\.java:195)
            at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:48)
            at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:393)
            at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:326)
            at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:268)
            at java\.security\.AccessController\.doPrivileged(Native Method)
            at javax\.security\.auth\.Subject\.doAs(Subject\.java:396)
            at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1074)
            at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:262)
    Caused by: java\.net\.ConnectException: Connection refused
            at sun\.nio\.ch\.Net\.connect(Native Method)
            at sun\.nio\.ch\.SocketChannelImpl\.connect(SocketChannelImpl\.java:500)
            at kafka\.consumer\.SimpleConsumer\.connect(SimpleConsumer\.scala:54)
            at kafka\.consumer\.SimpleConsumer\.getOrMakeConnection(SimpleConsumer\.scala:193)
            at kafka\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:156)
            at kafka\.javaapi\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:65)
            at kafka\.etl\.KafkaETLContext\.getOffsetRange(KafkaETLContext\.java:209)
            at kafka\.etl\.KafkaETLContext\.<init>(KafkaETLContext\.java:97)
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:115)
            \.\.\. 11 more

Which other port do I need to open between hostA and the tasktrackers?
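Reading the trace, the refusal seems to happen inside kafka.consumer.SimpleConsumer.connect while KafkaETLContext fetches offsets, i.e. the map task on the tasktracker (hadoop9 here) is trying to open a TCP connection to the Kafka broker named in the job's input offset file. So presumably it is the Kafka broker port (9092 by default) that has to be reachable from every tasktracker host, not an extra Hadoop port. A minimal sketch I can run on a tasktracker node to check reachability (the host and port below are placeholders; the real values come from the broker URI in the offset file):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Run on a tasktracker host (e.g. hadoop9) to check whether it can
    // reach the Kafka broker. "kafka-host" and 9092 are placeholders;
    // substitute the broker URI from the ETL job's offset/input file.
    public class BrokerPortCheck {
        public static void main(String[] args) throws IOException {
            String host = args.length > 0 ? args[0] : "kafka-host";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
            Socket socket = new Socket();
            try {
                // 5-second connect timeout; a ConnectException here
                // reproduces the failure seen in the map task.
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println("connected to " + host + ":" + port);
            } finally {
                socket.close();
            }
        }
    }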
Please note I can submit a simple non-Kafka job from the same hostA to Hadoop and it completes successfully.

Cheers,
Ben-

On Tue, Aug 30, 2011 at 12:04 PM, Richard Park <[email protected]> wrote:
> The answer should be yes, each process should be able to run on different
> hosts. We are currently doing this:
>
> Host A submits the kafka hadoop job to the jobtracker on Host B,
> Host B then connects to Host C (or many Host C's).
>
> I planned on having a look at the example again to see if there are steps
> that are missing, or if the examples need to be beefed up.
>
> Thanks,
> -Richard
>
>
> On Tue, Aug 30, 2011 at 11:39 AM, Ben Ciceron <[email protected]> wrote:
>
>> Let me rephrase this:
>>
>> can any of the kafka processes run outside the hadoop cluster, as long
>> as they can connect to the hadoop processes from that host? E.g.:
>>
>> hostA (NOT in the hadoop cluster): runs the kafka hadoop consumer
>> hostB (in the hadoop cluster): runs the jobtracker
>>
>>
>> Cheers,
>> Ben-
>>
>>
>>
>>
>> On Mon, Aug 29, 2011 at 4:59 PM, Jun Rao <[email protected]> wrote:
>> > My understanding is that it's not tied to localhost. You just need to
>> > change the jobtracker setting in your Hadoop config.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Aug 25, 2011 at 4:31 PM, Ben Ciceron <[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> does the kafka hadoop consumer expect the jobtracker to run locally
>> >> only? It seems to expect it locally (localhost/127.0.0.1:9001).
>> >> Is it a requirement, or is there a way to change it to a remote URI?
>> >>
>> >> Cheers,
>> >> Ben-
>> >>
>> >
>> >
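For reference, Jun's suggestion in the quoted thread about the jobtracker setting maps onto the classic (pre-YARN) client-side mapred configuration. A minimal sketch, assuming Hadoop 0.20-era mapred; "hostB:9001" is a placeholder, and the maprfs URI is the one from the job log above:

    import org.apache.hadoop.mapred.JobConf;

    // Point the submitting client (hostA) at the remote jobtracker on
    // hostB instead of the localhost:9001 default. "hostB:9001" is a
    // placeholder; the same keys can instead be set once in
    // mapred-site.xml / core-site.xml on the submitting host.
    public class RemoteJobTrackerConf {
        public static JobConf remoteConf() {
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", "hostB:9001");
            // The filesystem URI must match the cluster as well; the
            // log above shows a MapR cluster, hence maprfs://. A plain
            // HDFS cluster would use hdfs://<namenode>:<port> instead.
            conf.set("fs.default.name", "maprfs://10.18.125.176:7222");
            return conf;
        }
    }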
