Thanks for confirming. So when I follow the instructions to run the hadoop consumer (https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer), I see my mapred job being submitted properly on hostB (the jobtracker), but it always fails with:
Console output on hostA:

    11/08/31 07:32:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/08/31 07:32:35 INFO mapred.FileInputFormat: Total input paths to process : 1
    Hadoop job id=job_201108291829_0041
    Exception in thread "main" java.lang.Exception: Hadoop ETL job failed! Please check status on http://localhost:9001/jobdetails.jsp?jobid=job_201108291829_0041
            at kafka.etl.impl.SimpleKafkaETLJob.execute(SimpleKafkaETLJob.java:82)
            at kafka.etl.impl.SimpleKafkaETLJob.main(SimpleKafkaETLJob.java:100)

In the mapred log, for each tasktracker host I see:

    Meta VERSION="1" .
    Job JOBID="job_201108291829_0041" JOBNAME="SimpleKafakETL" USER="root" SUBMIT_TIME="1314747158794" JOBCONF="maprfs://10\.18\.125\.176:7222/var/mapr/cluster/mapred/jobTracker/staging/root/\.staging/job_201108291829_0041/job\.xml" VIEW_JOB="*" MODIFY_JOB="*" JOB_QUEUE="default" .
    Job JOBID="job_201108291829_0041" JOB_PRIORITY="NORMAL" .
    Job JOBID="job_201108291829_0041" JOB_STATUS="RUNNING" .
    Job JOBID="job_201108291829_0041" LAUNCH_TIME="1314747158885" TOTAL_MAPS="1" TOTAL_REDUCES="0" JOB_STATUS="PREP" .
    Task TASKID="task_201108291829_0041_m_000000" TASK_TYPE="MAP" START_TIME="1314747160010" SPLITS="/default-rack/hadoop2,/default-rack/hadoop9,/default-rack/hadoop6" .
    MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000" TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0" START_TIME="1314747160121" TRACKER_NAME="tracker_hadoop9:localhost/127\.0\.0\.1:59411" HTTP_PORT="50060" .
    MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000" TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1314747164349" HOSTNAME="hadoop9" ERROR="java\.io\.IOException: java\.net\.ConnectException: Connection refused
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:155)
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:14)
            at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.moveToNext(MapTask\.java:210)
            at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.next(MapTask\.java:195)
            at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:48)
            at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:393)
            at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:326)
            at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:268)
            at java\.security\.AccessController\.doPrivileged(Native Method)
            at javax\.security\.auth\.Subject\.doAs(Subject\.java:396)
            at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1074)
            at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:262)
    Caused by: java\.net\.ConnectException: Connection refused
            at sun\.nio\.ch\.Net\.connect(Native Method)
            at sun\.nio\.ch\.SocketChannelImpl\.connect(SocketChannelImpl\.java:500)
            at kafka\.consumer\.SimpleConsumer\.connect(SimpleConsumer\.scala:54)
            at kafka\.consumer\.SimpleConsumer\.getOrMakeConnection(SimpleConsumer\.scala:193)
            at kafka\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:156)
            at kafka\.javaapi\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:65)
            at kafka\.etl\.KafkaETLContext\.getOffsetRange(KafkaETLContext\.java:209)
            at kafka\.etl\.KafkaETLContext\.<init>(KafkaETLContext\.java:97)
            at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:115)
            \.\.\. 11 more

Which other port do I need to open between hostA and the tasktrackers?
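Reading the trace, the refusal seems to happen inside kafka.consumer.SimpleConsumer.connect while KafkaETLContext fetches offsets, i.e. the map task on the tasktracker (hadoop9 here) is trying to open a TCP connection to the Kafka broker named in the job's input offset file. So presumably it is the Kafka broker port (9092 by default) that has to be reachable from every tasktracker host, not an extra Hadoop port. A minimal sketch I can run on a tasktracker node to check reachability (the host and port below are placeholders; the real values come from the broker URI in the offset file):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Run on a tasktracker host (e.g. hadoop9) to check whether it can
    // reach the Kafka broker. "kafka-host" and 9092 are placeholders;
    // substitute the broker URI from the ETL job's offset/input file.
    public class BrokerPortCheck {
        public static void main(String[] args) throws IOException {
            String host = args.length > 0 ? args[0] : "kafka-host";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
            Socket socket = new Socket();
            try {
                // 5-second connect timeout; a ConnectException here
                // reproduces the failure seen in the map task.
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println("connected to " + host + ":" + port);
            } finally {
                socket.close();
            }
        }
    }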
Please note I can submit a simple non-Kafka job from the same hostA to Hadoop and it completes successfully.

Cheers,
Ben-

On Tue, Aug 30, 2011 at 12:04 PM, Richard Park <[email protected]> wrote:
> The answer should be yes, each process should be able to run on different
> hosts. We are currently doing this:
>
> Host A submits the kafka hadoop job to the jobtracker on Host B,
> Host B then connects to Host C (or many Host C's).
>
> I planned on having a look at the example again to see if there are steps
> that are missing, or if the examples need to be beefed up.
>
> Thanks,
> -Richard
>
>
> On Tue, Aug 30, 2011 at 11:39 AM, Ben Ciceron <[email protected]> wrote:
>
>> Let me rephrase this:
>>
>> can any of the kafka processes run outside the hadoop cluster, as long
>> as they can connect to the hadoop processes from that host? E.g.:
>>
>> hostA (NOT in the hadoop cluster): runs the kafka hadoop consumer
>> hostB (in the hadoop cluster): runs the jobtracker
>>
>>
>> Cheers,
>> Ben-
>>
>>
>>
>>
>> On Mon, Aug 29, 2011 at 4:59 PM, Jun Rao <[email protected]> wrote:
>> > My understanding is that it's not tied to localhost. You just need to
>> > change the jobtracker setting in your Hadoop config.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Aug 25, 2011 at 4:31 PM, Ben Ciceron <[email protected]> wrote:
>> >
>> >> Hello,
>> >>
>> >> does the kafka hadoop consumer expect the jobtracker to run locally
>> >> only? It seems to expect it locally (localhost/127.0.0.1:9001).
>> >> Is it a requirement, or is there a way to change it to a remote URI?
>> >>
>> >> Cheers,
>> >> Ben-
>> >>
>> >
>> >
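For reference, Jun's suggestion in the quoted thread about the jobtracker setting maps onto the classic (pre-YARN) client-side mapred configuration. A minimal sketch, assuming Hadoop 0.20-era mapred; "hostB:9001" is a placeholder, and the maprfs URI is the one from the job log above:

    import org.apache.hadoop.mapred.JobConf;

    // Point the submitting client (hostA) at the remote jobtracker on
    // hostB instead of the localhost:9001 default. "hostB:9001" is a
    // placeholder; the same keys can instead be set once in
    // mapred-site.xml / core-site.xml on the submitting host.
    public class RemoteJobTrackerConf {
        public static JobConf remoteConf() {
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", "hostB:9001");
            // The filesystem URI must match the cluster as well; the
            // log above shows a MapR cluster, hence maprfs://. A plain
            // HDFS cluster would use hdfs://<namenode>:<port> instead.
            conf.set("fs.default.name", "maprfs://10.18.125.176:7222");
            return conf;
        }
    }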
