Reduce Hangs at 66%

2012-05-02 Thread Keith Thompson
I am running a task which gets to 66% of the Reduce step and then hangs
indefinitely. Here is the log file (I apologize if I am putting too much
here but I am not exactly sure what is relevant):

2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6433_r_00_0' to tip
task_201202240659_6433_r_00, for tracker
'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201202240659_6433_m_01_0' has completed
task_201202240659_6433_m_01 successfully.
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_00_0: Task
attempt_201202240659_6432_r_00_0 failed to report status for 1800
seconds. Killing!
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_00_0' to
tip task_201202240659_6432_r_00, for tracker
'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_00_1' to tip
task_201202240659_6432_r_00, for tracker
'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_00_1'
2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_00_2' to tip
task_201202240659_6432_r_00, for tracker
'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_00_2: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker:
Removing task 'attempt_201202240659_6432_r_00_2'
2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker:
Adding task (REDUCE) 'attempt_201202240659_6432_r_00_3' to tip
task_201202240659_6432_r_00, for tracker
'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201202240659_6432_r_00_3: java.io.IOException:
The temporary job-output directory
hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
doesn't exist!
	at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
	at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
	at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)

Re: Reduce Hangs at 66%

2012-05-03 Thread Michel Segel
Well...
Lots of information but still missing some of the basics...

Which release and version?
What are your ulimits set to?
How much free disk space do you have?
What are you attempting to do?

Stuff like that.



Sent from a remote device. Please excuse any typos...

Mike Segel

On May 2, 2012, at 4:49 PM, Keith Thompson  wrote:

> [quoted text trimmed; original message appears in full above]

Re: Reduce Hangs at 66%

2012-05-03 Thread Keith Thompson
I am not sure about ulimits, but I can answer the rest. It's a Cloudera
distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step,
I am taking keys in the form of (gridID, date), each with a value of 1. The
reduce step just sums the 1's as the final output value for the key (It's
counting how many people were in the gridID on a certain day).
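The counting step described here boils down to adding one per occurrence of each (gridID, date) key. A minimal sketch of that logic in plain Java, outside Hadoop (the class name and sample keys below are hypothetical, not from the original job):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the reduce-side logic described above: each map output is a
// ((gridID, date), 1) pair, and the reducer sums the 1's to count how many
// people were in a grid cell on a given day.
public class DensityCount {
    // Sum one per occurrence of each key, as the reducer does when it
    // drains its iterator of IntWritable 1's.
    static Map<String, Integer> countPerKey(List<String> mapOutputKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : mapOutputKeys) {
            counts.merge(key, 1, Integer::sum); // each occurrence contributes 1
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "grid42,2012-05-02",
                "grid42,2012-05-02",
                "grid7,2012-05-02");
        // grid42 was seen twice on 2012-05-02, grid7 once.
        System.out.println(countPerKey(keys));
    }
}
```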

I have been running other, more complicated jobs with no problem, so I'm not
sure why this one is being peculiar. This is the command I used to execute the
program from the command line (the source is a file on HDFS):

hadoop jar/thompson/outputDensity/density1

The program then executes the map and gets to 66% of the reduce, then stops
responding. The cause of the error seems to be:

Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
> The temporary job-output directory
> hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
> doesn't exist!

I don't understand what the _temporary is. I am assuming it's something
Hadoop creates automatically.



On Thu, May 3, 2012 at 5:02 AM, Michel Segel wrote:

> [quoted text trimmed; original message appears in full above]

Re: Reduce Hangs at 66%

2012-05-03 Thread Raj Vishwanathan
Keith

What is the output of ulimit -n? Your value for the number of open files is
probably too low.

Raj




>
> From: Keith Thompson 
>To: common-user@hadoop.apache.org 
>Sent: Thursday, May 3, 2012 4:33 PM
>Subject: Re: Reduce Hangs at 66%
> 
>[quoted text trimmed; original message appears in full above]

Re: Reduce Hangs at 66%

2012-05-04 Thread Michael Segel
Well, that was one of the things I had asked.
ulimit -a says it all.

But you have to do this for the users... hdfs, mapred, and hadoop

(Which is why I asked about which flavor.)
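Per-user limits of this kind are typically set in /etc/security/limits.conf. The entries below are only an illustration of the advice above (the user names follow the note about hdfs, mapred, and hadoop; the 32768 value is a commonly used figure, not one taken from this thread):

```
# /etc/security/limits.conf -- example entries; values are illustrative.
# Raise the open-file limit for each Hadoop service user, then log in
# again (or restart the daemons) so the new limits take effect.
hdfs    -   nofile   32768
mapred  -   nofile   32768
hadoop  -   nofile   32768
```

Verify the result for each user with `ulimit -n` (or `ulimit -a`) in a fresh login shell.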

On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

> Keith
> 
> What is the output of ulimit -n? Your value for the number of open files is
> probably too low.
> 
> Raj
> 
>> [quoted text trimmed; original messages appear in full above]

Re: Reduce Hangs at 66%

2012-05-04 Thread Keith Thompson
Thanks everyone for your help. It is running fine now.


On Fri, May 4, 2012 at 11:22 AM, Michael Segel wrote:

> [quoted text trimmed; original messages appear in full above]