Repeated attempts to kill old job?

2010-02-01 Thread Meng Mao
On our worker nodes, I see repeated requests for KillJobActions for the same
old job:
2010-01-31 00:00:01,024 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_201001261532_0690
2010-01-31 00:00:01,064 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
job job_201001261532_0690 being deleted.

This request-and-response pair is repeated almost 30k times over the course of
the day. Other nodes have the same behavior, except with different job ids.
The jobs in question all ran in the past, presumably either to completion or
until they were killed manually. We use the grid fairly actively, and that job
ID is several hundred increments old.

Has anyone seen this before? Is there a way to stop it?


EOFException and BadLink, but file descriptors number is ok?

2010-02-02 Thread Meng Mao
I've been trying to run a fairly small input file (300MB) on Cloudera Hadoop
0.20.1. The job probably writes to on the order of 1,000 part-files at once
across the whole grid, which has 33 nodes. I get the following exceptions in
the run logs:

10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
attempt_201001261532_1137_r_13_0, Status : FAILED
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:250)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
at org.apache.hadoop.io.Text.readString(Text.java:400)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

lots of EOFExceptions

10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
attempt_201001261532_1137_r_19_0, Status : FAILED
java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)

10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%

From searching around, it seems like the most common cause of BadLink and
EOFExceptions is when the nodes don't have enough file descriptors set. But
across all the grid machines, the file-max has been set to 1573039.
Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.

Where else should I be looking for what's causing this?


Re: Repeated attempts to kill old job?

2010-02-02 Thread Meng Mao
We restarted the grid and that did kill the repeated KillJobAction attempts.
I forgot to look around with hadoop -dfsadmin, though.

On Mon, Feb 1, 2010 at 11:29 PM, Rekha Joshi  wrote:

> I would say restart the cluster, but suspect that would not help either -
> instead try checking up your running process list (eg: perl/shell script or
> a ETL pipeline job) to analyze/kill.
> Also wondering if any hadoop -dfsadmin commands can supersede this
> scenario..
>
> Cheers,
> /R
>
>
> On 2/2/10 2:50 AM, "Meng Mao"  wrote:
>
> On our worker nodes, I see repeated requests for KillJobActions for the
> same
> old job:
> 2010-01-31 00:00:01,024 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_201001261532_0690
> 2010-01-31 00:00:01,064 WARN org.apache.hadoop.mapred.TaskTracker: Unknown
> job job_201001261532_0690 being deleted.
>
> This request and response is repeated almost 30k times over the course of
> the day. Other nodes have the same behavior, except with different job ids.
> The jobs presumably all ran in the past, to completion or got killed
> manually. We use the grid fairly actively and that job is several hundred
> increments old.
>
> Has anyone seen this before? Is there a way to stop it?
>
>


Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-03 Thread Meng Mao
Also, which ulimit is the important one: the one for the user who is running
the job, or the one for the hadoop user that owns the Hadoop processes?

On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao  wrote:

> I've been trying to run a fairly small input file (300MB) on Cloudera
> Hadoop 0.20.1. The job I'm using probably writes to on the order of over
> 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
> I get the following exception in the run logs:
>
> 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
> attempt_201001261532_1137_r_13_0, Status : FAILED
> java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:250)
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
> at org.apache.hadoop.io.Text.readString(Text.java:400)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>
> lots of EOFExceptions
>
> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
> attempt_201001261532_1137_r_19_0, Status : FAILED
> java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>  at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>
> 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
> 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
> 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
> 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
> 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
>
> From searching around, it seems like the most common cause of BadLink and
> EOFExceptions is when the nodes don't have enough file descriptors set. But
> across all the grid machines, the file-max has been set to 1573039.
> Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.
>
> Where else should I be looking for what's causing this?
>


Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-04 Thread Meng Mao
I wrote a hadoop job that checks for ulimits across the nodes, and every
node is reporting:
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 139264
max locked memory   (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files  (-n) 65536
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 139264
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited


Is anything in there telling about file number limits? From what I
understand, a high open files limit like 65536 should be enough. I estimate
only a couple thousand part-files on HDFS being written to at once, and
around 200 on the filesystem per node.

On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao  wrote:

> also, which is the ulimit that's important, the one for the user who is
> running the job, or the hadoop user that owns the Hadoop processes?
>
>
> On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao  wrote:
>
>> I've been trying to run a fairly small input file (300MB) on Cloudera
>> Hadoop 0.20.1. The job I'm using probably writes to on the order of over
>> 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
>> I get the following exception in the run logs:
>>
>> 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
>> attempt_201001261532_1137_r_13_0, Status : FAILED
>> java.io.EOFException
>> at java.io.DataInputStream.readByte(DataInputStream.java:250)
>> at
>> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>> at org.apache.hadoop.io.Text.readString(Text.java:400)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>>
>> lots of EOFExceptions
>>
>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
>> attempt_201001261532_1137_r_19_0, Status : FAILED
>> java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>>  at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>> at
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>>
>> 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
>> 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
>> 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
>> 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
>> 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
>>
>> From searching around, it seems like the most common cause of BadLink and
>> EOFExceptions is when the nodes don't have enough file descriptors set. But
>> across all the grid machines, the file-max has been set to 1573039.
>> Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.
>>
>> Where else should I be looking for what's causing this?
>>
>
>


Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-04 Thread Meng Mao
I'm not sure what else I could be checking to see where the problem lies.
Should I be looking in the datanode logs? I looked briefly in there and didn't
see anything from around the time the exceptions started getting reported.
lsof during job execution? The number of open threads?

I'm at a loss here.

On Thu, Feb 4, 2010 at 2:52 PM, Meng Mao  wrote:

> I wrote a hadoop job that checks for ulimits across the nodes, and every
> node is reporting:
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 139264
> max locked memory   (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files  (-n) 65536
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 10240
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 139264
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
>
>
> Is anything in there telling about file number limits? From what I
> understand, a high open files limit like 65536 should be enough. I estimate
> only a couple thousand part-files on HDFS being written to at once, and
> around 200 on the filesystem per node.
>
> On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao  wrote:
>
>> also, which is the ulimit that's important, the one for the user who is
>> running the job, or the hadoop user that owns the Hadoop processes?
>>
>>
>> On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao  wrote:
>>
>>> I've been trying to run a fairly small input file (300MB) on Cloudera
>>> Hadoop 0.20.1. The job I'm using probably writes to on the order of over
>>> 1000 part-files at once, across the whole grid. The grid has 33 nodes in it.
>>> I get the following exception in the run logs:
>>>
>>> 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
>>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
>>> attempt_201001261532_1137_r_13_0, Status : FAILED
>>> java.io.EOFException
>>> at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>> at
>>> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>>> at
>>> org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>>> at org.apache.hadoop.io.Text.readString(Text.java:400)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>>>
>>> lots of EOFExceptions
>>>
>>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
>>> attempt_201001261532_1137_r_19_0, Status : FAILED
>>> java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>>>  at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>>> at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>>>
>>> 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
>>> 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
>>> 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
>>> 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
>>> 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
>>>
>>> From searching around, it seems like the most common cause of BadLink and
>>> EOFExceptions is when the nodes don't have enough file descriptors set. But
>>> across all the grid machines, the file-max has been set to 1573039.
>>> Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.
>>>
>>> Where else should I be looking for what's causing this?
>>>
>>
>>
>


Re: EOFException and BadLink, but file descriptors number is ok?

2010-02-05 Thread Meng Mao
Ack, after looking at the logs again, there are definitely xceiver errors.
The limit is set to 256!
I had thought I had ruled this out as a possible cause, but I guess I was wrong.
Going to retest right away.
Thanks!
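
For reference, the setting in question is the datanode-side
dfs.datanode.max.xcievers (the misspelling is how the property is actually
named), which goes in hdfs-site.xml and needs a datanode restart to take
effect. The value below is just the commonly suggested starting point, not
something specific to our cluster:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>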

On Fri, Feb 5, 2010 at 11:05 AM, Todd Lipcon  wrote:

> Yes, you're likely to see an error in the DN log. Do you see anything
> about max number of xceivers?
>
> -Todd
>
> On Thu, Feb 4, 2010 at 11:42 PM, Meng Mao  wrote:
> > not sure what else I could be checking to see where the problem lies.
> Should
> > I be looking in the datanode logs? I looked briefly in there and didn't
> see
> > anything from around the time exceptions started getting reported.
> > lsof during the job execution? Number of open threads?
> >
> > I'm at a loss here.
> >
> > On Thu, Feb 4, 2010 at 2:52 PM, Meng Mao  wrote:
> >
> >> I wrote a hadoop job that checks for ulimits across the nodes, and every
> >> node is reporting:
> >> core file size  (blocks, -c) 0
> >> data seg size   (kbytes, -d) unlimited
> >> scheduling priority (-e) 0
> >> file size   (blocks, -f) unlimited
> >> pending signals (-i) 139264
> >> max locked memory   (kbytes, -l) 32
> >> max memory size (kbytes, -m) unlimited
> >> open files  (-n) 65536
> >> pipe size(512 bytes, -p) 8
> >> POSIX message queues (bytes, -q) 819200
> >> real-time priority  (-r) 0
> >> stack size  (kbytes, -s) 10240
> >> cpu time   (seconds, -t) unlimited
> >> max user processes  (-u) 139264
> >> virtual memory  (kbytes, -v) unlimited
> >> file locks  (-x) unlimited
> >>
> >>
> >> Is anything in there telling about file number limits? From what I
> >> understand, a high open files limit like 65536 should be enough. I
> estimate
> >> only a couple thousand part-files on HDFS being written to at once, and
> >> around 200 on the filesystem per node.
> >>
> >> On Wed, Feb 3, 2010 at 4:04 PM, Meng Mao  wrote:
> >>
> >>> also, which is the ulimit that's important, the one for the user who is
> >>> running the job, or the hadoop user that owns the Hadoop processes?
> >>>
> >>>
> >>> On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao  wrote:
> >>>
> >>>> I've been trying to run a fairly small input file (300MB) on Cloudera
> >>>> Hadoop 0.20.1. The job I'm using probably writes to on the order of
> over
> >>>> 1000 part-files at once, across the whole grid. The grid has 33 nodes
> in it.
> >>>> I get the following exception in the run logs:
> >>>>
> >>>> 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
> >>>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
> >>>> attempt_201001261532_1137_r_13_0, Status : FAILED
> >>>> java.io.EOFException
> >>>> at java.io.DataInputStream.readByte(DataInputStream.java:250)
> >>>> at
> >>>> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
> >>>> at
> >>>> org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
> >>>> at org.apache.hadoop.io.Text.readString(Text.java:400)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
> >>>>
> >>>> lots of EOFExceptions
> >>>>
> >>>> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id :
> >>>> attempt_201001261532_1137_r_19_0, Status : FAILED
> >>>> java.io.IOException: Bad connect ack with firstBadLink
> 10.2.19.1:50010
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
> >>>>  at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
> >>>> at
> >>>>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
> >>>>
> >>>> 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
> >>>> 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
> >>>> 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
> >>>> 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
> >>>> 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
> >>>>
> >>>> From searching around, it seems like the most common cause of BadLink
> and
> >>>> EOFExceptions is when the nodes don't have enough file descriptors
> set. But
> >>>> across all the grid machines, the file-max has been set to 1573039.
> >>>> Furthermore, we set ulimit -n to 65536 using hadoop-env.sh.
> >>>>
> >>>> Where else should I be looking for what's causing this?
> >>>>
> >>>
> >>>
> >>
> >
>


Re: Is it possible to write each key-value pair emitted by the reducer to a different output file

2010-02-06 Thread Meng Mao
It's possible to write your own class to better encapsulate the writing of
side-effect files, but as people have said, you can run into unanticipated
issues if the number of files you try to write at once becomes high.
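
A minimal sketch of that kind of wrapper against the old API might look like
the following. The class name and file naming scheme are made up for
illustration, and it assumes FileOutputFormat.getWorkOutputPath() is available
in your version, so that only a committed attempt's files survive:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: writes one side-effect file per key under the task
// attempt's work output path, so the framework promotes the files on commit.
public class PerKeySideFileWriter {
  private final FileSystem fs;
  private final Path workDir;

  public PerKeySideFileWriter(JobConf conf) throws IOException {
    this.workDir = FileOutputFormat.getWorkOutputPath(conf);
    this.fs = workDir.getFileSystem(conf);
  }

  // Creates <work output path>/key-<key> and writes the value into it.
  public void write(Text key, Text value) throws IOException {
    FSDataOutputStream out = fs.create(new Path(workDir, "key-" + key.toString()));
    try {
      out.write(value.getBytes(), 0, value.getLength());
    } finally {
      out.close();
    }
  }
}

The reducer would construct one of these in configure() and call write() once
per key, keeping the caveat above about how many files you open at once.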

On Sat, Feb 6, 2010 at 3:47 AM, Udaya Lakshmi  wrote:

> Hi Amareshwari,
>
>   But this feature is not available in Hadoop 0.18.3. Is there any work
> around for this version.
> Thanks,
> Udaya.
>
> On Fri, Feb 5, 2010 at 10:49 AM, Amareshwari Sri Ramadasu <
> amar...@yahoo-inc.com> wrote:
>
> > See MultipleOutputs at
> >
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
> >
> > -Amareshwari
> >
> > On 2/5/10 10:41 AM, "Udaya Lakshmi"  wrote:
> >
> > Hi,
> >  I was wondering if it is possible to write each key-value pair produced
> by
> > the reduce function to a different file. How could I open a new file in
> the
> > reduce function of the reducer? I know its possible in configure function
> > but it will write all the output that reducer to that file.
> > Thanks,
> > Udaya.
> >
> >
>


has anyone ported hadoop.lib.aggregate?

2010-02-06 Thread Meng Mao
From what I can tell, while the ValueAggregator stuff should be usable, the
ValueAggregatorJob and ValueAggregatorJobBase classes still use the old
Mapper and Reducer signatures, and so are basically not compatible with the new
mapreduce.* API. Is that correct?

Has anyone out there done a port? We've been dragging our feet on moving away
from the deprecated API for the classes of ours that use the aggregate lib.
It would be a huge boost if there were anything we could borrow to port over.
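
For anyone who hasn't compared the two, the incompatibility is just the shape
of the map/reduce methods themselves; roughly (trimmed down to the relevant
parts, generic parameter names as in the released classes):

// Old API (org.apache.hadoop.mapred.Mapper): an interface whose map method
// takes an explicit OutputCollector and Reporter.
void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
    throws IOException;

// New API (org.apache.hadoop.mapreduce.Mapper): a base class whose map method
// takes a single Context.
protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException;

so ValueAggregatorJob and ValueAggregatorJobBase would need their plumbing
rewritten against the Context-based classes.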


Re: has anyone ported hadoop.lib.aggregate?

2010-02-08 Thread Meng Mao
I'm not familiar with the current roadmap for 0.20, but is there any plan to
backport the new mapreduce.lib.aggregate library into 0.20.x?

I suppose our team could attempt to use the patch ourselves, but we'd be
much more comfortable going with a standard release if at all possible.

On Sun, Feb 7, 2010 at 10:41 PM, Amareshwari Sri Ramadasu <
amar...@yahoo-inc.com> wrote:

> Org.apache.hadoop.mapred.lib.aggregate has been ported to new api in branch
> 0.21.
> See http://issues.apache.org/jira/browse/MAPREDUCE-358
>
> Thanks
> Amareshwari
>
> On 2/7/10 5:34 AM, "Meng Mao"  wrote:
>
> From what I can tell, while the ValueAggregator stuff should be useable,
> the
> ValueAggregatorJob and ValueAggregatorJobBase classes still use the old
> Mapper and Reducer signatures, and basically aren't compatible with the new
> mapreduce.* API. Is that correct?
>
> Has anyone out there done a port? We've been dragging our feet very hard
> about getting away from use of deprecated API for our classes that take
> advantage of the aggregate lib. It would be a huge boost if there was any
> stuff we could borrow to port over.
>
>


Re: duplicate tasks getting started/killed

2010-02-09 Thread Meng Mao
Can you confirm that duplication is happening in the case where one attempt
gets underway but is killed before the other's completion?
I believe by default (though I'm not sure for Pig), each attempt's output is
first isolated to a path keyed to its attempt id, and only committed when
one and only one attempt is complete.
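(Concretely, with the default output committer each attempt writes under
something like <output dir>/_temporary/_attempt_201002090552_0009_m_01_0/, and
only the surviving attempt's part files get promoted into <output dir> when
the task commits. That layout is from memory, so treat it as illustrative.)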

On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee <
pmukher...@quattrowireless.com> wrote:

> Any thoughts on this problem ? I am using a DEFINE command ( in PIG )
> and hence the actions are not idempotent. Because of which duplicate
> execution does have an affect on my results. Any way to overcome that
> ?
>
> On Tue, Feb 9, 2010 at 9:26 PM, prasenjit mukherjee
>  wrote:
> > But the second attempted job got killed even before the first one was
> > completed. How can we explain that.
> >
> > On Tue, Feb 9, 2010 at 7:38 PM, Eric Sammer  wrote:
> >> Prasen:
> >>
> >> This is most likely speculative execution. Hadoop fires up multiple
> >> attempts for the same task and lets them "race" to see which finishes
> >> first and then kills the others. This is meant to speed things along.
> >>
> >> Speculative execution is on by default, but can be disabled. See the
> >> configuration reference for mapred-*.xml.
> >>
> >> On 2/9/10 9:03 AM, prasenjit mukherjee wrote:
> >>> Sometimes for the same task I see that a duplicate task gets run on a
> >>> different machine and gets killed later. Not always but sometimes. Any
> >>> reason why duplicate tasks get run. I thought tasks are duplicated
> >>> only if  either the first attempt exits( exceptions etc ) or  exceeds
> >>> mapred.task.timeout. In this case none of them happens. As can be seen
> >>> from timestamp, the second attempt starts even though the first
> >>> attempt is still running ( only for 1 minute ).
> >>>
> >>> Any explanation ?
> >>>
> >>> attempt_201002090552_0009_m_01_0
> >>> /default-rack/ip-10-242-142-193.ec2.internal
> >>> SUCCEEDED
> >>> 100.00%
> >>> 9-Feb-2010 07:04:37
> >>> 9-Feb-2010 07:07:00 (2mins, 23sec)
> >>>
> >>> attempt_201002090552_0009_m_01_1
> >>> Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>> Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>> KILLED
> >>> 100.00%
> >>> 9-Feb-2010 07:05:34
> >>> 9-Feb-2010 07:07:10 (1mins, 36sec)
> >>>
> >>>  -Prasen
> >>>
> >>
> >>
> >> --
> >> Eric Sammer
> >> e...@lifeless.net
> >> http://esammer.blogspot.com
> >>
> >
>


Re: duplicate tasks getting started/killed

2010-02-10 Thread Meng Mao
That cleanup action looks promising in terms of preventing duplication. What
I'd meant was, could you ever find an instance where the results of your
DEFINE statement were made incorrect by multiple attempts?

On Wed, Feb 10, 2010 at 5:05 AM, prasenjit mukherjee <
pmukher...@quattrowireless.com> wrote:

> Below is the log :
>
> attempt_201002090552_0009_m_01_0
>/default-rack/ip-10-242-142-193.ec2.internal
>SUCCEEDED
>100.00%
>9-Feb-2010 07:04:37
>9-Feb-2010 07:07:00 (2mins, 23sec)
>
>  attempt_201002090552_0009_m_01_1
>Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
>Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
>KILLED
>100.00%
>9-Feb-2010 07:05:34
>9-Feb-2010 07:07:10 (1mins,
>
> So, here is the time-line for both the  attempts :
> attempt_1's start_time=07:04:37 , end_time=07:07:00
> attempt_2's  start_time=07:05:34,   end_time=07:07:10
>
> -Thanks,
> Prasen
>
> On Wed, Feb 10, 2010 at 1:15 PM, Meng Mao  wrote:
> > Can you confirm that duplication is happening in the case that one
> attempt
> > gets underway but killed before the other's completion?
> > I believe by default (though I'm not sure for Pig), each attempt's output
> is
> > first isolated to a path keyed to its attempt id, and only committed when
> > one and only one attempt is complete.
> >
> > On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee <
> > pmukher...@quattrowireless.com> wrote:
> >
> >> Any thoughts on this problem ? I am using a DEFINE command ( in PIG )
> >> and hence the actions are not idempotent. Because of which duplicate
> >> execution does have an affect on my results. Any way to overcome that
> >> ?
>


Re: duplicate tasks getting started/killed

2010-02-10 Thread Meng Mao
Right, so have you ever seen your non-idempotent DEFINE command have an
incorrect result? That would essentially point to duplicate attempts
behaving badly.

To your second question -- I think spec exec assumes that not all machines
run at the same speed. If a machine is free (not used for some other
attempt), then Hadoop might schedule an attempt right away on it. It's
possible, depending on the granularity of the work or issues with the
original attempt, that this later attempt finishes first, and thus becomes
the committing attempt.
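
If the work behind your DEFINE really can't be made idempotent, the blunt
workaround is the one Eric mentioned earlier in the thread: turn speculative
execution off for that job. In the 0.20-era configuration that's these two
properties (shown as a sketch; they can go in mapred-site.xml or be set per
job):

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>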


On Wed, Feb 10, 2010 at 12:10 PM, prasenjit mukherjee <
pmukher...@quattrowireless.com> wrote:

> Correctness of the results actually  depends on my DEFINE command. If
> the commands are  idempotent ( which is not in my case ) then I
> believe it wont have any affect on the results, otherwise it will
> indeed make the results incorrect. For example if my command fetches
> some data and appends to a mysql db in that case it is undesirable.
>
> I have a question though, why in the first place the second attempt
> was kicked off just seconds after the first one. I mean what is the
> rationale behind kicking off a second attempt immediately afterwards ?
> Baffling...
>
> -Prasen
>
> On Wed, Feb 10, 2010 at 10:32 PM, Meng Mao  wrote:
> > That cleanup action looks promising in terms of preventing duplication.
> What
> > I'd meant was, could you ever find an instance where the results of your
> > DEFINE statement were made incorrect by multiple attempts?
> >
> > On Wed, Feb 10, 2010 at 5:05 AM, prasenjit mukherjee <
> > pmukher...@quattrowireless.com> wrote:
> >
> >> Below is the log :
> >>
> >> attempt_201002090552_0009_m_01_0
> >>/default-rack/ip-10-242-142-193.ec2.internal
> >>SUCCEEDED
> >>100.00%
> >>9-Feb-2010 07:04:37
> >>9-Feb-2010 07:07:00 (2mins, 23sec)
> >>
> >>  attempt_201002090552_0009_m_01_1
> >>Task attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>Cleanup Attempt: /default-rack/ip-10-212-147-129.ec2.internal
> >>KILLED
> >>100.00%
> >>9-Feb-2010 07:05:34
> >>9-Feb-2010 07:07:10 (1mins,
> >>
> >> So, here is the time-line for both the  attempts :
> >> attempt_1's start_time=07:04:37 , end_time=07:07:00
> >> attempt_2's  start_time=07:05:34,
> end_time=07:07:10
> >>
> >> -Thanks,
> >> Prasen
> >>
> >> On Wed, Feb 10, 2010 at 1:15 PM, Meng Mao  wrote:
> >> > Can you confirm that duplication is happening in the case that one
> >> attempt
> >> > gets underway but killed before the other's completion?
> >> > I believe by default (though I'm not sure for Pig), each attempt's
> output
> >> is
> >> > first isolated to a path keyed to its attempt id, and only committed
> when
> >> > one and only one attempt is complete.
> >> >
> >> > On Tue, Feb 9, 2010 at 9:52 PM, prasenjit mukherjee <
> >> > pmukher...@quattrowireless.com> wrote:
> >> >
> >> >> Any thoughts on this problem ? I am using a DEFINE command ( in PIG )
> >> >> and hence the actions are not idempotent. Because of which duplicate
> >> >> execution does have an affect on my results. Any way to overcome that
> >> >> ?
> >>
> >
>


streaming: -f with a zip file?

2010-04-14 Thread Meng Mao
We are trying to pass a zip file into a streaming program. An invocation
looks like this:

/usr/local/hadoop/bin/hadoop jar
/usr/local/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar ...
 -file 'myfile.zip'
Within the program, it seems like a ZipFile invocation can't locate the zip
file.

Is there anything that might be problematic about a zip file when used with
-file?
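
(For what it's worth, -file is supposed to ship the file into each task
attempt's current working directory, so the task-side code should open it by
its bare relative name rather than by any absolute local path. A minimal
probe, assuming the archive really was shipped as myfile.zip, would be
something like:

import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical check to run from the task side: does the shipped zip show up
// in the working directory, and can java.util.zip open it?
public class ZipProbe {
  public static void main(String[] args) throws IOException {
    File shipped = new File("myfile.zip");
    System.err.println("cwd=" + new File(".").getAbsolutePath()
        + " exists=" + shipped.exists());
    ZipFile zip = new ZipFile(shipped); // throws if missing or not a valid zip
    System.err.println("entries=" + zip.size());
    zip.close();
  }
}

printing to stderr so the output lands in the task logs.)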


Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
Who is in charge of getting the files there for the first time? The
addCacheFile call in the mapreduce job? Or a manual setup by the
user/operator?
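
To make the question concrete, the pattern I'm asking about is the usual one;
the path and class name below are illustrative, not our real code:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Illustrative sketch only.
public class CacheUsageSketch {
  // Submission side: register an HDFS file for distribution to the workers.
  public static void registerLookup(JobConf conf) throws URISyntaxException {
    DistributedCache.addCacheFile(new URI("/user/hadoop/lookup.txt"), conf);
  }

  // Task side (e.g. from Mapper.configure): find the localized copies on disk.
  public static Path[] localCopies(JobConf conf) throws IOException {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}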

On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans  wrote:

> The problem is the step 4 in the breaking sequence.  Currently the
> TaskTracker never looks at the disk to know if a file is in the distributed
> cache or not.  It assumes that if it downloaded the file and did not delete
> that file itself then the file is still there in its original form.  It does
> not know that you deleted those files, or if wrote to the files, or in any
> way altered those files.  In general you should not be modifying those
> files.  This is not only because it messes up the tracking of those files,
> but because other jobs running concurrently with your task may also be using
> those files.
>
> --Bobby Evans
>
>
> On 9/26/11 4:40 PM, "Meng Mao"  wrote:
>
> Let's frame the issue in another way. I'll describe a sequence of Hadoop
> operations that I think should work, and then I'll get into what we did and
> how it failed.
>
> Normal sequence:
> 1. have files to be cached in HDFS
> 2. Run Job A, which specifies those files to be put into DistributedCache
> space
> 3. job runs fine
> 4. Run Job A some time later. job runs fine again.
>
> Breaking sequence:
> 1. have files to be cached in HDFS
> 2. Run Job A, which specifies those files to be put into DistributedCache
> space
> 3. job runs fine
> 4. Manually delete cached files out of local disk on worker nodes
> 5. Run Job A again, expect it to push out cache copies as needed.
> 6. job fails because the cache copies didn't get distributed
>
> Should this second sequence have broken?
>
> On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao  wrote:
>
> > Hmm, I must have really missed an important piece somewhere. This is from
> > the MapRed tutorial text:
> >
> > "DistributedCache is a facility provided by the Map/Reduce framework to
> > cache files (text, archives, jars and so on) needed by applications.
> >
> > Applications specify the files to be cached via urls (hdfs://) in the
> > JobConf. The DistributedCache* assumes that the files specified via
> > hdfs:// urls are already present on the FileSystem.*
> >
> > *The framework will copy the necessary files to the slave node before any
> > tasks for the job are executed on that node*. Its efficiency stems from
> > the fact that the files are only copied once per job and the ability to
> > cache archives which are un-archived on the slaves."
> >
> >
> > After some close reading, the two bolded pieces seem to be in
> contradiction
> > of each other? I'd always assumed that addCacheFile() would perform the 2nd
> bolded
> > statement. If that sentence is true, then I still don't have an
> explanation
> > of why our job didn't correctly push out new versions of the cache files
> > upon the startup and execution of JobConfiguration. We deleted them
> before
> > our job started, not during.
> >
> > On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans 
> wrote:
> >
> >> Meng Mao,
> >>
> >> The way the distributed cache is currently written, it does not verify
> the
> >> integrity of the cache files at all after they are downloaded.  It just
> >> assumes that if they were downloaded once they are still there and in
> the
> >> proper shape.  It might be good to file a JIRA to add in some sort of
> check.
> >>  Another thing to do is that the distributed cache also includes the
> time
> >> stamp of the original file, just incase you delete the file and then use
> a
> >> different version.  So if you want it to force a download again you can
> copy
> >> it delete the original and then move it back to what it was before.
> >>
> >> --Bobby Evans
> >>
> >> On 9/23/11 1:57 AM, "Meng Mao"  wrote:
> >>
> >> We use the DistributedCache class to distribute a few lookup files for
> our
> >> jobs. We have been aggressively deleting failed task attempts' leftover
> >> data
> >> , and our script accidentally deleted the path to our distributed cache
> >> files.
> >>
> >> Our task attempt leftover data was here [per node]:
> >> /hadoop/hadoop-metadata/cache/mapred/local/
> >> and our distributed cache path was:
> >> hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/
> >> We deleted this path by accident.
> >>
> >> Does this latter path look normal? I'm not that familiar with
> >> Distri

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
From that interpretation, it then seems like it would be safe to delete the
files between completed runs? How could it distinguish between the files
having been deleted and their not having been downloaded from a previous
run?

On Tue, Sep 27, 2011 at 12:25 PM, Robert Evans  wrote:

> addCacheFile sets a config value in your jobConf that indicates which files
> your particular job depends on.  When the TaskTracker is assigned to run
> part of your job (map task or reduce task), it will download your jobConf,
> read it in, and then download the files listed in the conf, if it has not
> already downloaded them from a previous run.  Then it will set up the
> directory structure for your job, possibly adding in symbolic links to these
> files in the working directory for your task.  After that it will launch
> your task.
>
> --Bobby Evans
>
> On 9/27/11 11:17 AM, "Meng Mao"  wrote:
>
> Who is in charge of getting the files there for the first time? The
> addCacheFile call in the mapreduce job? Or a manual setup by the
> user/operator?
>
> On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans 
> wrote:
>
> > The problem is the step 4 in the breaking sequence.  Currently the
> > TaskTracker never looks at the disk to know if a file is in the
> distributed
> > cache or not.  It assumes that if it downloaded the file and did not
> delete
> > that file itself then the file is still there in its original form.  It
> does
> > not know that you deleted those files, or if wrote to the files, or in
> any
> > way altered those files.  In general you should not be modifying those
> > files.  This is not only because it messes up the tracking of those
> files,
> > but because other jobs running concurrently with your task may also be
> using
> > those files.
> >
> > --Bobby Evans
> >
> >
> > On 9/26/11 4:40 PM, "Meng Mao"  wrote:
> >
> > Let's frame the issue in another way. I'll describe a sequence of Hadoop
> > operations that I think should work, and then I'll get into what we did
> and
> > how it failed.
> >
> > Normal sequence:
> > 1. have files to be cached in HDFS
> > 2. Run Job A, which specifies those files to be put into DistributedCache
> > space
> > 3. job runs fine
> > 4. Run Job A some time later. job runs fine again.
> >
> > Breaking sequence:
> > 1. have files to be cached in HDFS
> > 2. Run Job A, which specifies those files to be put into DistributedCache
> > space
> > 3. job runs fine
> > 4. Manually delete cached files out of local disk on worker nodes
> > 5. Run Job A again, expect it to push out cache copies as needed.
> > 6. job fails because the cache copies didn't get distributed
> >
> > Should this second sequence have broken?
> >
> > On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao  wrote:
> >
> > > Hmm, I must have really missed an important piece somewhere. This is
> from
> > > the MapRed tutorial text:
> > >
> > > "DistributedCache is a facility provided by the Map/Reduce framework to
> > > cache files (text, archives, jars and so on) needed by applications.
> > >
> > > Applications specify the files to be cached via urls (hdfs://) in the
> > > JobConf. The DistributedCache* assumes that the files specified via
> > > hdfs:// urls are already present on the FileSystem.*
> > >
> > > *The framework will copy the necessary files to the slave node before
> any
> > > tasks for the job are executed on that node*. Its efficiency stems from
> > > the fact that the files are only copied once per job and the ability to
> > > cache archives which are un-archived on the slaves."
> > >
> > >
> > > After some close reading, the two bolded pieces seem to be in
> > contradiction
> > > of each other? I'd always assumed that addCacheFile() would perform the 2nd
> > bolded
> > > statement. If that sentence is true, then I still don't have an
> > explanation
> > > of why our job didn't correctly push out new versions of the cache
> files
> > > upon the startup and execution of JobConfiguration. We deleted them
> > before
> > > our job started, not during.
> > >
> > > On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans 
> > wrote:
> > >
> > >> Meng Mao,
> > >>
> > >> The way the distributed cache is currently written, it does not verify
> > the
> > >> integrity of the cache files at all after they are downloaded.  It
> just
> > >> 

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
I'm not concerned about disk space usage -- the script we used that deleted
the taskTracker cache path has been fixed not to do so.

I'm curious about the exact behavior of jobs that use DistributedCache
files. Again, it seems safe from your description to delete files between
completed runs. How could the job or the taskTracker distinguish between the
files having been deleted and their not having been downloaded from a
previous run of the job? Is it state in memory that the taskTracker
maintains?


On Tue, Sep 27, 2011 at 1:44 PM, Robert Evans  wrote:

> If you are never ever going to use that file again for any map/reduce task
> in the future then yes you can delete it, but I would not recommend it.  If
> you want to reduce the amount of space that is used by the distributed cache
> there is a config parameter for that.
>
> "local.cache.size"  it is the number of bytes per drive that will be used
> for storing data in the distributed cache.   This is in 0.20 for hadoop I am
> not sure if it has changed at all for trunk.  It is not documented as far as
> I can tell, and it defaults to 10GB.
>
> --Bobby Evans
>
>
> On 9/27/11 12:04 PM, "Meng Mao"  wrote:
>
> From that interpretation, it then seems like it would be safe to delete the
> files between completed runs? How could it distinguish between the files
> having been deleted and their not having been downloaded from a previous
> run?
>
> On Tue, Sep 27, 2011 at 12:25 PM, Robert Evans 
> wrote:
>
> > addCacheFile sets a config value in your jobConf that indicates which
> files
> > your particular job depends on.  When the TaskTracker is assigned to run
> > part of your job (map task or reduce task), it will download your
> jobConf,
> > read it in, and then download the files listed in the conf, if it has not
> > already downloaded them from a previous run.  Then it will set up the
> > directory structure for your job, possibly adding in symbolic links to
> these
> > files in the working directory for your task.  After that it will launch
> > your task.
> >
> > --Bobby Evans
> >
> > On 9/27/11 11:17 AM, "Meng Mao"  wrote:
> >
> > Who is in charge of getting the files there for the first time? The
> > addCacheFile call in the mapreduce job? Or a manual setup by the
> > user/operator?
> >
> > On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans 
> > wrote:
> >
> > > The problem is the step 4 in the breaking sequence.  Currently the
> > > TaskTracker never looks at the disk to know if a file is in the
> > distributed
> > > cache or not.  It assumes that if it downloaded the file and did not
> > delete
> > > that file itself then the file is still there in its original form.  It
> > does
> > > not know that you deleted those files, or if wrote to the files, or in
> > any
> > > way altered those files.  In general you should not be modifying those
> > > files.  This is not only because it messes up the tracking of those
> > files,
> > > but because other jobs running concurrently with your task may also be
> > using
> > > those files.
> > >
> > > --Bobby Evans
> > >
> > >
> > > On 9/26/11 4:40 PM, "Meng Mao"  wrote:
> > >
> > > Let's frame the issue in another way. I'll describe a sequence of
> Hadoop
> > > operations that I think should work, and then I'll get into what we did
> > and
> > > how it failed.
> > >
> > > Normal sequence:
> > > 1. have files to be cached in HDFS
> > > 2. Run Job A, which specifies those files to be put into
> DistributedCache
> > > space
> > > 3. job runs fine
> > > 4. Run Job A some time later. job runs fine again.
> > >
> > > Breaking sequence:
> > > 1. have files to be cached in HDFS
> > > 2. Run Job A, which specifies those files to be put into
> DistributedCache
> > > space
> > > 3. job runs fine
> > > 4. Manually delete cached files out of local disk on worker nodes
> > > 5. Run Job A again, expect it to push out cache copies as needed.
> > > 6. job fails because the cache copies didn't get distributed
> > >
> > > Should this second sequence have broken?
> > >
> > > On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao  wrote:
> > >
> > > > Hmm, I must have really missed an important piece somewhere. This is
> > from
> > > > the MapRed tutorial text:
> > > >
> > > > "DistributedCache is a facility provided by the M

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
So the proper description of how DistributedCache normally works is:

1. have files to be cached sitting around in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space. Each worker node copies the to-be-cached files from HDFS to local
disk, but more importantly, the TaskTracker acknowledges this distribution
and records somewhere the fact that these files are cached, for the first (and
only) time
3. job runs fine
4. Run Job A some time later. TaskTracker simply assumes (by looking at its
memory) that the files are still cached. The tasks on the workers, on this
second call to addCacheFile, don't actually copy the files from HDFS to
local disk again, but instead accept TaskTracker's word that they're still
there. Because the files actually still exist, the workers run fine and the
job finishes normally.

Is that a correct interpretation? If so, the caution must be that if you
accidentally delete the local disk cache copies, you either have to repopulate
them (along with their crc checksums) or restart the TaskTracker?
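
(And for the repopulate option, Bobby's earlier suggestion of forcing a fresh
download by giving the HDFS file a new timestamp would amount to something
like the following, with the path made up:

hadoop fs -cp /cache/lookup.txt /cache/lookup.txt.tmp
hadoop fs -rm /cache/lookup.txt
hadoop fs -mv /cache/lookup.txt.tmp /cache/lookup.txt

that is, copy it, delete the original, and move the copy back into place.)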



On Tue, Sep 27, 2011 at 3:03 PM, Robert Evans  wrote:

> Yes, all of the state for the task tracker is in memory.  It never looks at
> the disk to see what is there, it only maintains the state in memory.
>
> --bobby Evans
>
>
> On 9/27/11 1:00 PM, "Meng Mao"  wrote:
>
> I'm not concerned about disk space usage -- the script we used that deleted
> the taskTracker cache path has been fixed not to do so.
>
> I'm curious about the exact behavior of jobs that use DistributedCache
> files. Again, it seems safe from your description to delete files between
> completed runs. How could the job or the taskTracker distinguish between
> the
> files having been deleted and their not having been downloaded from a
> previous run of the job? Is it state in memory that the taskTracker
> maintains?
>
>
> On Tue, Sep 27, 2011 at 1:44 PM, Robert Evans  wrote:
>
> > If you are never ever going to use that file again for any map/reduce
> task
> > in the future then yes you can delete it, but I would not recommend it.
>  If
> > you want to reduce the amount of space that is used by the distributed
> cache
> > there is a config parameter for that.
> >
> > "local.cache.size"  it is the number of bytes per drive that will be used
> > for storing data in the distributed cache.   This is in 0.20 for hadoop I
> am
> > not sure if it has changed at all for trunk.  It is not documented as far
> as
> > I can tell, and it defaults to 10GB.
> >
> > --Bobby Evans
> >
> >
> > On 9/27/11 12:04 PM, "Meng Mao"  wrote:
> >
> > From that interpretation, it then seems like it would be safe to delete
> the
> > files between completed runs? How could it distinguish between the files
> > having been deleted and their not having been downloaded from a previous
> > run?
> >
> > On Tue, Sep 27, 2011 at 12:25 PM, Robert Evans 
> > wrote:
> >
> > > addCacheFile sets a config value in your jobConf that indicates which
> > files
> > > your particular job depends on.  When the TaskTracker is assigned to
> run
> > > part of your job (map task or reduce task), it will download your
> > jobConf,
> > > read it in, and then download the files listed in the conf, if it has
> not
> > > already downloaded them from a previous run.  Then it will set up the
> > > directory structure for your job, possibly adding in symbolic links to
> > these
> > > files in the working directory for your task.  After that it will
> launch
> > > your task.
> > >
> > > --Bobby Evans
> > >
> > > On 9/27/11 11:17 AM, "Meng Mao"  wrote:
> > >
> > > Who is in charge of getting the files there for the first time? The
> > > addCacheFile call in the mapreduce job? Or a manual setup by the
> > > user/operator?
> > >
> > > On Tue, Sep 27, 2011 at 11:35 AM, Robert Evans 
> > > wrote:
> > >
> > > > The problem is the step 4 in the breaking sequence.  Currently the
> > > > TaskTracker never looks at the disk to know if a file is in the
> > > distributed
> > > > cache or not.  It assumes that if it downloaded the file and did not
> > > delete
> > > > that file itself then the file is still there in its original form.
>  It
> > > does
> > > > not know that you deleted those files, or if wrote to the files, or
> in
> > > any
> > > > way altered those files.  In general you should not be modifying
> those
> > > > files.  This 

ways to expand hadoop.tmp.dir capacity?

2011-10-04 Thread Meng Mao
Currently, we've got defined:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop-metadata/cache/</value>
</property>

In our experiments with SOLR, the intermediate files are so large that they
tend to blow out disk space and fail (and annoyingly leave behind their huge
failed attempts). We've had issues with it in the past, but we're having
real problems with SOLR if we can't comfortably get more space out of
hadoop.tmp.dir somehow.

1) It seems we never set *mapred.system.dir* to anything special, so it's
defaulting to ${hadoop.tmp.dir}/mapred/system.
Is this a problem? The docs seem to recommend against it when hadoop.tmp.dir
had ${user.name} in it, which ours doesn't.

1b) The doc says mapred.system.dir is "the in-HDFS path to shared MapReduce
system files." To me, that means there's must be 1 single path for
mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
Otherwise, one might imagine that you could specify multiple paths to store
hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir?

2) IIRC, there's a -D switch for supplying config name/value pairs into
individual jobs. Does such a switch exist? Googling for single letters is
fruitless. If we had a path on our workers with more space (in our case,
another hard disk), could we simply pass that path in as hadoop.tmp.dir for
our SOLR jobs? Without incurring any consistency issues on future jobs that
might use the SOLR output on HDFS?
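
(For 2, the switch I was thinking of is the generic -D option that
GenericOptionsParser/ToolRunner-based jobs accept, placed before the job's own
arguments. The jar and class names below are placeholders:

/usr/local/hadoop/bin/hadoop jar our-solr-job.jar com.example.SolrIndexJob \
  -D some.property=value input_path output_path

Whether a given property actually takes effect when set per job is a separate
question.)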


Re: ways to expand hadoop.tmp.dir capacity?

2011-10-04 Thread Meng Mao
I just read this:

MapReduce performance can also be improved by distributing the temporary
data generated by MapReduce tasks across multiple disks on each machine:

<property>
  <name>mapred.local.dir</name>
  <value>/d1/mapred/local,/d2/mapred/local,/d3/mapred/local,/d4/mapred/local</value>
  <final>true</final>
</property>

Given that the default value is ${hadoop.tmp.dir}/mapred/local, could the
expanded capacity we're looking for be accomplished simply by defining
mapred.local.dir to span multiple disks? (Setting aside the issue of temp
files so big that they could still fill a whole disk.)

On Wed, Oct 5, 2011 at 1:32 AM, Meng Mao  wrote:

> Currently, we've got defined:
>   
>  hadoop.tmp.dir
>  /hadoop/hadoop-metadata/cache/
>   
>
> In our experiments with SOLR, the intermediate files are so large that they
> tend to blow out disk space and fail (and annoyingly leave behind their huge
> failed attempts). We've had issues with it in the past, but we're having
> real problems with SOLR if we can't comfortably get more space out of
> hadoop.tmp.dir somehow.
>
> 1) It seems we never set *mapred.system.dir* to anything special, so it's
> defaulting to ${hadoop.tmp.dir}/mapred/system.
> Is this a problem? The docs seem to recommend against it when
> hadoop.tmp.dir had ${user.name} in it, which ours doesn't.
>
> 1b) The doc says mapred.system.dir is "the in-HDFS path to shared MapReduce
> system files." To me, that means there's must be 1 single path for
> mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
> Otherwise, one might imagine that you could specify multiple paths to store
> hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
> interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
> there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir?
>
> 2) IIRC, there's a -D switch for supplying config name/value pairs into
> indivdiual jobs. Does such a switch exist? Googling for single letters is
> fruitless. If we had a path on our workers with more space (in our case,
> another hard disk), could we simply pass that path in as hadoop.tmp.dir for
> our SOLR jobs? Without incurring any consistency issues on future jobs that
> might use the SOLR output on HDFS?
>
>
>
>
>


Re: ways to expand hadoop.tmp.dir capacity?

2011-10-10 Thread Meng Mao
So the only way we can expand to multiple mapred.local.dir paths is to
configure our site.xml and restart the DataNode?

On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda <
marcosluis2...@googlemail.com> wrote:

> 2011/10/9 Harsh J 
>
> > Hello Meng,
> >
> > On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao  wrote:
> > > Currently, we've got defined:
> > >  
> > > hadoop.tmp.dir
> > > /hadoop/hadoop-metadata/cache/
> > >  
> > >
> > > In our experiments with SOLR, the intermediate files are so large that
> > they
> > > tend to blow out disk space and fail (and annoyingly leave behind their
> > huge
> > > failed attempts). We've had issues with it in the past, but we're
> having
> > > real problems with SOLR if we can't comfortably get more space out of
> > > hadoop.tmp.dir somehow.
> > >
> > > 1) It seems we never set *mapred.system.dir* to anything special, so
> it's
> > > defaulting to ${hadoop.tmp.dir}/mapred/system.
> > > Is this a problem? The docs seem to recommend against it when
> > hadoop.tmp.dir
> > > had ${user.name} in it, which ours doesn't.
> >
> > The {mapred.system.dir} is a HDFS location, and you shouldn't really
> > be worried about it as much.
> >
> > > 1b) The doc says mapred.system.dir is "the in-HDFS path to shared
> > MapReduce
> > > system files." To me, that means there's must be 1 single path for
> > > mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
> > > Otherwise, one might imagine that you could specify multiple paths to
> > store
> > > hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
> > > interpretation? -- hadoop.tmp.dir could live on multiple paths/disks if
> > > there were more mapping/lookup between mapred.system.dir and
> > hadoop.tmp.dir?
> >
> > {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although it
> > is on HDFS, and hence is confusing, but there should just be one
> > mapred.system.dir, yes.
> >
> > Also, the config {hadoop.tmp.dir} doesn't support > 1 path. What you
> > need here is a proper {mapred.local.dir} configuration.
> >
> > > 2) IIRC, there's a -D switch for supplying config name/value pairs into
> > > indivdiual jobs. Does such a switch exist? Googling for single letters
> is
> > > fruitless. If we had a path on our workers with more space (in our
> case,
> > > another hard disk), could we simply pass that path in as hadoop.tmp.dir
> > for
> > > our SOLR jobs? Without incurring any consistency issues on future jobs
> > that
> > > might use the SOLR output on HDFS?
> >
> > Only a few parameters of a job are user-configurable. Stuff like
> > hadoop.tmp.dir and mapred.local.dir are not override-able by user set
> > parameters as they are server side configurations (static).
> >
> > > Given that the default value is ${hadoop.tmp.dir}/mapred/local, would
> the
> > > expanded capacity we're looking for be as easily accomplished as by
> > defining
> > > mapred.local.dir to span multiple disks? Setting aside the issue of
> temp
> > > files so big that they could still fill a whole disk.
> >
> > 1. You can set mapred.local.dir independent of hadoop.tmp.dir
> > 2. mapred.local.dir can have comma separated values in it, spanning
> > multiple disks
> > 3. Intermediate outputs may spread across these disks but shall not
> > consume > 1 disk at a time. So if your largest configured disk is 500
> > GB while the total set of them may be 2 TB, then your intermediate
> > output size can't really exceed 500 GB, cause only one disk is
> > consumed by one task -- the multiple disks are for better I/O
> > parallelism between tasks.
> >
> > Know that hadoop.tmp.dir is a convenience property, for quickly
> > starting up dev clusters and such. For a proper configuration, you
> > need to remove dependency on it (almost nothing uses hadoop.tmp.dir on
> > the server side, once the right properties are configured - ex:
> > dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.)
> >
> > --
> > Harsh J
> >
>
> Here it's a excellent explanation how to install Apache Hadoop manually,
> and
> Lars explains this very good.
>
>
> http://blog.lars-francke.de/2011/01/26/setting-up-a-hadoop-cluster-part-1-manual-installation/
>
> Regards
>
> --
> Marcos Luis Ortíz Valmaseda
>  Linux Infrastructure Engineer
>  Linux User # 418229
>  http://marcosluis2186.posterous.com
>  http://www.linkedin.com/in/marcosluis2186
>  Twitter: @marcosluis2186
>


Re: ways to expand hadoop.tmp.dir capacity?

2011-10-25 Thread Meng Mao
If we do that rolling restart scenario, will we have a completely quiet
migration? That is, if no jobs are running during the rolling restart of
TaskTrackers, then we will end up with expanded capacity with no risk of
data inconsistency in the cache paths?

Our data nodes already use multiple disks for storage. It was an early lack
of foresight that brings us to the present day where mapred.local.dir isn't
"distributed."

That said, one of our problems is that the SOLR index files we're building
are just plain huge. Even with expanded disk capacity, I think we'd still run
into disk space issues. Is this something that's been generally reported for
SOLR hadoop jobs?

On Mon, Oct 10, 2011 at 10:08 PM, Harsh J  wrote:

> Meng,
>
> Yes, configure the mapred-site.xml (mapred.local.dir) to add the
> property and roll-restart your TaskTrackers. If you'd like to expand
> your DataNode to multiple disks as well (helps HDFS I/O greatly), do
> the same with hdfs-site.xml (dfs.data.dir) and perform the same
> rolling restart of DataNodes.
>
> Ensure that for each service, the directories you create are owned by
> the same user as the one running the process. This will help avoid
> permission nightmares.
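To illustrate the comma-separated form Harsh describes, here is a minimal sketch; the /disk1 and /disk2 mount points are made up, and on a real cluster the values belong in mapred-site.xml and hdfs-site.xml on each node rather than in code:

import org.apache.hadoop.conf.Configuration;

public class LocalDirSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical multi-disk layout: one entry per physical disk.
    conf.set("mapred.local.dir", "/disk1/mapred/local,/disk2/mapred/local");
    conf.set("dfs.data.dir", "/disk1/dfs/data,/disk2/dfs/data");

    // Configuration.getStrings() splits on the commas -- this is how the
    // TaskTracker ends up with one local directory per disk.
    for (String dir : conf.getStrings("mapred.local.dir")) {
      System.out.println("mapred.local.dir entry: " + dir);
    }
  }
}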
>
> On Tue, Oct 11, 2011 at 3:58 AM, Meng Mao  wrote:
> > So the only way we can expand to multiple mapred.local.dir paths is to
> > config our site.xml and to restart the DataNode?
> >
> > On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda <
> > marcosluis2...@googlemail.com> wrote:
> >
> >> 2011/10/9 Harsh J 
> >>
> >> > Hello Meng,
> >> >
> >> > On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao  wrote:
> >> > > Currently, we've got defined:
> >> > > <property>
> >> > >   <name>hadoop.tmp.dir</name>
> >> > >   <value>/hadoop/hadoop-metadata/cache/</value>
> >> > > </property>
> >> > >
> >> > > In our experiments with SOLR, the intermediate files are so large
> that
> >> > they
> >> > > tend to blow out disk space and fail (and annoyingly leave behind
> their
> >> > huge
> >> > > failed attempts). We've had issues with it in the past, but we're
> >> having
> >> > > real problems with SOLR if we can't comfortably get more space out
> of
> >> > > hadoop.tmp.dir somehow.
> >> > >
> >> > > 1) It seems we never set *mapred.system.dir* to anything special, so
> >> it's
> >> > > defaulting to ${hadoop.tmp.dir}/mapred/system.
> >> > > Is this a problem? The docs seem to recommend against it when
> >> > hadoop.tmp.dir
> >> > > had ${user.name} in it, which ours doesn't.
> >> >
> >> > The {mapred.system.dir} is a HDFS location, and you shouldn't really
> >> > be worried about it as much.
> >> >
> >> > > 1b) The doc says mapred.system.dir is "the in-HDFS path to shared
> >> > MapReduce
> >> > > system files." To me, that means there's must be 1 single path for
> >> > > mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1 path.
> >> > > Otherwise, one might imagine that you could specify multiple paths
> to
> >> > store
> >> > > hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
> >> > > interpretation? -- hadoop.tmp.dir could live on multiple paths/disks
> if
> >> > > there were more mapping/lookup between mapred.system.dir and
> >> > hadoop.tmp.dir?
> >> >
> >> > {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although it
> >> > is on HDFS, and hence is confusing, but there should just be one
> >> > mapred.system.dir, yes.
> >> >
> >> > Also, the config {hadoop.tmp.dir} doesn't support > 1 path. What you
> >> > need here is a proper {mapred.local.dir} configuration.
> >> >
> >> > > 2) IIRC, there's a -D switch for supplying config name/value pairs
> into
> >> > > indivdiual jobs. Does such a switch exist? Googling for single
> letters
> >> is
> >> > > fruitless. If we had a path on our workers with more space (in our
> >> case,
> >> > > another hard disk), could we simply pass that path in as
> hadoop.tmp.dir
> >> > for
> >> > > our SOLR jobs? Without incurring any consistency issues on future
> jobs
> >> > that
> >> > > might use the SOLR output on HDFS?
> >> >
>

Re: ways to expand hadoop.tmp.dir capacity?

2011-10-26 Thread Meng Mao
OK. One more related question --
is there any mechanism in place to remove failed task attempt directories
from the TaskTracker's jobcache?

It seems like for us, the only way to get rid of them is manually.
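In case it helps anyone else, this is roughly the kind of manual sweep we resort to -- a sketch only, assuming the default taskTracker/jobcache layout under our mapred.local.dir and an arbitrary 7-day cutoff; it just prints candidates rather than deleting anything:

import java.io.File;

public class JobCacheSweep {
  private static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000; // 7 days

  public static void main(String[] args) {
    // Our mapred.local.dir; the jobcache sits under taskTracker/.
    File jobcache =
        new File("/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache");
    long now = System.currentTimeMillis();
    File[] jobs = jobcache.listFiles();
    if (jobs == null) return;
    for (File jobDir : jobs) {
      File[] attempts = jobDir.listFiles();
      if (attempts == null) continue;
      for (File attemptDir : attempts) {
        // Only report attempt_* directories that have gone stale.
        if (attemptDir.getName().startsWith("attempt_")
            && now - attemptDir.lastModified() > MAX_AGE_MS) {
          System.out.println("stale attempt dir: " + attemptDir);
        }
      }
    }
  }
}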

On Wed, Oct 26, 2011 at 3:07 AM, Harsh J  wrote:

> Meng,
>
> You should have no issue at all, with respect to the late addition.
>
> On Wednesday, October 26, 2011, Meng Mao  wrote:
> > If we do that rolling restart scenario, will we have a completely quiet
> > migration? That is, if no jobs are running during the rolling restart of
> > TaskTrackers, then we will end up with expanded capacity with no risk of
> > data inconsistency in the cache paths?
> >
> > Our data nodes already use multiple disks for storage. It was an early
> lack
> > of foresight that brings us to the present day where mapred.local.dir
> isn't
> > "distributed."
> >
> > That said, one of our problems is that the SOLR index files we're
> building
> > are just plain huge. Even with expand disk capacity, I think we'd still
> run
> > into disk space issues. Is this something that's been generally reported
> for
> > SOLR hadoop jobs?
> >
> > On Mon, Oct 10, 2011 at 10:08 PM, Harsh J  wrote:
> >
> >> Meng,
> >>
> >> Yes, configure the mapred-site.xml (mapred.local.dir) to add the
> >> property and roll-restart your TaskTrackers. If you'd like to expand
> >> your DataNode to multiple disks as well (helps HDFS I/O greatly), do
> >> the same with hdfs-site.xml (dfs.data.dir) and perform the same
> >> rolling restart of DataNodes.
> >>
> >> Ensure that for each service, the directories you create are owned by
> >> the same user as the one running the process. This will help avoid
> >> permission nightmares.
> >>
> >> On Tue, Oct 11, 2011 at 3:58 AM, Meng Mao  wrote:
> >> > So the only way we can expand to multiple mapred.local.dir paths is to
> >> > config our site.xml and to restart the DataNode?
> >> >
> >> > On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda <
> >> > marcosluis2...@googlemail.com> wrote:
> >> >
> >> >> 2011/10/9 Harsh J 
> >> >>
> >> >> > Hello Meng,
> >> >> >
> >> >> > On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao 
> wrote:
> >> >> > > Currently, we've got defined:
> >> >> > > <property>
> >> >> > >   <name>hadoop.tmp.dir</name>
> >> >> > >   <value>/hadoop/hadoop-metadata/cache/</value>
> >> >> > > </property>
> >> >> > >
> >> >> > > In our experiments with SOLR, the intermediate files are so large
> >> that
> >> >> > they
> >> >> > > tend to blow out disk space and fail (and annoyingly leave behind
> >> their
> >> >> > huge
> >> >> > > failed attempts). We've had issues with it in the past, but we're
> >> >> having
> >> >> > > real problems with SOLR if we can't comfortably get more space
> out
> >> of
> >> >> > > hadoop.tmp.dir somehow.
> >> >> > >
> >> >> > > 1) It seems we never set *mapred.system.dir* to anything special,
> so
> >> >> it's
> >> >> > > defaulting to ${hadoop.tmp.dir}/mapred/system.
> >> >> > > Is this a problem? The docs seem to recommend against it when
> >> >> > hadoop.tmp.dir
> >> >> > > had ${user.name} in it, which ours doesn't.
> >> >> >
> >> >> > The {mapred.system.dir} is a HDFS location, and you shouldn't
> really
> >> >> > be worried about it as much.
> >> >> >
> >> >> > > 1b) The doc says mapred.system.dir is "the in-HDFS path to shared
> >> >> > MapReduce
> >> >> > > system files." To me, that means there's must be 1 single path
> for
> >> >> > > mapred.system.dir, which sort of forces hadoop.tmp.dir to be 1
> path.
> >> >> > > Otherwise, one might imagine that you could specify multiple
> paths
> >> to
> >> >> > store
> >> >> > > hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct
> >> >> > > interpretation? -- hadoop.tmp.dir could live on multiple
> paths/disks
> >> if
> >> >> > > there were more mapping/lookup between mapred.system.dir and
> >> >> > hadoop.tmp.dir?
> >> >> >
> >> >> > {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although
> it
> >> >> > is on HDFS, and hence is confusing, but there should just be one
> >> >> > mapred.system.dir, yes.
> >> >> >
> >> >> > Also, the config {hadoop.tmp.dir} doesn't support > 1 path. What
> you
> >> >> > need here is a proper {mapred.local.dir} configuration.
> >> >> >
> >> >> > > 2) IIRC, there's a -D switch for supplying config name/value
> pairs
> >> into
> >> >> > > indivdiual jobs. Does such a switch exist? Googling for single
> >> letters
> >> >> is
> >> >> > > fruitless. If we had a path on our workers with more space (in
> our
> >> >> case,
> >> >> > > another hard disk), could
>
> --
> Harsh J
>


apache-solr-3.3.0, corrupt indexes, and speculative execution

2011-10-29 Thread Meng Mao
We've been getting up to speed on SOLR, and one of the recent problems we've
run into is with successful jobs delivering corrupt index shards.

This is a look at a 12-sharded index we built and then copied to local disk
off of HDFS.

$ ls -l part-1
total 16
drwxr-xr-x 2 vmc visible 4096 Oct 29 09:39 conf
drwxr-xr-x 4 vmc visible 4096 Oct 29 09:42 data

$ ls -l part-4/
total 16
drwxr-xr-x 4 vmc visible 4096 Oct 29 09:54 data
drwxr-xr-x 2 vmc visible 4096 Oct 29 09:54
solr_attempt_201101270134_42143_r_04_1.1


Right away, you can see that there's apparently some lack of cleanup or
incompleteness. Shard 04 is missing a conf directory, and has an empty
attempt directory lying around.

This is what a complete shard listing looks like:
$ ls -l part-1/*/*
-rw-r--r-- 1 vmc visible 33402 Oct 29 09:39 part-1/conf/schema.xml

part-1/data/index:
total 13088776
-rw-r--r-- 1 vmc visible 6036701453 Oct 29 09:40 _1m3.fdt
*-rw-r--r-- 1 vmc visible  246345692 Oct 29 09:40 _1m3.fdx #missing from
shard 04*
-rw-r--r-- 1 vmc visible        211 Oct 29 09:40 _1m3.fnm
-rw-r--r-- 1 vmc visible 3516034769 Oct 29 09:41 _1m3.frq
-rw-r--r-- 1 vmc visible   92379637 Oct 29 09:41 _1m3.nrm
-rw-r--r-- 1 vmc visible  695935796 Oct 29 09:41 _1m3.prx
-rw-r--r-- 1 vmc visible   28548963 Oct 29 09:41 _1m3.tii
-rw-r--r-- 1 vmc visible 2773769958 Oct 29 09:42 _1m3.tis
-rw-r--r-- 1 vmc visible        284 Oct 29 09:42 segments_2
-rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen

part-1/data/spellchecker:
total 16
-rw-r--r-- 1 vmc visible 32 Oct 29 09:42 segments_1
-rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen


And shard 04:
$ ls -l part-4/*/*
#missing conf/

part-4/data/index:
total 12818420
-rw-r--r-- 1 vmc visible 6036000614 Oct 29 09:52 _1m1.fdt
-rw-r--r-- 1 vmc visible        211 Oct 29 09:52 _1m1.fnm
-rw-r--r-- 1 vmc visible 3515333900 Oct 29 09:53 _1m1.frq
-rw-r--r-- 1 vmc visible   92361544 Oct 29 09:53 _1m1.nrm
-rw-r--r-- 1 vmc visible  696258210 Oct 29 09:54 _1m1.prx
-rw-r--r-- 1 vmc visible   28552866 Oct 29 09:54 _1m1.tii
-rw-r--r-- 1 vmc visible 2744647680 Oct 29 09:54 _1m1.tis
-rw-r--r-- 1 vmc visible        283 Oct 29 09:54 segments_2
-rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen

part-4/data/spellchecker:
total 16
-rw-r--r-- 1 vmc visible 32 Oct 29 09:54 segments_1
-rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen


What might cause that attempt path to be lying around at the time of
completion? Has anyone seen anything like this? My gut says if we were able
to disable speculative execution, we would probably see this go away. But
that might be overreacting.
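For what it's worth, if we do decide to try turning speculation off, it is a one-liner in the job setup against the old mapred API we run -- a sketch:

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculationSketch {
  // Call this on the job's conf before submitting it.
  public static JobConf disableSpeculation(JobConf conf) {
    conf.setReduceSpeculativeExecution(false); // no second reduce attempt racing the first
    conf.setMapSpeculativeExecution(false);    // optional: same for the map side
    // Equivalent to setting mapred.{map,reduce}.tasks.speculative.execution to false.
    return conf;
  }
}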


In this job, of the 12 reduce tasks, 5 had an extra speculative attempt. Of
those 5, 2 were cases where the speculative attempt won out over the first
attempt. And one of them had this output inconsistency.

Here is an excerpt from the task log for shard 04:
2011-10-29 07:08:33,152 INFO org.apache.solr.hadoop.SolrRecordWriter:
SolrHome:
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
2011-10-29 07:08:33,889 INFO org.apache.solr.hadoop.SolrRecordWriter:
Constructed instance information solr.home
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
(/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip),
instance dir
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/,
conf dir
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/conf/,
writing index to temporary directory
/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data,
with permdir /PROD/output/solr/solr-20111029063514-12/part-4
2011-10-29 07:08:35,868 INFO org.apache.solr.schema.IndexSchema: Reading
Solr Schema

... much later ...

2011-10-29 07:08:37,263 WARN org.apache.solr.core.SolrCore: [core1]
Solr index directory
'/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data/index'
doesn't exist. Creating new index...

The log doesn't look much different from the successful shard 3 (where the
later attempt beat the earlier attempt). I have the full logs if anyone has
diagnostic advice.
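One thing we could do after copying a shard off HDFS is run Lucene's CheckIndex over it before shipping it anywhere -- a sketch, assuming the Lucene 3.3 API that ships with apache-solr-3.3.0 and a made-up local path:

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class VerifyShard {
  public static void main(String[] args) throws Exception {
    // Hypothetical local copy of one shard's index directory.
    Directory dir = FSDirectory.open(new File("/local/solr-copy/part-4/data/index"));
    CheckIndex checker = new CheckIndex(dir);
    CheckIndex.Status status = checker.checkIndex();
    System.out.println(status.clean ? "index is clean" : "index is CORRUPT");
  }
}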


Re: apache-solr-3.3.0, corrupt indexes, and speculative execution

2011-10-30 Thread Meng Mao
Today we again had some leftover attempt dirs -- 2 of the 5 reduce tasks
that had a redundant speculative attempt left their extra attempt directory behind:

part-5:
conf  data  solr_attempt_201101270134_42328_r_05_1.1

part-6:
conf  data  solr_attempt_201101270134_42328_r_06_1.1

Is it possible that the disk being really full could cause transient
filesystem glitches that aren't being thrown as exceptions?

On Sat, Oct 29, 2011 at 2:23 PM, Meng Mao  wrote:

> We've been getting up to speed on SOLR, and one of the recent problems
> we've run into is with successful jobs delivering corrupt index shards.
>
> This is a look at a 12-sharded index we built and then copied to local
> disk off of HDFS.
>
> $ ls -l part-1
> total 16
> drwxr-xr-x 2 vmc visible 4096 Oct 29 09:39 conf
> drwxr-xr-x 4 vmc visible 4096 Oct 29 09:42 data
>
> $ ls -l part-4/
> total 16
> drwxr-xr-x 4 vmc visible 4096 Oct 29 09:54 data
> drwxr-xr-x 2 vmc visible 4096 Oct 29 09:54
> solr_attempt_201101270134_42143_r_04_1.1
>
>
> Right away, you can see that there's apparently some lack of cleanup or
> incompleteness. Shard 04 is missing a conf directory, and has an empty
> attempt directory lying around.
>
> This is what a complete shard listing looks like:
> $ ls -l part-1/*/*
> -rw-r--r-- 1 vmc visible 33402 Oct 29 09:39 part-1/conf/schema.xml
>
> part-1/data/index:
> total 13088776
> -rw-r--r-- 1 vmc visible 6036701453 Oct 29 09:40 _1m3.fdt
> *-rw-r--r-- 1 vmc visible  246345692 Oct 29 09:40 _1m3.fdx #missing from
> shard 04*
> -rw-r--r-- 1 vmc visible211 Oct 29 09:40 _1m3.fnm
> -rw-r--r-- 1 vmc visible 3516034769 Oct 29 09:41 _1m3.frq
> -rw-r--r-- 1 vmc visible   92379637 Oct 29 09:41 _1m3.nrm
> -rw-r--r-- 1 vmc visible  695935796 Oct 29 09:41 _1m3.prx
> -rw-r--r-- 1 vmc visible   28548963 Oct 29 09:41 _1m3.tii
> -rw-r--r-- 1 vmc visible 2773769958 Oct 29 09:42 _1m3.tis
> -rw-r--r-- 1 vmc visible284 Oct 29 09:42 segments_2
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen
>
> part-1/data/spellchecker:
> total 16
> -rw-r--r-- 1 vmc visible 32 Oct 29 09:42 segments_1
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:42 segments.gen
>
>
> And shard 04:
> $ ls -l part-4/*/*
> #missing conf/
>
> part-4/data/index:
> total 12818420
> -rw-r--r-- 1 vmc visible 6036000614 Oct 29 09:52 _1m1.fdt
> -rw-r--r-- 1 vmc visible211 Oct 29 09:52 _1m1.fnm
> -rw-r--r-- 1 vmc visible 3515333900 Oct 29 09:53 _1m1.frq
> -rw-r--r-- 1 vmc visible   92361544 Oct 29 09:53 _1m1.nrm
> -rw-r--r-- 1 vmc visible  696258210 Oct 29 09:54 _1m1.prx
> -rw-r--r-- 1 vmc visible   28552866 Oct 29 09:54 _1m1.tii
> -rw-r--r-- 1 vmc visible 2744647680 Oct 29 09:54 _1m1.tis
> -rw-r--r-- 1 vmc visible283 Oct 29 09:54 segments_2
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen
>
> part-4/data/spellchecker:
> total 16
> -rw-r--r-- 1 vmc visible 32 Oct 29 09:54 segments_1
> -rw-r--r-- 1 vmc visible 20 Oct 29 09:54 segments.gen
>
>
> What might cause that attempt path to be lying around at the time of
> completion? Has anyone seen anything like this? My gut says if we were able
> to disable speculative execution, we would probably see this go away. But
> that might be overreacting.
>
>
> In this job, of the 12 reduce tasks, 5 had an extra speculative attempt.
> Of those 5, 2 were cases where the speculative attempt won out over the
> first attempt. And one of them had this output inconsistency.
>
> Here is an excerpt from the task log for shard 04:
> 2011-10-29 07:08:33,152 INFO org.apache.solr.hadoop.SolrRecordWriter:
> SolrHome:
> /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
> 2011-10-29 07:08:33,889 INFO org.apache.solr.hadoop.SolrRecordWriter:
> Constructed instance information solr.home
> /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip
> (/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip),
> instance dir
> /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/,
> conf dir
> /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/prod1-ha-vip/tmp/f615e27f-38cd-485b-8cf1-c45e77c2a2b8.solr.zip/conf/,
> writing index to temporary directory
> /hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201101270134_42143/work/solr_attempt_201101270134_42143_r_04_1.1/data,
> with permdir /PROD/output/solr/solr-20111029063514-12/part-4
>

Re: Do failed task attempts stick around the jobcache on local disk?

2011-11-05 Thread Meng Mao
Am I being completely silly asking about this? Does anyone know?

On Wed, Nov 2, 2011 at 6:27 PM, Meng Mao  wrote:

> Is there any mechanism in place to remove failed task attempt directories
> from the TaskTracker's jobcache?
>
> It seems like for us, the only way to get rid of them is manually.


desperate question about NameNode startup sequence

2011-12-16 Thread Meng Mao
Our CDH2 production grid just crashed with some sort of master node failure.
When I went in there, JobTracker was missing and NameNode was up.
Trying to ls on HDFS met with no connection.

We decided to go for a restart. This is in the namenode log right now:

2011-12-17 01:37:35,568 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,612 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-12-17 01:37:35,613 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-12-17 01:37:35,613 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2011-12-17 01:37:35,620 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,621 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
3885 edits # 43 loaded in 0 seconds.


What's coming up in the startup sequence? We have a ton of data on there.
Is there any way to estimate startup time?


Re: desperate question about NameNode startup sequence

2011-12-16 Thread Meng Mao
None of the worker nodes' DataNode logs have logged anything after the
initial startup announcement:
STARTUP_MSG:   host = prod1-worker075/10.2.19.75
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1+169.56
STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
/

On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao  wrote:

> Our CDH2 production grid just crashed with some sort of master node
> failure.
> When I went in there, JobTracker was missing and NameNode was up.
> Trying to ls on HDFS met with no connection.
>
>  We decided to go for a restart. This is in the namenode log right now:
>
> 2011-12-17 01:37:35,568 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-12-17 01:37:35,612 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2011-12-17 01:37:35,613 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-12-17 01:37:35,613 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2011-12-17 01:37:35,620 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-12-17 01:37:35,621 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 16978046
> 2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 1
> 2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 2589456651 loaded in 348 seconds.
> 2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
> 3885 edits # 43 loaded in 0 seconds.
>
>
> What's coming up in the startup sequence? We have a ton of data on there.
> Is there any way to estimate startup time?
>


Re: desperate question about NameNode startup sequence

2011-12-17 Thread Meng Mao
Maybe this is a bad sign -- the edits.new was created before the master
node crashed, and is huge:

-bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
total 41G
-rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
-rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
-rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
-rw-r--r-- 1 hadoop hadoop    8 Jan 27  2011 fstime
-rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION

could this mean something was up with our SecondaryNameNode and rolling the
edits file?

On Sat, Dec 17, 2011 at 2:53 AM, Meng Mao  wrote:

> All of the worker nodes datanodes' logs haven't logged anything after the
> initial startup announcement:
> STARTUP_MSG:   host = prod1-worker075/10.2.19.75
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
> compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
> ****/
>
> On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao  wrote:
>
>> Our CDH2 production grid just crashed with some sort of master node
>> failure.
>> When I went in there, JobTracker was missing and NameNode was up.
>> Trying to ls on HDFS met with no connection.
>>
>>  We decided to go for a restart. This is in the namenode log right now:
>>
>> 2011-12-17 01:37:35,568 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
>> Initializing NameNodeMeterics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-12-17 01:37:35,612 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>> 2011-12-17 01:37:35,613 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> 2011-12-17 01:37:35,613 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isPermissionEnabled=true
>> 2011-12-17 01:37:35,620 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>> Initializing FSNamesystemMetrics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-12-17 01:37:35,621 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStatusMBean
>> 2011-12-17 01:37:35,648 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 16978046
>> 2011-12-17 01:43:24,023 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
>> construction = 1
>> 2011-12-17 01:43:24,025 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2589456651
>> loaded in 348 seconds.
>> 2011-12-17 01:43:24,030 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Edits file
>> /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size 3885 edits #
>> 43 loaded in 0 seconds.
>>
>>
>> What's coming up in the startup sequence? We have a ton of data on there.
>> Is there any way to estimate startup time?
>>
>
>


Re: desperate question about NameNode startup sequence

2011-12-17 Thread Meng Mao
The namenode eventually came up. Here's how the logging resumed:

2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size
3885 edits # 43 loaded in 0 seconds.
2011-12-17 03:06:26,731 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode,
reached end of edit log Number of transactions found 306757368
2011-12-17 03:06:26,732 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits.new of size
41011966085 edits # 306757368 loaded in 4982 seconds.
2011-12-17 03:06:47,264 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 1724849462 saved in 19 seconds.
2011-12-17 03:07:09,051 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
FSImage in 5373458 msecs

It took a long time to load edits.new, consistent with the speed at which
it loaded fsimage.

So at 3:12 or so, the namenode came back up. At that time, the fsimage got
updated on our secondary namenode:
[vmc@prod1-secondary ~]$ ls -l
/hadoop/hadoop-metadata/cache/dfs/namesecondary/current/
total 1686124
-rw-r--r-- 1 hadoop hadoop  38117 Dec 17 03:12 edits
-rw-r--r-- 1 hadoop hadoop 1724849462 Dec 17 03:12 fsimage

But that's it. No updates since then. Is that normal 2NN behavior? I don't
think we've tuned away from the defaults for fsimage and edits maintenance.
How should I diagnose?

Similarly, primary namenode seems to continue to log changes to edits.new:
-bash-3.2$ ls -l /hadoop/hadoop-metadata/cache/dfs/name/current/
total 1728608
-rw-r--r-- 1 hadoop hadoop  38117 Dec 17 03:12 edits
-rw-r--r-- 1 hadoop hadoop   44064633 Dec 17 11:41 edits.new
-rw-r--r-- 1 hadoop hadoop 1724849462 Dec 17 03:06 fsimage
-rw-r--r-- 1 hadoop hadoop  8 Dec 17 03:07 fstime
-rw-r--r-- 1 hadoop hadoop        101 Dec 17 03:07 VERSION

Is this normal? Have I been misunderstanding normal NN operation?

On Sat, Dec 17, 2011 at 3:01 AM, Meng Mao  wrote:

> Maybe this is a bad sign -- the edits.new was created before the master
> node crashed, and is huge:
>
> -bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
> total 41G
> -rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
> -rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
> -rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
> -rw-r--r-- 1 hadoop hadoop8 Jan 27  2011 fstime
> -rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION
>
> could this mean something was up with our SecondaryNameNode and rolling
> the edits file?
>
> On Sat, Dec 17, 2011 at 2:53 AM, Meng Mao  wrote:
>
>> All of the worker nodes datanodes' logs haven't logged anything after the
>> initial startup announcement:
>> STARTUP_MSG:   host = prod1-worker075/10.2.19.75
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.1+169.56
>> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
>> compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
>> /
>>
>> On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao  wrote:
>>
>>> Our CDH2 production grid just crashed with some sort of master node
>>> failure.
>>> When I went in there, JobTracker was missing and NameNode was up.
>>> Trying to ls on HDFS met with no connection.
>>>
>>>  We decided to go for a restart. This is in the namenode log right now:
>>>
>>> 2011-12-17 01:37:35,568 INFO
>>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
>>> Initializing NameNodeMeterics using context
>>> object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-12-17 01:37:35,612 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 2011-12-17 01:37:35,613 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2011-12-17 01:37:35,613 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isPermissionEnabled=true
>>> 2011-12-17 01:37:35,620 INFO
>>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>>> Initializing FSNamesystemMetrics using context
>>> object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-12-17 01:37:35,621 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>&g

Re: desperate question about NameNode startup sequence

2011-12-17 Thread Meng Mao
Brock, thanks for moving this over. I wasn't aware there were new lists for
CDH.

How should I diagnose if our 2NN is working right now?

On Sat, Dec 17, 2011 at 4:00 PM, Edward Capriolo wrote:

> The problem with checkpoint /2nn is that it happily "runs" and has no
> outward indication that it is unable to connect.
>
> Because you have a large edits file your startup will complete; however,
> with that size it could take hours. It logs nothing while this is going on
> but as long as the CPU is working that means it is progressing.
>
> We have a nagios check on the size of this directory so if the edit
> rolling stops we know about it.
>
>
> On Saturday, December 17, 2011, Brock Noland  wrote:
> > Hi,
> >
> > Since your using CDH2, I am moving this to CDH-USER. You can subscribe
> here:
> >
> > http://groups.google.com/a/cloudera.org/group/cdh-user
> >
> > BCC'd common-user
> >
> > On Sat, Dec 17, 2011 at 2:01 AM, Meng Mao  wrote:
> >> Maybe this is a bad sign -- the edits.new was created before the master
> >> node crashed, and is huge:
> >>
> >> -bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
> >> total 41G
> >> -rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
> >> -rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
> >> -rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
> >> -rw-r--r-- 1 hadoop hadoop8 Jan 27  2011 fstime
> >> -rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION
> >>
> >> could this mean something was up with our SecondaryNameNode and rolling
> the
> >> edits file?
> >
> > Yes it looks like a checkpoint never completed. It's a good idea to
> > monitor the mtime on fsimage to ensure it never gets too old.
> >
> > Has a checkpoint completed since you restarted?
> >
> > Brock
> >
>
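Following Edward's Nagios idea and Brock's mtime suggestion, a small check on the name directory itself might look like this -- a sketch only, with our dfs.name.dir path hard-coded and the 24-hour threshold picked arbitrarily:

import java.io.File;

public class CheckpointAgeCheck {
  public static void main(String[] args) {
    File current = new File("/hadoop/hadoop-metadata/cache/dfs/name/current");
    File fsimage = new File(current, "fsimage");
    File editsNew = new File(current, "edits.new");

    long ageHours = (System.currentTimeMillis() - fsimage.lastModified()) / (3600L * 1000);
    System.out.println("fsimage last modified " + ageHours + " hours ago");
    if (editsNew.exists()) {
      System.out.println("edits.new size: " + editsNew.length() + " bytes");
    }
    // A stale fsimage plus a growing edits.new usually means the 2NN has
    // stopped checkpointing; exit non-zero so a monitor can flag it.
    if (ageHours > 24) {
      System.exit(2); // CRITICAL in Nagios terms
    }
  }
}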


do HDFS files starting with _ (underscore) have special properties?

2011-09-02 Thread Meng Mao
We have a compression utility that tries to grab all subdirs of a directory
on HDFS. It makes a call like this:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));

and handles files vs dirs accordingly.

We tried to run our utility against a dir containing a computed SOLR shard,
which has files that look like this:
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/segments_2


The globStatus call seems only able to pick up those last 2 files; the
several files that start with _ don't register.

I've skimmed the FileSystem and GlobExpander source to see if there's
anything related to this, but didn't see it. Google didn't turn up anything
about underscores. Am I misunderstanding something about the regex patterns
needed to pick these up or unaware of some filename convention in HDFS?
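One way to take any default filtering out of the picture is to pass an explicit accept-everything PathFilter to globStatus -- a sketch, reusing the index path shown above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ListUnderscoreFiles {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path indexDir = new Path("/test/output/solr-20110901165238/part-0/data/index");

    // globStatus(Path, PathFilter): the filter below accepts everything,
    // including names that start with '_' or '.'.
    FileStatus[] entries = fs.globStatus(new Path(indexDir, "*"), new PathFilter() {
      public boolean accept(Path p) {
        return true;
      }
    });
    for (FileStatus st : entries) {
      System.out.println((st.isDir() ? "dir  " : "file ") + st.getPath());
    }
  }
}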


Re: do HDFS files starting with _ (underscore) have special properties?

2011-09-02 Thread Meng Mao
Is there a programmatic way to access these hidden files then?

On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo wrote:

> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao  wrote:
>
> > We have a compression utility that tries to grab all subdirs to a
> directory
> > on HDFS. It makes a call like this:
> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
> >
> > and handles files vs dirs accordingly.
> >
> > We tried to run our utility against a dir containing a computed SOLR
> shard,
> > which has files that look like this:
> > -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
> > /test/output/solr-20110901165238/part-0/data/index/_ox.fdt
> > -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
> > /test/output/solr-20110901165238/part-0/data/index/_ox.fdx
> > -rw-r--r--   2 hadoopuser visible130 2011-09-01 18:57
> > /test/output/solr-20110901165238/part-0/data/index/_ox.fnm
> > -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
> > /test/output/solr-20110901165238/part-0/data/index/_ox.frq
> > -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
> > /test/output/solr-20110901165238/part-0/data/index/_ox.nrm
> > -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
> > /test/output/solr-20110901165238/part-0/data/index/_ox.prx
> > -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
> > /test/output/solr-20110901165238/part-0/data/index/_ox.tii
> > -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
> > /test/output/solr-20110901165238/part-0/data/index/_ox.tis
> > -rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
> > /test/output/solr-20110901165238/part-0/data/index/segments.gen
> > -rw-r--r--   2 hadoopuser visible282 2011-09-01 18:55
> > /test/output/solr-20110901165238/part-0/data/index/segments_2
> >
> >
> > The globStatus call seems only able to pick up those last 2 files; the
> > several files that start with _ don't register.
> >
> > I've skimmed the FileSystem and GlobExpander source to see if there's
> > anything related to this, but didn't see it. Google didn't turn up
> anything
> > about underscores. Am I misunderstanding something about the regex
> patterns
> > needed to pick these up or unaware of some filename convention in HDFS?
> >
>
> Files starting with '_' are considered 'hidden' like unix files starting
> with '.'. I did not know that for a very long time because not everyone
> follows this rule or even knows about it.
>


Re: do HDFS files starting with _ (underscore) have special properties?

2011-09-03 Thread Meng Mao
I get the opposite behavior --

[this is more or less how I listed the files in the original email]
hadoop dfs -ls /test/output/solr-20110901165238/part-0/data/index/*
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
/test/output/solr-20110901165238/part-0/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
/test/output/solr-20110901165238/part-0/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible 20 2011-09-01 18:51
/test/output/solr-20110901165238/part-0/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55
/test/output/solr-20110901165238/part-0/data/index/segments_2

Whereas my globStatus doesn't capture them.

I thought we were on Cloudera's CDH3, but now I'm not sure. This is what
version reports:
$ hadoop version
Hadoop 0.20.1+169.56
Subversion  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
Compiled by root on Tue Feb  9 13:40:08 EST 2010





On Fri, Sep 2, 2011 at 11:45 PM, Harsh J  wrote:

> Meng,
>
> What version of hadoop are you on? I'm able to use globStatus(Path)
> for '_' listing successfully, with a '*' glob. Although the same
> doesn't apply to what FsShell's ls utility provide (which is odd
> here!).
>
> Here's my test code which can validate that the listing is indeed
> done: http://pastebin.com/vCbd2wmK
>
> $ hadoop dfs -ls
> Found 4 items
> drwxr-xr-x   - harshchouraria supergroup  0 2011-09-03 09:09
> /user/harshchouraria/_abc
> -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
> /user/harshchouraria/_def
> drwxr-xr-x   - harshchouraria supergroup  0 2011-09-03 08:10
> /user/harshchouraria/abc
> -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
> /user/harshchouraria/def
>
>
> $ hadoop dfs -ls '*'
> -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
> /user/harshchouraria/_def
> -rw-r--r--   1 harshchouraria supergroup  0 2011-09-03 09:10
> /user/harshchouraria/def
>
> $ # No dir results! ^^
>
> $ hadoop jar myjar.jar # (My code)
> hdfs://localhost/user/harshchouraria/_abc
> hdfs://localhost/user/harshchouraria/_def
> hdfs://localhost/user/harshchouraria/abc
> hdfs://localhost/user/harshchouraria/def
>
> I suppose that means globStatus is fine, but the FsShell.ls(…) code
> does something more than a simple glob status, and filters away
> directory results when used with a glob.
>
> On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao  wrote:
> > Is there a programmatic way to access these hidden files then?
> >
> > On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo  >wrote:
> >
> >> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao  wrote:
> >>
> >> > We have a compression utility that tries to grab all subdirs to a
> >> directory
> >> > on HDFS. It makes a call like this:
> >> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
> >> >
> >> > and handles files vs dirs accordingly.
> >> >
> >> > We tried to run our utility against a dir containing a computed SOLR
> >> shard,
> >> > which has files that look like this:
> >> > -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
> >> > /test/output/solr-20110901165238/part-0/data/index/_ox.fdt
> >> > -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-0/data/index/_ox.fdx
> >> > -rw-r--r--   2 hadoopuser visible130 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-0/data/index/_ox.fnm
> >> > -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
> >> > /test/output/solr-20110901165238/part-0/data/index/_ox.frq
> >> > -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-0/data/index/_ox.nrm
> >

Re: Disable Sorting?

2011-09-10 Thread Meng Mao
Is there a way to collate the possibly large number of map output files,
though?

On Sat, Sep 10, 2011 at 2:48 PM, Arun C Murthy  wrote:

> Run a map-only job with #reduces set to 0.
>
> Arun
>
> On Sep 10, 2011, at 2:06 AM, john smith wrote:
>
> > Hi,
> >
> > Some of the MR jobs I run doesn't need sorting of map-output in each
> > partition. Is there someway I can disable it?
> >
> > Any help?
> >
> > Thanks
> > jS
>
>
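As a reference for Arun's suggestion, setting the reduce count to zero in the old mapred API is a single call -- a sketch; with zero reduces there is no sort/shuffle at all and each map writes its own part file directly to the output directory, so any collation has to happen downstream:

import org.apache.hadoop.mapred.JobConf;

public class MapOnlySketch {
  // Call this on the job's conf before submitting it.
  public static JobConf makeMapOnly(JobConf conf) {
    conf.setNumReduceTasks(0); // map output goes straight to the output dir, unsorted
    return conf;
  }
}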


operation of DistributedCache following manual deletion of cached files?

2011-09-22 Thread Meng Mao
We use the DistributedCache class to distribute a few lookup files for our
jobs. We have been aggressively deleting failed task attempts' leftover data,
and our script accidentally deleted the path to our distributed cache
files.

Our task attempt leftover data was here [per node]:
/hadoop/hadoop-metadata/cache/mapred/local/
and our distributed cache path was:
hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/
We deleted this path by accident.

Does this latter path look normal? I'm not that familiar with
DistributedCache but I'm up right now investigating the issue so I thought
I'd ask.

After that deletion, the first 2 jobs to run (which use the addCacheFile
method to distribute their files) didn't seem to push the files out to the
cache path, except on one node. Is this expected behavior? Shouldn't
addCacheFile check to see if the files are missing, and if so, repopulate
them as needed?

I'm trying to get a handle on whether it's safe to delete the distributed
cache path when the grid is quiet and no jobs are running. That is, if
addCacheFile is designed to be robust against the files it's caching not
being at each job start.
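For context, a sketch of typical addCacheFile usage with made-up HDFS paths (the files must already exist on HDFS before the job is submitted):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetupSketch {
  public static void addLookupFiles(JobConf conf) throws Exception {
    // The fragment after '#' becomes the symlink name in each task's working dir.
    DistributedCache.addCacheFile(new URI("/lookup/terms.txt#terms.txt"), conf);
    DistributedCache.addCacheFile(new URI("/lookup/stopwords.txt#stopwords.txt"), conf);
    DistributedCache.createSymlink(conf);
  }
}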


Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
Hmm, I must have really missed an important piece somewhere. This is from
the MapRed tutorial text:

"DistributedCache is a facility provided by the Map/Reduce framework to
cache files (text, archives, jars and so on) needed by applications.

Applications specify the files to be cached via urls (hdfs://) in the
JobConf. The DistributedCache* assumes that the files specified via hdfs://
urls are already present on the FileSystem.*

*The framework will copy the necessary files to the slave node before any
tasks for the job are executed on that node*. Its efficiency stems from the
fact that the files are only copied once per job and the ability to cache
archives which are un-archived on the slaves."


After some close reading, the two bolded pieces seem to contradict each
other. I'd always assumed that addCacheFile() would perform the 2nd bolded
statement. If that sentence is true, then I still don't have an explanation
of why our job didn't correctly push out new versions of the cache files
upon the startup and execution of JobConfiguration. We deleted them before
our job started, not during.

On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans  wrote:

> Meng Mao,
>
> The way the distributed cache is currently written, it does not verify the
> integrity of the cache files at all after they are downloaded.  It just
> assumes that if they were downloaded once they are still there and in the
> proper shape.  It might be good to file a JIRA to add in some sort of check.
>  Another thing to do is that the distributed cache also includes the time
> stamp of the original file, just incase you delete the file and then use a
> different version.  So if you want it to force a download again you can copy
> it delete the original and then move it back to what it was before.
>
> --Bobby Evans
>
> On 9/23/11 1:57 AM, "Meng Mao"  wrote:
>
> We use the DistributedCache class to distribute a few lookup files for our
> jobs. We have been aggressively deleting failed task attempts' leftover
> data
> , and our script accidentally deleted the path to our distributed cache
> files.
>
> Our task attempt leftover data was here [per node]:
> /hadoop/hadoop-metadata/cache/mapred/local/
> and our distributed cache path was:
> hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/
> We deleted this path by accident.
>
> Does this latter path look normal? I'm not that familiar with
> DistributedCache but I'm up right now investigating the issue so I thought
> I'd ask.
>
> After that deletion, the first 2 jobs to run (which are use the
> addCacheFile
> method to distribute their files) didn't seem to push the files out to the
> cache path, except on one node. Is this expected behavior? Shouldn't
> addCacheFile check to see if the files are missing, and if so, repopulate
> them as needed?
>
> I'm trying to get a handle on whether it's safe to delete the distributed
> cache path when the grid is quiet and no jobs are running. That is, if
> addCacheFile is designed to be robust against the files it's caching not
> being at each job start.
>
>
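To spell out the copy/delete/move-back trick Bobby describes -- a sketch only, with /lookup/terms.txt standing in for one of the cached files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class BumpCacheTimestamp {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path orig = new Path("/lookup/terms.txt");      // hypothetical cached file
    Path tmp  = new Path("/lookup/terms.txt.tmp");

    // Copy, delete the original, then rename the copy back into place.
    // The file ends up with a new HDFS timestamp, so the distributed cache
    // treats it as changed and downloads it again for the next job.
    FileUtil.copy(fs, orig, fs, tmp, false, conf);  // false = keep the source
    fs.delete(orig, false);
    fs.rename(tmp, orig);
  }
}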


Re: operation of DistributedCache following manual deletion of cached files?

2011-09-26 Thread Meng Mao
Let's frame the issue in another way. I'll describe a sequence of Hadoop
operations that I think should work, and then I'll get into what we did and
how it failed.

Normal sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space
3. job runs fine
4. Run Job A some time later. job runs fine again.

Breaking sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space
3. job runs fine
4. Manually delete cached files out of local disk on worker nodes
5. Run Job A again, expect it to push out cache copies as needed.
6. job fails because the cache copies didn't get distributed

Should this second sequence have broken?

On Fri, Sep 23, 2011 at 3:09 PM, Meng Mao  wrote:

> Hmm, I must have really missed an important piece somewhere. This is from
> the MapRed tutorial text:
>
> "DistributedCache is a facility provided by the Map/Reduce framework to
> cache files (text, archives, jars and so on) needed by applications.
>
> Applications specify the files to be cached via urls (hdfs://) in the
> JobConf. The DistributedCache* assumes that the files specified via
> hdfs:// urls are already present on the FileSystem.*
>
> *The framework will copy the necessary files to the slave node before any
> tasks for the job are executed on that node*. Its efficiency stems from
> the fact that the files are only copied once per job and the ability to
> cache archives which are un-archived on the slaves."
>
>
> After some close reading, the two bolded pieces seem to be in contradiction
> of each other? I'd always that addCacheFile() would perform the 2nd bolded
> statement. If that sentence is true, then I still don't have an explanation
> of why our job didn't correctly push out new versions of the cache files
> upon the startup and execution of JobConfiguration. We deleted them before
> our job started, not during.
>
> On Fri, Sep 23, 2011 at 9:35 AM, Robert Evans  wrote:
>
>> Meng Mao,
>>
>> The way the distributed cache is currently written, it does not verify the
>> integrity of the cache files at all after they are downloaded.  It just
>> assumes that if they were downloaded once they are still there and in the
>> proper shape.  It might be good to file a JIRA to add in some sort of check.
>>  Another thing to do is that the distributed cache also includes the time
>> stamp of the original file, just incase you delete the file and then use a
>> different version.  So if you want it to force a download again you can copy
>> it delete the original and then move it back to what it was before.
>>
>> --Bobby Evans
>>
>> On 9/23/11 1:57 AM, "Meng Mao"  wrote:
>>
>> We use the DistributedCache class to distribute a few lookup files for our
>> jobs. We have been aggressively deleting failed task attempts' leftover
>> data
>> , and our script accidentally deleted the path to our distributed cache
>> files.
>>
>> Our task attempt leftover data was here [per node]:
>> /hadoop/hadoop-metadata/cache/mapred/local/
>> and our distributed cache path was:
>> hadoop/hadoop-metadata/cache/mapred/local/taskTracker/archive/
>> We deleted this path by accident.
>>
>> Does this latter path look normal? I'm not that familiar with
>> DistributedCache but I'm up right now investigating the issue so I thought
>> I'd ask.
>>
>> After that deletion, the first 2 jobs to run (which are use the
>> addCacheFile
>> method to distribute their files) didn't seem to push the files out to the
>> cache path, except on one node. Is this expected behavior? Shouldn't
>> addCacheFile check to see if the files are missing, and if so, repopulate
>> them as needed?
>>
>> I'm trying to get a handle on whether it's safe to delete the distributed
>> cache path when the grid is quiet and no jobs are running. That is, if
>> addCacheFile is designed to be robust against the files it's caching not
>> being at each job start.
>>
>>
>