Re: Optimal Filesystem (and Settings) for HDFS

2009-05-19 Thread Bryan Duxbury
We use XFS for our data drives, and we've had somewhat mixed results.
One of the biggest pros is that a freshly formatted XFS volume has more
usable space than ext3, even with ext3's reserved-space setting turned
all the way down to 0. Another is that you can format a 1TB drive as XFS
almost instantly, versus minutes for ext3. This makes it really fast to
kickstart our worker nodes.


We have seen some weird stuff happen though when machines run out of  
memory, apparently because the XFS driver does something odd with  
kernel memory. When this happens, we end up having to do some fscking  
before we can get that node back online.


As far as outright performance, I actually *did* do some tests of xfs  
vs ext3 performance on our cluster. If you just look at a single  
machine's local disk speed, you can write and read noticeably faster  
when using XFS instead of ext3. However, the reality is that this  
extra disk performance won't have much of an effect on your overall  
job completion performance, since you will find yourself network  
bottlenecked well in advance of even ext3's performance.


The long and short of it is that we use XFS to speed up our new  
machine deployment, and that's it.


-Bryan

On May 18, 2009, at 10:31 AM, Alex Loddengaard wrote:

I believe Yahoo! uses ext3, though I know other people have said that XFS
has performed better in various benchmarks.  We use ext3, though we haven't
done any benchmarks to prove its worth.

This question has come up a lot, so I think it'd be worth doing a benchmark
and writing up the results.  I haven't been able to find a detailed analysis
/ benchmark writeup comparing various filesystems, unfortunately.

Hope this helps,

Alex

On Mon, May 18, 2009 at 8:54 AM, Bob Schulze  
b.schu...@ecircle.com wrote:


We are currently rebuilding our cluster - does anybody have recommendations
on the underlying file system? Just standard Ext3?

I could imagine that the block size could be larger than its default...


Thx for any tips,

   Bob






Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-14 Thread Bryan Duxbury
I thought it a conspicuous omission to not discuss the cost of  
various approaches. Hadoop is free, though you have to spend  
developer time; how much does Vertica cost on 100 nodes?


-Bryan

On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote:


(Hadoop is used in the benchmarks)

http://database.cs.brown.edu/sigmod09/

There is currently considerable enthusiasm around the MapReduce (MR)
paradigm for large-scale data analysis [17]. Although the basic control
flow of this framework has existed in parallel SQL database management
systems (DBMS) for over 20 years, some have called MR a dramatically new
computing model [8, 17]. In this paper, we describe and compare both
paradigms. Furthermore, we evaluate both kinds of systems in terms of
performance and development complexity. To this end, we define a benchmark
consisting of a collection of tasks that we have run on an open source
version of MR as well as on two parallel DBMSs. For each task, we measure
each system's performance for various degrees of parallelism on a cluster
of 100 nodes. Our results reveal some interesting trade-offs. Although the
process to load data into and tune the execution of parallel DBMSs took
much longer than the MR system, the observed performance of these DBMSs
was strikingly better. We speculate about the causes of the dramatic
performance difference and consider implementation concepts that future
systems should take from both kinds of architectures.


--
Guilherme

msn: guigermog...@hotmail.com
homepage: http://germoglio.googlepages.com




Issue distcp'ing from 0.19.2 to 0.18.3

2009-04-09 Thread Bryan Duxbury

Hey all,

I was trying to copy some data from our cluster on 0.19.2 to a new
cluster on 0.18.3 by using distcp and the hftp:// filesystem.
Everything seemed to be going fine for a few hours, but then a few
tasks failed because a few files got 500 errors when being read from
the 0.19 cluster. As a result the job died. Now that I'm trying to
restart it, I get this error:


[rapl...@ds-nn2 ~]$ hadoop distcp hftp://ds-nn1:7276/ hdfs://ds-nn2:7276/cluster-a

09/04/08 23:32:39 INFO tools.DistCp: srcPaths=[hftp://ds-nn1:7276/]
09/04/08 23:32:39 INFO tools.DistCp: destPath=hdfs://ds-nn2:7276/cluster-a

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.net.SocketException: Unexpected end of file from  
server
at sun.net.www.http.HttpClient.parseHTTPHeader 
(HttpClient.java:769)

at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.http.HttpClient.parseHTTPHeader 
(HttpClient.java:766)

at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream 
(HttpURLConnection.java:1000)
at org.apache.hadoop.dfs.HftpFileSystem$LsParser.fetchList 
(HftpFileSystem.java:183)
at org.apache.hadoop.dfs.HftpFileSystem 
$LsParser.getFileStatus(HftpFileSystem.java:193)
at org.apache.hadoop.dfs.HftpFileSystem.getFileStatus 
(HftpFileSystem.java:222)

at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:588)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:609)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

I changed nothing at all between the first attempt and the subsequent
failed attempts. The only clues in the namenode log for the 0.19
cluster are:


2009-04-08 23:29:09,786 WARN org.apache.hadoop.ipc.Server: Incorrect  
header or version mismatch from 10.100.50.252:47733 got version 47  
expected version 2


Anyone have any ideas?

-Bryan


Re: Issue distcp'ing from 0.19.2 to 0.18.3

2009-04-09 Thread Bryan Duxbury
Ah, never mind. It turns out that I just shouldn't rely on command
history so much. I accidentally pointed the hftp:// URI at the actual
namenode port, not the namenode HTTP port. It appears to be starting
a regular copy again.
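(For reference: hftp:// reads go through the namenode's HTTP/info port,
which is configured by dfs.http.address and defaults to 50070, not the IPC
port. So the corrected command would look something like

hadoop distcp hftp://ds-nn1:50070/ hdfs://ds-nn2:7276/cluster-a

assuming that cluster is still on the default HTTP port.)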


-Bryan

On Apr 8, 2009, at 11:57 PM, Todd Lipcon wrote:


Hey Bryan,

Any chance you can get a tshark trace on the 0.19 namenode? Maybe
tshark -s 10 -w nndump.pcap port 7276

Also, are the clocks synced on the two machines? The failure of your
distcp is at 23:32:39, but the namenode log message you posted was
23:29:09. Did those messages actually pop out at the same time?

Thanks
-Todd

On Wed, Apr 8, 2009 at 11:39 PM, Bryan Duxbury br...@rapleaf.com  
wrote:



Hey all,

I was trying to copy some data from our cluster on 0.19.2 to a new cluster
on 0.18.3 by using distcp and the hftp:// filesystem. Everything seemed to
be going fine for a few hours, but then a few tasks failed because a few
files got 500 errors when being read from the 0.19 cluster. As a result
the job died. Now that I'm trying to restart it, I get this error:

[rapl...@ds-nn2 ~]$ hadoop distcp hftp://ds-nn1:7276/
hdfs://ds-nn2:7276/cluster-a
09/04/08 23:32:39 INFO tools.DistCp: srcPaths=[hftp://ds-nn1:7276/]
09/04/08 23:32:39 INFO tools.DistCp: destPath=hdfs://ds-nn2:7276/ 
cluster-a
With failures, global counters are inaccurate; consider running  
with -i
Copy failed: java.net.SocketException: Unexpected end of file from  
server
   at sun.net.www.http.HttpClient.parseHTTPHeader 
(HttpClient.java:769)

   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
   at sun.net.www.http.HttpClient.parseHTTPHeader 
(HttpClient.java:766)

   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
   at
sun.net.www.protocol.http.HttpURLConnection.getInputStream 
(HttpURLConnection.java:1000)

   at
org.apache.hadoop.dfs.HftpFileSystem$LsParser.fetchList 
(HftpFileSystem.java:183)

   at
org.apache.hadoop.dfs.HftpFileSystem$LsParser.getFileStatus 
(HftpFileSystem.java:193)

   at
org.apache.hadoop.dfs.HftpFileSystem.getFileStatus 
(HftpFileSystem.java:222)

   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
   at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java: 
588)

   at org.apache.hadoop.tools.DistCp.copy(DistCp.java:609)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

I changed nothing at all between the first attempt and the subsequent
failed attempts. The only clues in the namenode log for the 19  
cluster are:


2009-04-08 23:29:09,786 WARN org.apache.hadoop.ipc.Server: Incorrect header
or version mismatch from 10.100.50.252:47733 got version 47 expected
version 2

Anyone have any ideas?

-Bryan





Re: HELP: I wanna store the output value into a list not write to the disk

2009-04-02 Thread Bryan Duxbury
I don't really see what the downside of reading it from disk is. A  
list of word counts should be pretty small on disk so it shouldn't  
take long to read it into a HashMap. Doing anything else is going to  
cause you to go a long way out of your way to end up with the same  
result.
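If it helps, a minimal sketch of doing exactly that is below. It assumes
the job used the default TextOutputFormat (tab-separated "word<TAB>count"
lines) and that the caller supplies the output directory path; adjust for
your own formats.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WordCountLoader {
  // Read every part-* file under the job's output dir into a word -> count map.
  public static Map<String, Long> load(Configuration conf, Path outputDir) throws IOException {
    FileSystem fs = outputDir.getFileSystem(conf);
    Map<String, Long> counts = new HashMap<String, Long>();
    for (FileStatus status : fs.listStatus(outputDir)) {
      if (!status.getPath().getName().startsWith("part-")) {
        continue; // skip _logs and any other non-output files
      }
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(fs.open(status.getPath())));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          int tab = line.indexOf('\t'); // TextOutputFormat writes "key<TAB>value"
          if (tab > 0) {
            counts.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1)));
          }
        }
      } finally {
        reader.close();
      }
    }
    return counts;
  }
}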


-Bryan

On Apr 2, 2009, at 2:41 AM, andy2005cst wrote:



I need to use the output of the reduce, but I don't know how to do it.
Using the wordcount program as an example: if I want to collect the word
counts into a hashtable for further use, how can I do that?
The example just shows how to write the result to disk.
My email is: andy2005...@gmail.com
Looking forward to your help. Thanks a lot.
--
View this message in context: http://www.nabble.com/HELP%3A-I-wanna-store-the-output-value-into-a-list-not-write-to-the-disk-tp22844277p22844277.html

Sent from the Hadoop core-user mailing list archive at Nabble.com.





Re: Massive discrepancies in job's bytes written/read

2009-03-18 Thread Bryan Duxbury
Is there some area of the codebase that deals with aggregating  
counters that I should be looking at?

-Bryan

On Mar 17, 2009, at 10:20 PM, Owen O'Malley wrote:



On Mar 17, 2009, at 7:44 PM, Bryan Duxbury wrote:


There is no compression in the mix for us, so that's not the culprit.

I'd be sort of willing to believe that spilling and sorting play a  
role in this, but, wow, over 10x read and write? That seems like a  
big problem.


It happened recently to me too. It was off by 6x. The strange thing  
was that the individual tasks looked right. It was just the  
aggregate that was wrong.


-- Owen




Re: Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Bryan Duxbury
I believe the last word on appends right now is that the patch that
was committed broke a lot of other things, so it's been disabled. As
such, there is no working append in HDFS, and certainly not in
Hadoop 0.17.x.


-Bryan

On Mar 17, 2009, at 4:50 PM, Steve Gao wrote:

Thanks, but I was told there is an append command, isn't there? But
I don't know how to apply this patch:
https://issues.apache.org/jira/browse/HADOOP-1700


--- On Tue, 3/17/09, Bo Shi b...@visiblemeasures.com wrote:
From: Bo Shi b...@visiblemeasures.com
Subject: Re: Does HDFS provide a way to append A file to B ?
To: core-user@hadoop.apache.org
Date: Tuesday, March 17, 2009, 7:42 PM

what about an identity mapper taking A and B as inputs?  this will
likely mix rows of A and B together though...

On Tue, Mar 17, 2009 at 7:35 PM, Steve Gao steve@yahoo.com  
wrote:

BTW, I am using hadoop 0.17.0 and jdk 1.6

--- On Tue, 3/17/09, Steve Gao steve@yahoo.com wrote:
From: Steve Gao steve@yahoo.com
Subject: Does HDFS provide a way to append A file to B ?
To: core-user@hadoop.apache.org
Date: Tuesday, March 17, 2009, 7:22 PM

I need to append file A to file B in HDFS without downloading/uploading
them to local disk. Is there a way?



Massive discrepancies in job's bytes written/read

2009-03-17 Thread Bryan Duxbury

Hey all,

In looking at the stats for a number of our jobs, the amount of data
that the UI claims we've read from or written to HDFS is vastly
larger than the amount of data that should be involved in the job.
For instance, we have a job that combines small files into big files,
operating on around 2TB worth of data. The output in HDFS (via
hadoop dfs -du) matches the expected size, but the jobtracker UI
claims that we've read and written around 22TB of data!


By all accounts, Hadoop is actually *doing* the right thing - we're  
not observing excess data reading or writing anywhere. However, this  
massive discrepancy makes the job stats essentially worthless for  
understanding IO in our jobs.


Does anyone know why there's such an enormous difference? Have others  
experienced this problem?


-Bryan


Re: Does HDFS provide a way to append A file to B ?

2009-03-17 Thread Bryan Duxbury

No. There isn't *any* version of Hadoop with a (stable) append command.

On Mar 17, 2009, at 5:08 PM, Steve Gao wrote:


Thanks, Bryan. Does 0.18.3 have a built-in append command?

--- On Tue, 3/17/09, Bryan Duxbury br...@rapleaf.com wrote:
From: Bryan Duxbury br...@rapleaf.com
Subject: Re: Does HDFS provide a way to append A file to B ?
To: core-user@hadoop.apache.org
Date: Tuesday, March 17, 2009, 8:04 PM

I believe the last word on appends right now is that the patch that was
committed broke a lot of other things, so it's been disabled. As such, there
is no working append in HDFS, and certainly not in Hadoop 0.17.x.

-Bryan

On Mar 17, 2009, at 4:50 PM, Steve Gao wrote:


Thanks, but I was told there is an append command, isn't there? But I

don't know how to apply this patch
https://issues.apache.org/jira/browse/HADOOP-1700


--- On Tue, 3/17/09, Bo Shi b...@visiblemeasures.com wrote:
From: Bo Shi b...@visiblemeasures.com
Subject: Re: Does HDFS provide a way to append A file to B ?
To: core-user@hadoop.apache.org
Date: Tuesday, March 17, 2009, 7:42 PM

what about an identity mapper taking A and B as inputs?  this will
likely mix rows of A and B together though...

On Tue, Mar 17, 2009 at 7:35 PM, Steve Gao steve@yahoo.com

wrote:

BTW, I am using hadoop 0.17.0 and jdk 1.6

--- On Tue, 3/17/09, Steve Gao steve@yahoo.com wrote:
From: Steve Gao steve@yahoo.com
Subject: Does HDFS provide a way to append A file to B ?
To: core-user@hadoop.apache.org
Date: Tuesday, March 17, 2009, 7:22 PM

I need to append file A to file B in HDFS without downloading/uploading
them to local disk. Is there a way?



Re: Massive discrepancies in job's bytes written/read

2009-03-17 Thread Bryan Duxbury

There is no compression in the mix for us, so that's not the culprit.

I'd be sort of willing to believe that spilling and sorting play a  
role in this, but, wow, over 10x read and write? That seems like a  
big problem.


-Bryan

On Mar 17, 2009, at 6:46 PM, Stefan Will wrote:

Some of the discrepancy could be due to compression of map input/output
format. E.g. the mapper output bytes will show the compressed size, while
the reduce input will show the uncompressed size. Or something along those
lines. But I'm also questioning the accuracy of the reporting and suspect
that some of it is due to all the disk activity that happens while
processing spills in the mapper and the copy/shuffle/sort phase in the
reducer. It would certainly be nice if all the byte counts were reported in
a way that they're comparable.

-- Stefan



From: Bryan Duxbury br...@rapleaf.com
Reply-To: core-user@hadoop.apache.org
Date: Tue, 17 Mar 2009 17:26:52 -0700
To: core-user@hadoop.apache.org
Subject: Massive discrepancies in job's bytes written/read

Hey all,

In looking at the stats for a number of our jobs, the amount of data
that the UI claims we've read from or written to HDFS is vastly
larger than the amount of data that should be involved in the job.
For instance, we have a job that combines small files into big files,
operating on around 2TB worth of data. The output in HDFS
(via hadoop dfs -du) matches the expected size, but the jobtracker UI
claims that we've read and written around 22TB of data!

By all accounts, Hadoop is actually *doing* the right thing - we're
not observing excess data reading or writing anywhere. However, this
massive discrepancy makes the job stats essentially worthless for
understanding IO in our jobs.

Does anyone know why there's such an enormous difference? Have others
experienced this problem?

-Bryan







Re: Profiling Hadoop

2009-02-27 Thread Bryan Duxbury
I've used YourKit Java Profiler pretty successfully. There's a  
JobConf parameter you can flip on that will cause a few maps and  
reduces to start with profiling on, so you won't be overwhelmed with  
info.
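As a rough sketch, the knobs in question are the mapred.task.profile*
properties; the task ranges and the agent string below are just
placeholders to adapt (the example uses the stock HPROF agent, but a
YourKit agent line can be dropped in instead):

import org.apache.hadoop.mapred.JobConf;

public class ProfiledJobSetup {
  // Minimal sketch: enable task profiling for only a few attempts of a job.
  public static JobConf configure(JobConf conf) {
    conf.set("mapred.task.profile", "true");          // turn profiling on
    conf.set("mapred.task.profile.maps", "0-2");      // only map attempts 0-2
    conf.set("mapred.task.profile.reduces", "0-2");   // only reduce attempts 0-2
    // Agent command line passed to the profiled task JVMs.
    conf.set("mapred.task.profile.params",
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s");
    return conf;
  }
}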


-Bryan

On Feb 27, 2009, at 11:12 AM, Sandy wrote:


Hello,

Could anyone recommend any software for profiling the performance of
MapReduce applications one may write for Hadoop? I am currently
developing in Java.

Thanks,
-SM




Big HDFS deletes lead to dead datanodes

2009-02-24 Thread Bryan Duxbury
On occasion, I've deleted a few TB of stuff in DFS at once. I've  
noticed that when I do this, datanodes start taking a really long  
time to check in and ultimately get marked dead. Some time later,  
they'll get done deleting stuff and come back and get unmarked.


I'm wondering, why do deletions get more priority than checking in?  
Ideally we would never see dead datanodes from doing deletes.


-Bryan


Super-long reduce task timeouts in hadoop-0.19.0

2009-02-20 Thread Bryan Duxbury

(Repost from the dev list)

I noticed some really odd behavior today while reviewing the job  
history of some of our jobs. Our Ganglia graphs showed really long  
periods of inactivity across the entire cluster, which should  
definitely not be the case - we have a really long string of jobs in  
our workflow that should execute one after another. I figured out  
which jobs were running during those periods of inactivity, and  
discovered that almost all of them had 4-5 failed reduce tasks, with  
the reason for failure being something like:


Task attempt_200902061117_3382_r_38_0 failed to report status for  
1282 seconds. Killing!


The actual timeout reported varies from 700-5000 seconds. Virtually  
all of our longer-running jobs were affected by this problem. The  
period of inactivity on the cluster seems to correspond to the amount  
of time the job waited for these reduce tasks to fail.


I checked out the tasktracker log for the machines with timed-out  
reduce tasks looking for something that might explain the problem,  
but the only thing I came up with that actually referenced the failed  
task was this log message, which was repeated many times:


2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker:  
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find  
taskTracker/jobcache/job_200902061117_3388/ 
attempt_200902061117_3388_r_66_0/output/file.out in any of the  
configured local directories


I'm not sure what this means; can anyone shed some light on this  
message?


Further confusing the issue, on the affected machines, I looked in
logs/userlogs/<task id>, and to my surprise, the directory and log
files existed, and the syslog file seemed to contain logs of a
perfectly good reduce task!


Overall, this seems like a pretty critical bug. It's consuming up to  
50% of the runtime of our jobs in some instances, killing our  
throughput. At the very least, it seems like the reduce task timeout  
period should be MUCH shorter than the current 10-20 minutes.


-Bryan


Re: Super-long reduce task timeouts in hadoop-0.19.0

2009-02-20 Thread Bryan Duxbury
We didn't customize this value, to my knowledge, so I'd suspect it's  
the default.
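(For anyone else hitting this: the timeout in question is
mapred.task.timeout, 600000 ms by default, and a reduce that legitimately
goes quiet for a while can keep itself alive by reporting progress. A
minimal sketch with the old API, using a simple summing reducer purely as
a stand-in:)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SlowReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {
  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
      // Tell the tasktracker we're alive even if nothing gets emitted for a while.
      reporter.progress();
    }
    output.collect(key, new LongWritable(sum));
  }
}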

-Bryan

On Feb 20, 2009, at 5:00 PM, Ted Dunning wrote:


How often do your reduce tasks report status?

On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com  
wrote:



(Repost from the dev list)


I noticed some really odd behavior today while reviewing the job  
history of

some of our jobs. Our Ganglia graphs showed really long periods of
inactivity across the entire cluster, which should definitely not  
be the
case - we have a really long string of jobs in our workflow that  
should
execute one after another. I figured out which jobs were running  
during
those periods of inactivity, and discovered that almost all of  
them had 4-5
failed reduce tasks, with the reason for failure being something  
like:


Task attempt_200902061117_3382_r_38_0 failed to report status for 1282
seconds. Killing!

The actual timeout reported varies from 700-5000 seconds. Virtually all of
our longer-running jobs were affected by this problem. The period of
inactivity on the cluster seems to correspond to the amount of time the job
waited for these reduce tasks to fail.

I checked out the tasktracker log for the machines with timed-out reduce
tasks looking for something that might explain the problem, but the only
thing I came up with that actually referenced the failed task was this log
message, which was repeated many times:

2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902061117_3388/attempt_200902061117_3388_r_66_0/output/file.out
in any of the configured local directories

I'm not sure what this means; can anyone shed some light on this  
message?


Further confusing the issue, on the affected machines, I looked in
logs/userlogs/<task id>, and to my surprise, the directory and log files
existed, and the syslog file seemed to contain logs of a perfectly good
reduce task!

Overall, this seems like a pretty critical bug. It's consuming up to 50% of
the runtime of our jobs in some instances, killing our throughput. At the
very least, it seems like the reduce task timeout period should be MUCH
shorter than the current 10-20 minutes.

-Bryan





--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)




Measuring IO time in map/reduce jobs?

2009-02-12 Thread Bryan Duxbury

Hey all,

Does anyone have any experience trying to measure IO time spent in  
their map/reduce jobs? I know how to profile a sample of map and  
reduce tasks, but that appears to exclude IO time. Just subtracting  
the total cpu time from the total run time of a task seems like too  
coarse an approach.
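One partial approach we've considered (a sketch only: the counter name is
made up, and it only captures the output/write side, since the framework
performs the input reads before map() is ever called) is to time the calls
ourselves and publish the totals as a custom counter:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TimedMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  // Hypothetical counter: accumulated milliseconds spent in output.collect().
  enum IoTime { OUTPUT_COLLECT_MS }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    Text word = new Text(value);  // stand-in for the real map logic
    long start = System.currentTimeMillis();
    output.collect(word, new LongWritable(1));
    reporter.incrCounter(IoTime.OUTPUT_COLLECT_MS,
        System.currentTimeMillis() - start);
  }
}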


-Bryan


Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Bryan Duxbury
Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards from  
the default of 256. We run with 8000 xcievers, and that seems to  
solve our problems. I think that if you have a lot of open files,  
this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 
(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run 
(DFSClient.java:1889)


It seems to be related to trying to write too many files to HDFS.  I  
have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in the  
debugger unfortunately.


Is this a memory issue, or is there an upper limit to the number of  
files HDFS can hold?  Any settings to adjust?


Thanks.




Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Bryan Duxbury

Correct.

+1 to Jason's more unix file handles suggestion. That's a must-have.

-Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

This would be an addition to the hadoop-site.xml file, to up  
dfs.datanode.max.xcievers?


Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards  
from the default of 256. We run with 8000 xcievers, and that seems  
to solve our problems. I think that if you have a lot of open  
files, this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 
(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream 
$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write too many files to HDFS.   
I have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in  
the debugger unfortunately.


Is this a memory issue, or is there an upper limit to the number  
of files HDFS can hold?  Any settings to adjust?


Thanks.









Re: Control over max map/reduce tasks per job

2009-02-03 Thread Bryan Duxbury

This sounds good enough for a JIRA ticket to me.
-Bryan

On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:


Chris,

For my specific use cases, it would be best to be able to set N
mappers/reducers per job per node (so I can explicitly say, run at most 2 at
a time of this CPU bound task on any given node).  However, the other way
would work as well (on 10 node system, would set job to max 20 tasks at a
time globally), but opens up the possibility that a node could be assigned
more than 2 of that task.

I would work with whatever is easiest to implement as either would be a vast
improvement for me (can run high numbers of network latency bound tasks
without fear of cpu bound tasks killing the cluster).

JG




-Original Message-
From: Chris K Wensel [mailto:ch...@wensel.net]
Sent: Tuesday, February 03, 2009 11:34 AM
To: core-user@hadoop.apache.org
Subject: Re: Control over max map/reduce tasks per job

Hey Jonathan

Are you looking to limit the total number of concurrent mapper/
reducers a single job can consume cluster wide, or limit the number
per node?

That is, you have X mappers/reducers, but only can allow N mappers/
reducers to run at a time globally, for a given job.

Or, you are cool with all X running concurrently globally, but want to
guarantee that no node can run more than N tasks from that job?

Or both?

just reconciling the conversation we had last week with this thread.

ckw

On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:


All,



I have a few relatively small clusters (5-20 nodes) and am having
trouble
keeping them loaded with my MR jobs.



The primary issue is that I have different jobs that have drastically
different patterns.  I have jobs that read/write to/from HBase or Hadoop
with minimal logic (network throughput bound or io bound), others that
perform crawling (network latency bound), and one huge parsing streaming job
(very CPU bound, each task eats a core).



I'd like to launch very large numbers of tasks for network latency
bound
jobs, however the large CPU bound job means I have to keep the max
maps
allowed per node low enough as to not starve the Datanode and
Regionserver.



I'm an HBase dev but not familiar enough with Hadoop MR code to even
know
what would be involved with implementing this.  However, in talking
with
other users, it seems like this would be a well-received option.



I wanted to ping the list before filing an issue because it seems like
someone may have thought about this in the past.



Thanks.



Jonathan Gray



--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/






Re: Question about HDFS capacity and remaining

2009-01-30 Thread Bryan Duxbury
Hm, very interesting. Didn't know about that. What's the purpose of  
the reservation? Just to give root preference or leave wiggle room?  
If it's not strictly necessary it seems like it would make sense to  
reduce it to essentially 0%.


-Bryan

On Jan 29, 2009, at 6:18 PM, Doug Cutting wrote:

Ext2 by default reserves 5% of the drive for use by root only.
That'd be about 45GB of your 907GB capacity, which would account for most
of the discrepancy.  You can adjust this with tune2fs.


Doug

Bryan Duxbury wrote:

There are no non-dfs files on the partitions in question.
df -h indicates that there is 907GB capacity, but only 853GB  
remaining, with 200M used. The only thing I can think of is the  
filesystem overhead.

-Bryan
On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote:

It's taken by non-dfs files.

Hairong


On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:


Hey all,

I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go?  
There
are literally zero other files on the partitions hosting the DFS  
data

directories. Where am I losing 240GB?

-Bryan






Re: Question about HDFS capacity and remaining

2009-01-30 Thread Bryan Duxbury

Did you publish those results anywhere?

On Jan 30, 2009, at 9:56 AM, Brian Bockelman wrote:

For what it's worth, our organization did extensive tests on many  
filesystems benchmarking their performance when they are 90 - 95%  
full.


Only XFS retained most of its performance when it was mostly  
full (ext4 was not tested)... so, if you are thinking of pushing  
things to the limits, that might be something worth considering.


Brian

On Jan 30, 2009, at 11:18 AM, stephen mulcahy wrote:



Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose  
of the reservation? Just to give root preference or leave wiggle  
room? If it's not strictly necessary it seems like it would make  
sense to reduce it to essentially 0%.


AFAIK It is needed for defragmentation / fsck to work properly and  
your filesystem performance will degrade a lot if you reduce this  
to 0% (but I'd love to hear otherwise :)


-stephen






Question about HDFS capacity and remaining

2009-01-29 Thread Bryan Duxbury

Hey all,

I'm currently installing a new cluster, and noticed something a  
little confusing. My DFS is *completely* empty - 0 files in DFS.  
However, in the namenode web interface, the reported capacity is  
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There  
are literally zero other files on the partitions hosting the DFS data  
directories. Where am I losing 240GB?


-Bryan


Re: Question about HDFS capacity and remaining

2009-01-29 Thread Bryan Duxbury

There are no non-dfs files on the partitions in question.

df -h indicates that there is 907GB capacity, but only 853GB  
remaining, with 200M used. The only thing I can think of is the  
filesystem overhead.


-Bryan

On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote:


It's taken by non-dfs files.

Hairong


On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:


Hey all,

I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There
are literally zero other files on the partitions hosting the DFS data
directories. Where am I losing 240GB?

-Bryan






Re: Q about storage architecture

2008-12-07 Thread Bryan Duxbury
If you are considering using it as a conventional filesystem from a  
few clients, then it most resembles NAS. However, I don't think it  
makes sense to try and classify it as SAN or NAS. HDFS is a  
distributed filesystem designed to be consumed in a massively  
distributed fashion, so it does fall into its own category.


-Bryan

On Dec 6, 2008, at 9:06 PM, Sirisha Akkala wrote:


Hi
I would like to know if Hadoop architecture more resembles SAN or  
NAS? -I'm guessing it is NAS.
Or does it fall under a totally different category? If so, can you  
please email brief information?


thanks,sirisha.




Re: Filesystem closed errors

2008-11-26 Thread Bryan Duxbury

My app isn't a map/reduce job.

On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote:


Do you have speculative execution enabled?  I've seen error messages
like this caused by speculative execution.

David

Bryan Duxbury wrote:

I have an app that runs for a long time with no problems, but when I
signal it to shut down, I get errors like this:

java.io.IOException: Filesystem closed
at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502)
at
org.apache.hadoop.dfs.DistributedFileSystem.rename 
(DistributedFileSystem.java:176)



The problems occur when I am trying to close open HDFS files. Any
ideas why I might be seeing this? I thought it was because I was
abruptly shutting down without giving the streams a chance to get
closed, but after some refactoring, that's not the case.

-Bryan







Re: Filesystem closed errors

2008-11-26 Thread Bryan Duxbury
I'm fairly certain that I'm not closing the Filesystem anywhere. That  
said, the issue you pointed at could somehow be connected.
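(For context on why a close you never wrote can still bite you: as far as
I can tell, FileSystem.get() in this era hands back a cached instance
shared per scheme/authority, and the client also installs a shutdown hook
that closes all cached filesystems, so a close anywhere in the process, or
a racing shutdown hook, closes it for every user. A tiny illustration,
assuming fs.default.name points at an HDFS cluster:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SharedFsIllustration {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem a = FileSystem.get(conf);
    FileSystem b = FileSystem.get(conf);
    System.out.println(a == b); // typically true: both are the same cached instance
    a.close();                  // closing "a" closes the shared instance...
    b.exists(new Path("/"));    // ...so on HDFS this fails with "Filesystem closed"
  }
}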


On Nov 25, 2008, at 9:11 PM, Hong Tang wrote:

Does your code ever call fs.close()? If so,
https://issues.apache.org/jira/browse/HADOOP-4655 might be relevant to your
problem.


On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote:


Do you have speculative execution enabled?  I've seen error messages
like this caused by speculative execution.

David

Bryan Duxbury wrote:

I have an app that runs for a long time with no problems, but when I
signal it to shut down, I get errors like this:

java.io.IOException: Filesystem closed
at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502)
at
org.apache.hadoop.dfs.DistributedFileSystem.rename 
(DistributedFileSystem.java:176)



The problems occur when I am trying to close open HDFS files. Any
ideas why I might be seeing this? I thought it was because I was
abruptly shutting down without giving the streams a chance to get
closed, but after some refactoring, that's not the case.

-Bryan









Filesystem closed errors

2008-11-25 Thread Bryan Duxbury
I have an app that runs for a long time with no problems, but when I  
signal it to shut down, I get errors like this:


java.io.IOException: Filesystem closed
at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502)
	at org.apache.hadoop.dfs.DistributedFileSystem.rename 
(DistributedFileSystem.java:176)


The problems occur when I am trying to close open HDFS files. Any
ideas why I might be seeing this? I thought it was because I was
abruptly shutting down without giving the streams a chance to get
closed, but after some refactoring, that's not the case.


-Bryan


Re: Hadoop Design Question

2008-11-06 Thread Bryan Duxbury

Comments inline.

On Nov 6, 2008, at 9:29 AM, Ricky Ho wrote:


Hi,

While exploring how Hadoop fits in our usage scenarios, there are 4
recurring issues that keep popping up.  I don't know if they are real
issues or just our misunderstanding of Hadoop.  Can any expert shed
some light here?


Disk I/O overhead
==
- The output of a Map task is written to a local disk and then
later uploaded to the Reduce task.  While this enables a simple
recovery strategy when a map task fails, it incurs additional
disk I/O overhead.


- For example, in our popular Hadoop example of calculating the
approximation of Pi, there isn't any input data.  The map tasks
in this example should just feed their output directly to the reduce
task.  So I am wondering if there is an option to bypass the
step of writing the map result to the local disk.


In most data-intensive map/reduce jobs, you have to spill your map
output to disk at some point because you will run out of memory
otherwise. Additionally, Pi calculation is a really bad example,
because you could always start reducing any pairs together
arbitrarily; the combining operation is commutative and
associative. We have a special construct for situations like that,
called a combiner, which is basically a map-side reducer.
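For the record, wiring a combiner in with the old API is a one-liner on the
JobConf. Here is a minimal word-count-style sketch using the stock helper
classes; the combiner is safe here precisely because summing counts is
commutative and associative:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class CombinerSetup {
  public static void configure(JobConf conf) {
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(TokenCountMapper.class);
    // The combiner runs on map output before the shuffle and pre-aggregates
    // per spill, cutting down the intermediate data volume.
    conf.setCombinerClass(LongSumReducer.class);
    conf.setReducerClass(LongSumReducer.class);
  }
}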





Pipelining between Map & Reduce phases is not possible
===
- In the current setting, it sounds like no reduce task will be
started before all map tasks have completed.  If there are
a few slow-running map tasks, the whole job will be delayed.

- The overall job execution can be shortened if the reduce tasks
can start processing as soon as some map results are available
rather than waiting for all the map tasks to complete.


You can't start reducing until all map tasks are complete because  
until all map tasks complete, you can't do an accurate sort of all  
intermediate key/value pairs. That is, if you just started reducing  
the results of a single map task immediately, you might have other  
values for some keys that come from different map tasks, and your  
reduce would be inaccurate. In theory if you know that each map task  
produces keys only in a certain range, you could start reducing  
immediately after the map task finishes, but that seems like an  
unlikely case.





Pipelining between jobs

- In many cases, we've found the parallel computation doesn't
involve just one single map/reduce job, but multiple inter-dependent
map/reduce jobs that work together in some coordinated fashion.

- Again, I haven't seen any mechanism available for 2 MapReduce
jobs to directly interact with each other.  Job1 must write its
output to HDFS for Job2 to pick up. On the other hand, once the
map phase of Job2 has started, all its input HDFS files have to
be frozen (in other words, Job1 cannot append more records to
the HDFS files).


- Therefore it is impossible for the reduce phase of Job1 to stream
its output data to a file while the map phase of Job2 starts reading
the same file.  Job2 can only start after ALL REDUCE TASKS of Job1
are completed, which makes pipelining between jobs impossible.


Certainly, many transformations take more than one map/reduce job.
However, very few could actually be pipelined such that the output of
one feeds directly into another without an intermediate stop in a file.
If the first job does any grouping or sorting, then the reduce is
necessary and it will have to write out to a file before anything
else can go on. If the second job also does grouping or sorting, then
you definitely need two jobs. If the second job doesn't do grouping
or sorting, then it can probably be collapsed into either the map or
reduce of the first job.
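A bare-bones sketch of the usual pattern for chaining two such jobs with
the old API follows: the second job simply reads the first job's output
directory once it has finished. The paths and job names are hypothetical,
and the identity mapper/reducer defaults are left in place.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStagePipeline {
  public static void main(String[] args) throws Exception {
    Path input = new Path("/data/in");            // hypothetical paths
    Path intermediate = new Path("/data/stage1");
    Path output = new Path("/data/out");

    JobConf stage1 = new JobConf(TwoStagePipeline.class);
    stage1.setJobName("stage1");
    FileInputFormat.setInputPaths(stage1, input);
    FileOutputFormat.setOutputPath(stage1, intermediate);
    JobClient.runJob(stage1);                     // blocks until stage 1 finishes

    JobConf stage2 = new JobConf(TwoStagePipeline.class);
    stage2.setJobName("stage2");
    FileInputFormat.setInputPaths(stage2, intermediate);
    FileOutputFormat.setOutputPath(stage2, output);
    JobClient.runJob(stage2);
  }
}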





No parallelism of reduce task with one key
===
- Parallelism happens in the map phase, and in the reduce phase
across different keys.  But there is no parallelism within the
reduce processing of a particular key.

- This means the partitioning function has to be chosen carefully
to make sure the workload of the reduce processes is balanced.
(maybe not a big deal)

- Are there any thoughts on running a pool of reduce tasks on the
same key and having them combine their results later?


I think you will find very few situations where you have only one key  
on reduce. If you do, it's probably a scenario where you can use a  
combiner and eliminate the problem. Basically all map/reduce jobs  
I've worked on have a large number of keys going into the reduce phase.





Rgds, Ricky




Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Bryan Duxbury
Agree, we use Thrift at Rapleaf for this purpose. It's trivial to  
make a ThriftWritable if you want to be crafty, but you can also just  
use byte[]s and do the serialization and deserialization yourself.


-Bryan

On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote:


Take a look at Thrift:
http://developers.facebook.com/thrift/

Alex

On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing [EMAIL PROTECTED]  
wrote:


The project I focused on has many modules written in different  
languages

(several modules are hadoop jobs).
So I'd like to utilize a common record based data file format for  
data

exchange.
XML is not efficient for appending new records.
SequenceFile seems not having API of other languages except Java.
Protocol Buffers' hadoop API seems under development.
any recommendation for this?

Thanks





Re: Lazily deserializing Writables

2008-10-02 Thread Bryan Duxbury
We do this with some of our Thrift-serialized types. We account for
this behavior explicitly in the ThriftWritable class and make it so
that we can read the serialized version off the wire completely by
prepending the size. Then, we can read in the raw bytes and hang on
to them for later as we see fit. I would think that leaving the bytes
on the DataInput would break things in a very impressive way.
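Roughly, the pattern is the sketch below; this is just an illustration of
the length-prefix idea, not our actual ThriftWritable source:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Length-prefixed wrapper: readFields() pulls the raw bytes off the stream
// immediately, but real deserialization is deferred until the caller asks.
public class LazyBytesWritable implements Writable {
  private byte[] raw = new byte[0];

  public void write(DataOutput out) throws IOException {
    out.writeInt(raw.length);   // prefix with the size so readers know how much to grab
    out.write(raw);
  }

  public void readFields(DataInput in) throws IOException {
    int length = in.readInt();
    raw = new byte[length];
    in.readFully(raw);          // fully consume our bytes; never hold the DataInput itself
  }

  public byte[] getRaw() {
    return raw;                 // deserialize (e.g. with a Thrift deserializer) on demand
  }

  public void setRaw(byte[] bytes) {
    raw = bytes;
  }
}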


-Bryan

On Oct 2, 2008, at 2:48 PM, Jimmy Lin wrote:


Hi everyone,

I'm wondering if it's possible to lazily deserialize a Writable.   
That is,

when my custom Writable is handed a DataInput from readFields, can I
simply hang on to the reference and read from it later?  This would be
useful if the Writable is a complex data structure that may be  
expensive
to deserialize, so I'd only want to do it on-demand.  Or does the  
runtime

mutate the underlying stream, leaving the Writable with a reference to
something completely different later?

I'm wondering about both present behavior, and the implicit contract
provided by the Hadoop API.

Thanks!

-Jimmy






rename return values

2008-09-30 Thread Bryan Duxbury

Hey all,

Why is it that FileSystem.rename returns true or false instead of  
throwing an exception? It seems incredibly inconvenient to get a  
false result and then have to go poring over the namenode logs  
looking for the actual error message. I had this case recently where  
I'd forgotten to create the parent directories, but I had no idea it  
was failing since there were no exceptions.


-Bryan


Re: rename return values

2008-09-30 Thread Bryan Duxbury
It's very interesting that the Java File API doesn't return  
exceptions, but that doesn't mean it's a good interface. The fact  
that there IS further exceptional information somewhere in the system  
but that it is currently ignored is sort of troubling. Perhaps, at  
least, we could add an overload that WILL throw an exception if there  
is one to report.
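In the meantime, something like this client-side wrapper (a sketch, with a
hypothetical helper name) at least surfaces a little context when rename
returns false:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsUtils {
  // Turn FileSystem.rename()'s boolean into an exception with some context.
  public static void renameOrThrow(FileSystem fs, Path src, Path dst) throws IOException {
    if (!fs.rename(src, dst)) {
      boolean dstParentExists = dst.getParent() != null && fs.exists(dst.getParent());
      throw new IOException("rename failed: " + src + " -> " + dst
          + " (src exists: " + fs.exists(src)
          + ", dst parent exists: " + dstParentExists + ")");
    }
  }
}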


-Bryan

On Sep 30, 2008, at 1:53 PM, Chris Douglas wrote:

FileSystem::rename doesn't always have the cause, per  
java.io.File::renameTo:


http://java.sun.com/javase/6/docs/api/java/io/File.html#renameTo 
(java.io.File)


Even if it did, it's not clear to FileSystem that the failure to  
rename is fatal/exceptional to the application. -C


On Sep 30, 2008, at 1:37 PM, Bryan Duxbury wrote:


Hey all,

Why is it that FileSystem.rename returns true or false instead of  
throwing an exception? It seems incredibly inconvenient to get a  
false result and then have to go poring over the namenode logs  
looking for the actual error message. I had this case recently  
where I'd forgotten to create the parent directories, but I had no  
idea it was failing since there were no exceptions.


-Bryan






Re: Could not get block locations. Aborting... exception

2008-09-29 Thread Bryan Duxbury
Ok, so, what might I do next to try and diagnose this? Does it sound  
like it might be an HDFS/mapreduce bug, or should I pore over my own  
code first?


Also, did any of the other exceptions look interesting?

-Bryan

On Sep 29, 2008, at 10:40 AM, Raghu Angadi wrote:


Raghu Angadi wrote:

Doug Cutting wrote:

Raghu Angadi wrote:
For the current implementation, you need around 3x fds. 1024 is  
too low for Hadoop. The Hadoop requirement will come down, but  
1024 would be too low anyway.


1024 is the default on many systems.  Shouldn't we try to make  
the default configuration work well there?

How can 1024 work well for different kinds of loads?


oops! 1024 should work for anyone working with just one file for  
any load. I didn't notice that. My comment can be ignored.


Raghu.




Could not get block locations. Aborting... exception

2008-09-26 Thread Bryan Duxbury

Hey all.

We've been running into a very annoying problem pretty frequently
lately. We'll be running some job, for instance a distcp, and it'll
be moving along quite nicely, until all of a sudden, it sort of
freezes up. It takes a while, and then we'll get an error like this one:


attempt_200809261607_0003_m_02_0: Exception closing file /tmp/ 
dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile
attempt_200809261607_0003_m_02_0: java.io.IOException: Could not  
get block locations. Aborting...
attempt_200809261607_0003_m_02_0:   at  
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError 
(DFSClient.java:2143)
attempt_200809261607_0003_m_02_0:   at  
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 
(DFSClient.java:1735)
attempt_200809261607_0003_m_02_0:   at  
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run 
(DFSClient.java:1889)


At approximately the same time, we start seeing lots of these errors  
in the namenode log:


2008-09-26 16:19:26,502 WARN org.apache.hadoop.dfs.StateChange: DIR*  
NameSystem.startFile: failed to create file /tmp/dustin/input/ 
input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for  
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83  
because current leaseholder is trying to recreate file.
2008-09-26 16:19:26,502 INFO org.apache.hadoop.ipc.Server: IPC Server  
handler 8 on 7276, call create(/tmp/dustin/input/input_dataunits/ 
_distcp_tmp_1dk90o/part-01897.bucketfile, rwxr-xr-x,  
DFSClient_attempt_200809261607_0003_m_02_1, true, 3, 67108864)  
from 10.100.11.83:60056: error:  
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create  
file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/ 
part-01897.bucketfile for  
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83  
because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create  
file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/ 
part-01897.bucketfile for  
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83  
because current leaseholder is trying to recreate file.
at org.apache.hadoop.dfs.FSNamesystem.startFileInternal 
(FSNamesystem.java:952)
at org.apache.hadoop.dfs.FSNamesystem.startFile 
(FSNamesystem.java:903)

at org.apache.hadoop.dfs.NameNode.create(NameNode.java:284)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)



Eventually, the job fails because of these errors. Subsequent job  
runs also experience this problem and fail. The only way we've been  
able to recover is to restart the DFS. It doesn't happen every time,  
but it does happen often enough that I'm worried.


Does anyone have any ideas as to why this might be happening? I  
thought that https://issues.apache.org/jira/browse/HADOOP-2669 might  
be the culprit, but today we upgraded to hadoop 0.18.1 and the  
problem still happens.


Thanks,

Bryan


Re: Could not get block locations. Aborting... exception

2008-09-26 Thread Bryan Duxbury
Well, I did find some more errors in the datanode log. Here's a  
sampling:


2008-09-26 10:43:57,287 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,
infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException:  
Block blk_-3923611845661840838_176295 is not valid.
at org.apache.hadoop.dfs.FSDataset.getBlockFile 
(FSDataset.java:716)
at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java: 
704)
at org.apache.hadoop.dfs.DataNode$BlockSender.init 
(DataNode.java:1678)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock 
(DataNode.java:1101)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run 
(DataNode.java:1037)


2008-09-26 10:56:19,325 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,
infoPort=50075, ipcPort=50020):DataXceiver: java.io.EOFException:  
while trying to read 65557 bytes
at org.apache.hadoop.dfs.DataNode$BlockReceiver.readToBuf 
(DataNode.java:2464)
at org.apache.hadoop.dfs.DataNode 
$BlockReceiver.readNextPacket(DataNode.java:2508)
at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket 
(DataNode.java:2572)
at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock 
(DataNode.java:2698)
at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock 
(DataNode.java:1283)


2008-09-26 10:56:19,779 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,

infoPort=50075, ipcPort=50020):DataXceiver: java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run 
(DataNode.java:1021)

at java.lang.Thread.run(Thread.java:619)

2008-09-26 10:56:21,816 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,
infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException:  
Could not read from stream
at org.apache.hadoop.net.SocketInputStream.read 
(SocketInputStream.java:119)

at java.io.DataInputStream.readByte(DataInputStream.java:248)
at org.apache.hadoop.io.WritableUtils.readVLong 
(WritableUtils.java:324)
at org.apache.hadoop.io.WritableUtils.readVInt 
(WritableUtils.java:345)

at org.apache.hadoop.io.Text.readString(Text.java:410)

2008-09-26 10:56:28,380 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,
infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException:  
Connection reset by peer

at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java: 
236)


2008-09-26 10:56:52,387 ERROR org.apache.hadoop.dfs.DataNode:  
DatanodeRegistration(10.100.11.115:50010,  
storageID=DS-1784982905-10.100.11.115-50010-1221785192226,
infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Too  
many open files

at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.init 
(EPollArrayWrapper.java:59)
at sun.nio.ch.EPollSelectorImpl.init 
(EPollSelectorImpl.java:52)
at sun.nio.ch.EPollSelectorProvider.openSelector 
(EPollSelectorProvider.java:18)

at sun.nio.ch.Util.getTemporarySelector(Util.java:123)

The most interesting one in my eyes is the too many open files one.  
My ulimit is 1024. How much should it be? I don't think that I have  
that many files open in my mappers. They should only be operating on  
a single file at a time. I can try to run the job again and get an  
lsof if it would be interesting.


Thanks for taking the time to reply, by the way.

-Bryan


On Sep 26, 2008, at 4:48 PM, Hairong Kuang wrote:

Does your failed map task open a lot of files to write? Could you
please check the log of the datanode running at the machine where
the map tasks failed? Do you see any error message containing
"exceeds the limit of concurrent xcievers"?


Hairong



From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
Sent: Fri 9/26/2008 4:36 PM
To: core-user@hadoop.apache.org
Subject: Could not get block locations. Aborting... exception



Hey all.

We've been running into a very annoying problem pretty frequently
lately. We'll be running some job, for instance a distcp, and it'll
be moving along quite nicely, until all of a sudden, it sort of
freezes up. It takes a while, and then we'll get an error like this  
one:


attempt_200809261607_0003_m_02_0: Exception

Hadoop job scheduling issue

2008-09-24 Thread Bryan Duxbury
I encountered an interesting situation today. I'm running Hadoop  
0.17.1. What happened was that 3 jobs started simultaneously, which  
is expected in my workflow, but then resources got very mixed up.


One of the jobs grabbed all the available reducers (5) and got one  
map task in before the second job started taking all the map tasks.  
This means the first job (the one with the reducers) was holding the  
reducers and doing absolutely no work. The other job was mapping, but  
was suboptimally using resources since it wasn't shuffling at the  
same time as it mapped. (The third job was doing nothing at all.)


Does Hadoop not schedule jobs first-come, first-served? I'm pretty
confident that the jobs all had identical priority, since I haven't
set it to be different anywhere else. If it doesn't schedule jobs in
this manner, is there a reason why it doesn't? It seems like this
problem will decrease total throughput significantly in some situations.


-Bryan


Re: Serialization format for structured data

2008-05-23 Thread Bryan Duxbury


On May 23, 2008, at 9:51 AM, Ted Dunning wrote:
Relative to thrift, JSON has the advantage of not requiring a  
schema as well
as the disadvantage of not having a schema.  The advantage is that  
the data
is more fluid and I don't have to generate code to handle the  
records.  The
disadvantage is that I lose some data completeness and typing  
guarantees.
On balance, I would like to use JSON-like data quite a bit in ad  
hoc data
streams and in logs where the producer and consumer of the data are  
not

visible to parts of the data processing chain.


That about sums it up. If you want schema, Thrift is your friend. If  
you don't, JSON probably will do pretty well for you.


-Bryan


Re: Trouble hooking up my app to HDFS

2008-05-14 Thread Bryan Duxbury

Nobody has any ideas about this?

-Bryan

On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote:

I'm trying to create a java application that writes to HDFS. I have  
it set up such that hadoop-0.16.3 is on my machine, and the env  
variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct  
respective directories. My app lives elsewhere, but generates its  
classpath by looking in those environment variables. Here's what my  
generated classpath looks like:


/Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/ 
hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/ 
lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/ 
Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/ 
bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/ 
hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/ 
hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/ 
hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/ 
hdfs_collector/build/hdfs_collector.jar


The problem I have is that when I go to get a FileSystem object for  
my file:/// files (for testing locally), I'm getting errors like this:


   [jtestr] java.io.IOException: No FileSystem for scheme: file
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:55:in `access$300'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1193:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:150:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:124:in `getNamed'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:96:in `get'

I saw an archived message that suggested this was a problem with  
the classpath, but there was no resolution that I saw. Does anyone  
have any ideas why this might be occurring?


Thanks,

Bryan
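
For what it's worth, a minimal sketch of the kind of check one could run
here, assuming the root cause is that the file:// scheme never got mapped
to LocalFileSystem (in this era that mapping comes from the fs.file.impl
property in hadoop-default.xml). The explicit conf.set is only a
hypothetical workaround, not a fix for the underlying classpath problem:

[code]
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFsCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    // If this prints null, hadoop-default.xml was not picked up from the
    // classpath, which would explain "No FileSystem for scheme: file".
    System.out.println("fs.file.impl = " + conf.get("fs.file.impl"));

    // Hypothetical workaround: register the mapping by hand (assumes the
    // 0.16-era property name; normally hadoop-default.xml supplies it).
    conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");

    // With the mapping in place, the scheme lookup should now succeed.
    FileSystem fs = FileSystem.getLocal(conf);
    System.out.println(fs.exists(new Path("/tmp")));
  }
}
[/code]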




Re: Trouble hooking up my app to HDFS

2008-05-14 Thread Bryan Duxbury
And the conf dir (/Users/bryanduxbury/hadoop-0.16.3/conf) I hope it  
is the similar as the one you are using for your hadoop installation.


I'm not sure I understand this. It isn't similar, it's the same as my  
hadoop installation. I'm only operating on localhost at the moment.  
I'm just trying to get a LocalFileSystem up and running so I can run  
some tests.


On May 14, 2008, at 8:24 PM, lohit wrote:


You could do this.
open up hadoop (its a shell script). The last line is the one which  
executes the corresponding class of hadoop, instead of exec, make  
it echo and see what all is present in your classpath. Make sure  
your generated class path matches the same. And the conf dir (/ 
Users/bryanduxbury/hadoop-0.16.3/conf) I hope it is the similar as  
the one you are using for your hadoop installation.


Thanks,
lohit

- Original Message 
From: Bryan Duxbury [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Wednesday, May 14, 2008 7:30:26 PM
Subject: Re: Trouble hooking up my app to HDFS

Nobody has any ideas about this?

-Bryan

On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote:


I'm trying to create a java application that writes to HDFS. I have
it set up such that hadoop-0.16.3 is on my machine, and the env
variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct
respective directories. My app lives elsewhere, but generates its
classpath by looking in those environment variables. Here's what my
generated classpath looks like:

/Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/
hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/
hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/
lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/
jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/
jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/
jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/
junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/
Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/
bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/
hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/
hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/
hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/
hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/
hdfs_collector/build/hdfs_collector.jar

The problem I have is that when I go to get a FileSystem object for
my file:/// files (for testing locally), I'm getting errors like  
this:


   [jtestr] java.io.IOException: No FileSystem for scheme: file
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:55:in `access$300'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1193:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:150:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:124:in `getNamed'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:96:in `get'

I saw an archived message that suggested this was a problem with
the classpath, but there was no resolution that I saw. Does anyone
have any ideas why this might be occurring?

Thanks,

Bryan




Trouble hooking up my app to HDFS

2008-05-13 Thread Bryan Duxbury
I'm trying to create a java application that writes to HDFS. I have  
it set up such that hadoop-0.16.3 is on my machine, and the env  
variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct  
respective directories. My app lives elsewhere, but generates its  
classpath by looking in those environment variables. Here's what my  
generated classpath looks like:


/Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/ 
hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/hadoop-0.16.3/ 
lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/ 
commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jasper- 
compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jasper- 
runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jsp- 
api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/junit-3.8.1.jar:/Users/ 
bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/Users/bryanduxbury/ 
hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/bryanduxbury/hadoop-0.16.3/ 
lib/servlet-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
xmlenc-0.52.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/ 
jtestr-0.2.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/ 
jvyaml.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/ 
libthrift.jar:/Users/bryanduxbury/projects/hdfs_collector/build/ 
hdfs_collector.jar


The problem I have is that when I go to get a FileSystem object for  
my file:/// files (for testing locally), I'm getting errors like this:


   [jtestr] java.io.IOException: No FileSystem for scheme: file
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:55:in `access$300'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:1193:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:150:in `get'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:124:in `getNamed'
   [jtestr]   org/apache/hadoop/fs/FileSystem.java:96:in `get'

I saw an archived message that suggested this was a problem with the  
classpath, but there was no resolution that I saw. Does anyone have  
any ideas why this might be occurring?


Thanks,

Bryan 


Re: Hadoop and retrieving data from HDFS

2008-04-24 Thread Bryan Duxbury
I think what you're saying is that you are mostly interested in data  
locality. I don't think it's done yet, but it would be pretty easy to  
make HBase provide start keys as well as region locations for splits  
for a MapReduce job. In theory, that would give you all the pieces  
you need to run locality-aware processing.
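
Nothing like this existed at the time as far as I know, so purely as a
sketch of the idea: a split that carries a region's start key plus the
host serving it, written against the old org.apache.hadoop.mapred API
(the class and field names here are hypothetical).

[code]
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;

// Hypothetical sketch: a split that carries an HBase region's start key
// plus the host serving that region, so the JobTracker can schedule the
// map task on (or near) the node that already holds the data.
public class RegionSplit implements InputSplit {
  private Text startKey = new Text();
  private Text regionLocation = new Text();

  public RegionSplit() {}

  public RegionSplit(Text startKey, Text regionLocation) {
    this.startKey = startKey;
    this.regionLocation = regionLocation;
  }

  public Text getStartKey() {
    return startKey;
  }

  // The length of a region isn't cheaply known; 0 is a placeholder.
  public long getLength() throws IOException {
    return 0;
  }

  // The scheduler treats these hostnames as locality hints only --
  // tasks can still run elsewhere if those nodes are busy.
  public String[] getLocations() throws IOException {
    return new String[] { regionLocation.toString() };
  }

  public void write(DataOutput out) throws IOException {
    startKey.write(out);
    regionLocation.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    startKey.readFields(in);
    regionLocation.readFields(in);
  }
}
[/code]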


-Bryan

On Apr 24, 2008, at 10:16 AM, Leon Mergen wrote:


Hello,

I'm sorry if a question like this has been asked before, but I was  
unable to
find an answer for this anywhere on google; if it is off-topic, I  
apologize

in advance.

I'm trying to look a bit into the future, and predict scalability  
problems
for the company I work for: we're using PostgreSQL, and processing  
many

writes/second (access logs, currently around 250, but this will only
increase significantly in the future). Furthermore, we perform data  
mining
on this data, and ideally, need to have this data stored in a  
structured

form (the data is searched in various ways). In other words: a very
interesting problem.

Now, I'm trying to understand a bit of the hadoop/hbase  
architecture: as I
understand it, HDFS, MapReduce and HBase are sufficiently decoupled  
that the
use case I was hoping for is not available; however, I'm still  
going to ask:



Is it possible to store this data in hbase, and thus have all  
access logs
distributed amongst many different servers, and start MapReduce  
jobs on
those actual servers, which process all the data on those servers ?  
In other

words, the data never leaves the actual servers ?

If this isn't possible, is this because someone simply never took  
the time
to implement such a thing, or is it hard to fit in the design (for  
example,
that the JobTracker needs to be aware of the physical locations of  
all the
data, since you don't want to analyze the same (replicated) data  
twice) ?


From what I understand by playing with hadoop for the past few  
days, the
idea is that you fetch your MapReduce data from HDFS rather than  
BigTable,

or am I mistaken ?

Thanks for your time!

Regards,

Leon Mergen




Re: ID Service with HBase?

2008-04-16 Thread Bryan Duxbury
HBASE-493 was created, and seems similar. It's a write-if-not-modified-since.


I would guess that you probably don't want to use HBase to maintain a
distributed auto-increment. You need to think of some other approach
that produces unique IDs under concurrent access, like a hash or GUID
or something like that.
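
As an illustration of the GUID route, a minimal sketch (the class and
method names are made up, not an HBase API): every mapper can mint IDs
locally with no shared state, so there is nothing to lock.

[code]
import java.util.UUID;

// Hypothetical sketch: generate identifiers without any shared counter,
// so concurrent mappers never need to coordinate through HBase at all.
public class IdService {
  // A random UUID is unique across mappers with overwhelming probability,
  // so no read-modify-write cycle (and no row lock) is required.
  public String nextId(String namespace) {
    return namespace + "-" + UUID.randomUUID().toString();
  }

  public static void main(String[] args) {
    IdService ids = new IdService();
    System.out.println(ids.nextId("documents"));
  }
}
[/code]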


-Bryan

On Apr 16, 2008, at 2:18 PM, Jim Kellerman wrote:

Row locks do not apply to reads, only updates. They prevent two  
applications from updating the same row simultaneously. There is no  
other locking mechanism in HBase. (It follows Bigtable in this  
regard. See http://labs.google.com/papers/bigtable.html )


There has been some discussion about adding a conditional write
(i.e. one that only completes successfully if the current value of the
cell being updated is x), but no one has thought it important
enough to enter an enhancement request on the HBase Jira:
https://issues.apache.org/jira/browse/HBASE


By the way, you will get a more timely response to HBase questions  
if you address them to the hbase mailing list: hbase- 
[EMAIL PROTECTED]


---
Jim Kellerman, Senior Engineer; Powerset



-Original Message-
From: Thomas Thevis [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 16, 2008 4:22 AM
To: core-user@hadoop.apache.org
Subject: ID Service with HBase?

Hello list readers,

I'd like to perform mass data operations resulting in several
output files with cross-references between lines in different
files. For this purpose, I want to use a kind of ID service
and I wonder whether I could use HBase for this task.
However, so far I have not been able to use the HBase locking
mechanism in a way that guarantees newly created IDs are unique.

The setup:
- each Mapper has its own instance of an IDService implementation
- each IDService instance has its own reference to the ID
table in HBase

The code snippet which is used to return and update IDs:
[code]
final String columnName = this.config.get(ID_COLUMN_ID);
final Text column = new Text(columnName);
final String tableName = this.config.get(ID_SERVICE_TABLE_ID);
final HTable table = new HTable(this.config, new Text(tableName));
final Text rowName = new Text(namespace);
final long startValue;

final long lockid = table.startUpdate(rowName);
final byte[] bytes = table.get(rowName, column);
if (bytes == null) {
  startValue = 0;
} else {
  final ByteArrayInputStream byteArrayInputStream
      = new ByteArrayInputStream(bytes);
  final LongWritable longWritable = new LongWritable();
  longWritable.readFields(new DataInputStream(byteArrayInputStream));
  startValue = longWritable.get();
}
final long stopValue = startValue + size;
table.put(lockid, column, new LongWritable(stopValue));
table.commit(lockid);
[/code]

As stated above, the resulting IDs are not unique; about a
quarter of all created IDs appear several times.
Now my question: am I using the locking mechanism the wrong way,
or is my approach of using HBase locking and synchronization for
this task completely wrong?

Thanks,

Thomas







Re: how to connect to hbase using php

2008-03-11 Thread Bryan Duxbury
To connect to HBase from PHP, you should use either REST or Thrift  
integration.


-Bryan

On Mar 11, 2008, at 4:20 AM, Ved Prakash wrote:


I have seen examples of connecting to HBase using PHP which mention
hshellconnect.class.php. I would like to know where I can download this
file, or whether there is an alternative way to connect to HBase using PHP.

Thanks




Re: loading data into hbase table

2008-03-11 Thread Bryan Duxbury

Ved,

At the moment you're stuck loading the data via one of the APIs  
(Java, REST or Thrift) yourself. We would like to have import tools  
for HBase, but we haven't gotten around to it yet.
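
If it helps, here is a rough sketch of what a hand-rolled loader might
look like against the Java API of that era, reusing the same
startUpdate/put/commit calls that show up elsewhere in this archive. The
table name, column name, and CSV layout are assumptions, and the value is
written with the Writable variant of put purely for illustration:

[code]
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

// Rough sketch of a CSV loader; table/column names and CSV layout are assumptions.
public class CsvLoader {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, new Text("mytable"));
    Text column = new Text("data:value");

    BufferedReader reader = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = reader.readLine()) != null) {
      // Assume the first field is the row key and the second is the value.
      String[] fields = line.split(",", 2);
      long lockid = table.startUpdate(new Text(fields[0]));
      table.put(lockid, column, new Text(fields[1]));
      table.commit(lockid);
    }
    reader.close();
  }
}
[/code]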


Also, there's now a separate HBase mailing list at hbase- 
[EMAIL PROTECTED] Your questions about HBase are better asked  
there in the future.


-Bryan

On Mar 11, 2008, at 2:41 AM, Ved Prakash wrote:


Hi friends,

I have a table dump in CSV format, and I want to load this data into
my HBase table instead of typing it in as inserts. I did a web search
and also looked into the HBase documentation but couldn't find anything.

Can someone tell me how to load a file from local disk into hbase  
table on

hdfs?

Thanks




Re: Hbase Matrix Package for Map/Reduce-based Parallel Matrix Computations

2008-01-30 Thread Bryan Duxbury
There's nothing stopping you from storing doubles in HBase. All you  
have to do is convert your double into a byte array.
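
A minimal sketch of that conversion, using only the standard library
(no HBase types involved):

[code]
import java.nio.ByteBuffer;

// Pack a double into a byte[] suitable for storing in an HBase cell,
// and read it back out.
public class DoubleBytes {
  public static byte[] toBytes(double value) {
    return ByteBuffer.allocate(8).putDouble(value).array();
  }

  public static double fromBytes(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getDouble();
  }

  public static void main(String[] args) {
    byte[] cell = toBytes(3.14159);
    System.out.println(fromBytes(cell)); // prints 3.14159
  }
}
[/code]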


-Bryan

On Jan 30, 2008, at 4:31 PM, Chanwit Kaewkasi wrote:


Hi Edward,

On 29/01/2008, edward yoon [EMAIL PROTECTED] wrote:

Did you mean the MATLAB-like scientific language interface for matrix
computations? If so, you are welcome any time.


Yes, it is going to be a language for matrix computation. Groovy
allows overriding methods and has very good Java integration. With these,
it's my choice for wrapping low-level libraries like what you're doing,
as well as Linpack and BLAS. I've already written some wrapper classes
for them using the MTJ library.


I made a matrix page on hadoop wiki.
http://wiki.apache.org/hadoop/Matrix

Freely add your profile and your plans.


I will do it.

BTW, I'd like to get my hands dirty and add double data type support
to HBase, as I consider it one of my important issues. Do you have any
suggestions on where to start modifying it?

Thanks,

Chanwit

--
Chanwit Kaewkasi
PhD Student,
Centre for Novel Computing
School of Computer Science
The University of Manchester
Oxford Road
Manchester
M13 9PL, UK