Re: Optimal Filesystem (and Settings) for HDFS
We use XFS for our data drives, and we've had somewhat mixed results. One of the biggest pros is that a freshly formatted XFS volume gives you more usable free space than ext3, even with ext3's reserved-space setting turned all the way down to 0. Another is that you can format a 1TB drive as XFS almost instantly, versus minutes for ext3, which makes it really fast to kickstart our worker nodes. We have seen some weird behavior, though, when machines run out of memory, apparently because the XFS driver does something odd with kernel memory. When this happens, we end up having to do some fscking before we can get that node back online.

As far as outright performance, I actually *did* do some tests of XFS vs. ext3 on our cluster. If you just look at a single machine's local disk speed, you can read and write noticeably faster with XFS than with ext3. However, the reality is that this extra disk performance won't have much effect on your overall job completion time, since you will find yourself network-bottlenecked well before you hit even ext3's limits. The long and short of it is that we use XFS to speed up our new machine deployment, and that's it.

-Bryan

On May 18, 2009, at 10:31 AM, Alex Loddengaard wrote:

I believe Yahoo! uses ext3, though I know other people have said that XFS has performed better in various benchmarks. We use ext3, though we haven't done any benchmarks to prove its worth. This question has come up a lot, so I think it'd be worth doing a benchmark and writing up the results. I haven't been able to find a detailed analysis / benchmark writeup comparing various filesystems, unfortunately.

Hope this helps,
Alex

On Mon, May 18, 2009 at 8:54 AM, Bob Schulze b.schu...@ecircle.com wrote:

We are currently rebuilding our cluster - has anybody got recommendations on the underlying file system? Just standard ext3? I could imagine that the block size could be larger than its default... Thx for any tips, Bob
Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks
I thought it a conspicuous omission to not discuss the cost of various approaches. Hadoop is free, though you have to spend developer time; how much does Vertica cost on 100 nodes?

-Bryan

On Apr 14, 2009, at 7:16 AM, Guilherme Germoglio wrote:

(Hadoop is used in the benchmarks)

http://database.cs.brown.edu/sigmod09/

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

--
Guilherme
msn: guigermog...@hotmail.com
homepage: http://germoglio.googlepages.com
Issue distcp'ing from 0.19.2 to 0.18.3
Hey all,

I was trying to copy some data from our cluster on 0.19.2 to a new cluster on 0.18.3 by using distcp and the hftp:// filesystem. Everything seemed to be going fine for a few hours, but then a few tasks failed because a few files returned 500 errors when being read from the 0.19 cluster. As a result the job died. Now that I'm trying to restart it, I get this error:

[rapl...@ds-nn2 ~]$ hadoop distcp hftp://ds-nn1:7276/ hdfs://ds-nn2:7276/cluster-a
09/04/08 23:32:39 INFO tools.DistCp: srcPaths=[hftp://ds-nn1:7276/]
09/04/08 23:32:39 INFO tools.DistCp: destPath=hdfs://ds-nn2:7276/cluster-a
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.net.SocketException: Unexpected end of file from server
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:766)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1000)
        at org.apache.hadoop.dfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:183)
        at org.apache.hadoop.dfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:193)
        at org.apache.hadoop.dfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:222)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:588)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:609)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

I changed nothing at all between the first attempt and the subsequent failed attempts. The only clue in the namenode log for the 0.19 cluster is:

2009-04-08 23:29:09,786 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.100.50.252:47733 got version 47 expected version 2

Anyone have any ideas?

-Bryan
Re: Issue distcp'ing from 0.19.2 to 0.18.3
Ah, nevermind. It turns out that I just shouldn't rely on command history so much. I accidentally pointed the hftp:// at the actual namenode port, not the namenode HTTP port. It appears to be starting a regular copy again. -Bryan On Apr 8, 2009, at 11:57 PM, Todd Lipcon wrote: Hey Bryan, Any chance you can get a tshark trace on the 0.19 namenode? Maybe tshark -s 10 -w nndump.pcap port 7276 Also, are the clocks synced on the two machines? The failure of your distcp is at 23:32:39, but the namenode log message you posted was 23:29:09. Did those messages actually pop out at the same time? Thanks -Todd On Wed, Apr 8, 2009 at 11:39 PM, Bryan Duxbury br...@rapleaf.com wrote: Hey all, I was trying to copy some data from our cluster on 0.19.2 to a new cluster on 0.18.3 by using disctp and the hftp:// filesystem. Everything seemed to be going fine for a few hours, but then a few tasks failed because a few files got 500 errors when trying to be read from the 19 cluster. As a result the job died. Now that I'm trying to restart it, I get this error: [rapl...@ds-nn2 ~]$ hadoop distcp hftp://ds-nn1:7276/ hdfs://ds-nn2:7276/cluster-a 09/04/08 23:32:39 INFO tools.DistCp: srcPaths=[hftp://ds-nn1:7276/] 09/04/08 23:32:39 INFO tools.DistCp: destPath=hdfs://ds-nn2:7276/ cluster-a With failures, global counters are inaccurate; consider running with -i Copy failed: java.net.SocketException: Unexpected end of file from server at sun.net.www.http.HttpClient.parseHTTPHeader (HttpClient.java:769) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at sun.net.www.http.HttpClient.parseHTTPHeader (HttpClient.java:766) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632) at sun.net.www.protocol.http.HttpURLConnection.getInputStream (HttpURLConnection.java:1000) at org.apache.hadoop.dfs.HftpFileSystem$LsParser.fetchList (HftpFileSystem.java:183) at org.apache.hadoop.dfs.HftpFileSystem$LsParser.getFileStatus (HftpFileSystem.java:193) at org.apache.hadoop.dfs.HftpFileSystem.getFileStatus (HftpFileSystem.java:222) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667) at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java: 588) at org.apache.hadoop.tools.DistCp.copy(DistCp.java:609) at org.apache.hadoop.tools.DistCp.run(DistCp.java:768) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.DistCp.main(DistCp.java:788) I changed nothing at all between the first attempt and the subsequent failed attempts. The only clues in the namenode log for the 19 cluster are: 2009-04-08 23:29:09,786 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.100.50.252:47733 got version 47 expected version 2 Anyone have any ideas? -Bryan
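The fix Bryan describes amounts to pointing the hftp:// source URI at the namenode's HTTP port (dfs.http.address, which defaults to 50070) rather than its RPC port. A sketch of what the corrected command would look like, assuming the default HTTP port; the hostnames and destination port are the ones from the thread:

    # hftp:// is served by the namenode's embedded web server, not the RPC port
    hadoop distcp hftp://ds-nn1:50070/ hdfs://ds-nn2:7276/cluster-a

That also fits the "Incorrect header or version mismatch" warnings in the namenode log: the RPC port was being sent HTTP requests it didn't understand.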
Re: HELP: I wanna store the output value into a list not write to the disk
I don't really see what the downside of reading it from disk is. A list of word counts should be pretty small on disk so it shouldn't take long to read it into a HashMap. Doing anything else is going to cause you to go a long way out of your way to end up with the same result. -Bryan On Apr 2, 2009, at 2:41 AM, andy2005cst wrote: I need to use the output of the reduce, but I don't know how to do. use the wordcount program as an example if i want to collect the wordcount into a hashtable for further use, how can i do? the example just show how to let the result onto disk. myemail is : andy2005...@gmail.com looking forward your help. thanks a lot. -- View this message in context: http://www.nabble.com/HELP%3A-I-wanna- store-the-output-value-into-a-list-not-write-to-the-disk- tp22844277p22844277.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
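For the wordcount case being discussed, reading the job's output back into memory is just a matter of opening the part files under the output directory. A rough sketch, assuming the default TextOutputFormat ("word<TAB>count" per line); the class and method names here are only for illustration:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadCounts {
      public static Map<String, Long> load(String outputDir) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Map<String, Long> counts = new HashMap<String, Long>();
        // Each reducer leaves a part-NNNNN file of "word<TAB>count" lines.
        for (FileStatus status : fs.listStatus(new Path(outputDir))) {
          if (status.isDir()) continue;  // skip _logs and other directories
          BufferedReader in = new BufferedReader(
              new InputStreamReader(fs.open(status.getPath())));
          String line;
          while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t");
            counts.put(parts[0], Long.parseLong(parts[1]));
          }
          in.close();
        }
        return counts;
      }
    }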
Re: Massive discrepancies in job's bytes written/read
Is there some area of the codebase that deals with aggregating counters that I should be looking at? -Bryan On Mar 17, 2009, at 10:20 PM, Owen O'Malley wrote: On Mar 17, 2009, at 7:44 PM, Bryan Duxbury wrote: There is no compression in the mix for us, so that's not the culprit. I'd be sort of willing to believe that spilling and sorting play a role in this, but, wow, over 10x read and write? That seems like a big problem. It happened recently to me too. It was off by 6x. The strange thing was that the individual tasks looked right. It was just the aggregate that was wrong. -- Owen
Re: Does HDFS provide a way to append A file to B ?
I believe the last word on appends right now is that the patch that was committed broke a lot of other things, so it's been disabled. As such, there is no working append in HDFS, and certainly not in hadoop-17.x. -Bryan On Mar 17, 2009, at 4:50 PM, Steve Gao wrote: Thanks, but I was told there is an append command, isn't there? But I don't know how to apply this patch https://issues.apache.org/jira/ browse/HADOOP-1700 --- On Tue, 3/17/09, Bo Shi b...@visiblemeasures.com wrote: From: Bo Shi b...@visiblemeasures.com Subject: Re: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:42 PM what about an identity mapper taking A and B as inputs? this will likely mix rows of A and B together though... On Tue, Mar 17, 2009 at 7:35 PM, Steve Gao steve@yahoo.com wrote: BTW, I am using hadoop 0.17.0 and jdk 1.6 --- On Tue, 3/17/09, Steve Gao steve@yahoo.com wrote: From: Steve Gao steve@yahoo.com Subject: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:22 PM I need to append file A to file B in HDFS without downloading/ uploading them to local disk. Is there a way?
Massive discrepancies in job's bytes written/read
Hey all,

In looking at the stats for a number of our jobs, the amount of data that the UI claims we've read from or written to HDFS is vastly larger than the amount of data that should be involved in the job. For instance, we have a job that combines small files into big files, and it operates on around 2TB worth of data. The output in HDFS (via hadoop dfs -du) matches the expected size, but the jobtracker UI claims that we've read and written around 22TB of data!

By all accounts, Hadoop is actually *doing* the right thing - we're not observing excess data reading or writing anywhere. However, this massive discrepancy makes the job stats essentially worthless for understanding IO in our jobs. Does anyone know why there's such an enormous difference? Have others experienced this problem?

-Bryan
Re: Does HDFS provide a way to append A file to B ?
No. There isn't *any* version of Hadoop with a (stable) append command. On Mar 17, 2009, at 5:08 PM, Steve Gao wrote: Thanks, Bryan. Does 0.18.3 has built-in append command? --- On Tue, 3/17/09, Bryan Duxbury br...@rapleaf.com wrote: From: Bryan Duxbury br...@rapleaf.com Subject: Re: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 8:04 PM I believe the last word on appends right now is that the patch that was committed broke a lot of other things, so it's been disabled. As such, there is no working append in HDFS, and certainly not in hadoop-17.x. -Bryan On Mar 17, 2009, at 4:50 PM, Steve Gao wrote: Thanks, but I was told there is an append command, isn't there? But I don't know how to apply this patch https://issues.apache.org/jira/browse/HADOOP-1700 --- On Tue, 3/17/09, Bo Shi b...@visiblemeasures.com wrote: From: Bo Shi b...@visiblemeasures.com Subject: Re: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:42 PM what about an identity mapper taking A and B as inputs? this will likely mix rows of A and B together though... On Tue, Mar 17, 2009 at 7:35 PM, Steve Gao steve@yahoo.com wrote: BTW, I am using hadoop 0.17.0 and jdk 1.6 --- On Tue, 3/17/09, Steve Gao steve@yahoo.com wrote: From: Steve Gao steve@yahoo.com Subject: Does HDFS provide a way to append A file to B ? To: core-user@hadoop.apache.org Date: Tuesday, March 17, 2009, 7:22 PM I need to append file A to file B in HDFS without downloading/uploading them to local disk. Is there a way?
Re: Massive discrepancies in job's bytes written/read
There is no compression in the mix for us, so that's not the culprit. I'd be sort of willing to believe that spilling and sorting play a role in this, but, wow, over 10x read and write? That seems like a big problem. -Bryan On Mar 17, 2009, at 6:46 PM, Stefan Will wrote: Some of the discrepancy could be due to compression of map input/ output format. E.g. The mapper output bytes will show the compressed size, while the reduce input will show the uncompressed size. Or something along those lines. But I'm also questioning the accuracy of the reporting and suspect that some if it is due to all the disk activity that happens while processing spills in the mapper and the copy/shuffle/sort phase in the reducer. It would certainly be nice if all the byte counts were reported in a way that they're comparable. -- Stefan From: Bryan Duxbury br...@rapleaf.com Reply-To: core-user@hadoop.apache.org Date: Tue, 17 Mar 2009 17:26:52 -0700 To: core-user@hadoop.apache.org Subject: Massive discrepancies in job's bytes written/read Hey all, In looking at the stats for a number of our jobs, the amount of data that the UI claims we've read from or written to HDFS is vastly larger than the amount of data that should be involved in the job. For instance, we have a job that combines small files into big files that we're operating on around 2TB worth of data. The outputs in HDFS (via hadoop dfs -du) matches the expected size, but the jobtracker UI claims that we've read and written around 22TB of data! By all accounts, Hadoop is actually *doing* the right thing - we're not observing excess data reading or writing anywhere. However, this massive discrepancy makes the job stats essentially worthless for understanding IO in our jobs. Does anyone know why there's such an enormous difference? Have others experienced this problem? -Bryan
Re: Profiling Hadoop
I've used YourKit Java Profiler pretty successfully. There's a JobConf parameter you can flip on that will cause a few maps and reduces to start with profiling on, so you won't be overwhelmed with info. -Bryan On Feb 27, 2009, at 11:12 AM, Sandy wrote: Hello, Could anyone recommend any software for profiling the performance of MapReduce applications one may write for Hadoop? I am currently developing in Java. Thanks, -SM
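The JobConf switch mentioned above is driven by a handful of mapred.task.profile properties. A sketch of how they might be set; the property names are taken from the hadoop-default.xml of releases in this era, so double-check them against your version, and substitute your profiler's own agent string (e.g. YourKit's -agentpath option) for the stock hprof one:

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingSetup {
      public static JobConf enableProfiling(JobConf conf) {
        // Profile only a few task attempts so the job as a whole isn't slowed down.
        conf.set("mapred.task.profile", "true");
        conf.set("mapred.task.profile.maps", "0-2");     // map attempts 0, 1, 2
        conf.set("mapred.task.profile.reduces", "0-2");  // reduce attempts 0, 1, 2
        // hprof is the stock JVM agent; a commercial profiler would use its own flags here.
        conf.set("mapred.task.profile.params",
            "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s");
        return conf;
      }
    }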
Big HDFS deletes lead to dead datanodes
On occasion, I've deleted a few TB of stuff in DFS at once. I've noticed that when I do this, datanodes start taking a really long time to check in and ultimately get marked dead. Some time later, they'll get done deleting stuff and come back and get unmarked. I'm wondering, why do deletions get more priority than checking in? Ideally we would never see dead datanodes from doing deletes. -Bryan
Super-long reduce task timeouts in hadoop-0.19.0
(Repost from the dev list) I noticed some really odd behavior today while reviewing the job history of some of our jobs. Our Ganglia graphs showed really long periods of inactivity across the entire cluster, which should definitely not be the case - we have a really long string of jobs in our workflow that should execute one after another. I figured out which jobs were running during those periods of inactivity, and discovered that almost all of them had 4-5 failed reduce tasks, with the reason for failure being something like: Task attempt_200902061117_3382_r_38_0 failed to report status for 1282 seconds. Killing! The actual timeout reported varies from 700-5000 seconds. Virtually all of our longer-running jobs were affected by this problem. The period of inactivity on the cluster seems to correspond to the amount of time the job waited for these reduce tasks to fail. I checked out the tasktracker log for the machines with timed-out reduce tasks looking for something that might explain the problem, but the only thing I came up with that actually referenced the failed task was this log message, which was repeated many times: 2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/ attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories I'm not sure what this means; can anyone shed some light on this message? Further confusing the issue, on the affected machines, I looked in logs/userlogs/task id, and to my surprise, the directory and log files existed, and the syslog file seemed to contain logs of a perfectly good reduce task! Overall, this seems like a pretty critical bug. It's consuming up to 50% of the runtime of our jobs in some instances, killing our throughput. At the very least, it seems like the reduce task timeout period should be MUCH shorter than the current 10-20 minutes. -Bryan
Re: Super-long reduce task timeouts in hadoop-0.19.0
We didn't customize this value, to my knowledge, so I'd suspect it's the default. -Bryan On Feb 20, 2009, at 5:00 PM, Ted Dunning wrote: How often do your reduce tasks report status? On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com wrote: (Repost from the dev list) I noticed some really odd behavior today while reviewing the job history of some of our jobs. Our Ganglia graphs showed really long periods of inactivity across the entire cluster, which should definitely not be the case - we have a really long string of jobs in our workflow that should execute one after another. I figured out which jobs were running during those periods of inactivity, and discovered that almost all of them had 4-5 failed reduce tasks, with the reason for failure being something like: Task attempt_200902061117_3382_r_38_0 failed to report status for 1282 seconds. Killing! The actual timeout reported varies from 700-5000 seconds. Virtually all of our longer-running jobs were affected by this problem. The period of inactivity on the cluster seems to correspond to the amount of time the job waited for these reduce tasks to fail. I checked out the tasktracker log for the machines with timed-out reduce tasks looking for something that might explain the problem, but the only thing I came up with that actually referenced the failed task was this log message, which was repeated many times: 2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/ attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories I'm not sure what this means; can anyone shed some light on this message? Further confusing the issue, on the affected machines, I looked in logs/userlogs/task id, and to my surprise, the directory and log files existed, and the syslog file seemed to contain logs of a perfectly good reduce task! Overall, this seems like a pretty critical bug. It's consuming up to 50% of the runtime of our jobs in some instances, killing our throughput. At the very least, it seems like the reduce task timeout period should be MUCH shorter than the current 10-20 minutes. -Bryan -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 www.deepdyve.com 408-773-0110 ext. 738 858-414-0013 (m) 408-773-0220 (fax)
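For context, the "failed to report status for N seconds. Killing!" behavior is governed by mapred.task.timeout, which defaults to 600,000 ms (10 minutes); a task that goes longer than that without emitting output or touching the Reporter gets killed. A minimal sketch of what Ted is asking about, i.e. a reducer that explicitly reports progress during long stretches of work (the types and the per-value work are placeholders):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowReduce extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
      public void reduce(Text key, Iterator<LongWritable> values,
                         OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        long sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();  // stand-in for expensive per-value work
          // Tell the tasktracker we're alive so it doesn't kill the attempt
          // after mapred.task.timeout with a "failed to report status" message.
          reporter.progress();
        }
        output.collect(key, new LongWritable(sum));
      }
    }

That said, Bryan's symptoms (the "Could not find ... file.out in any of the configured local directories" message and otherwise healthy-looking task logs) point more at the framework than at user code, so this is only the first thing to rule out.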
Measuring IO time in map/reduce jobs?
Hey all, Does anyone have any experience trying to measure IO time spent in their map/reduce jobs? I know how to profile a sample of map and reduce tasks, but that appears to exclude IO time. Just subtracting the total cpu time from the total run time of a task seems like too coarse an approach. -Bryan
Re: java.io.IOException: Could not get block locations. Aborting...
Small files are bad for hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient $DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 (DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run (DFSClient.java:1889) It seems to be related to trying to write to many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
Re: java.io.IOException: Could not get block locations. Aborting...
Correct. +1 to Jason's more unix file handles suggestion. That's a must-have. -Bryan On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote: This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks. On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: Small files are bad for hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: Hi all - I've been running into this error the past few days: java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient $DFSOutputStream.processDatanodeError(DFSClient.java:2143) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 (DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream $DataStreamer.run(DFSClient.java:1889) It seems to be related to trying to write to many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as the problem doesn't occur in the debugger unfortunately. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
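For reference, the change being confirmed above looks roughly like this in hadoop-site.xml on the datanodes (the value is the one Bryan mentions; datanodes need a restart to pick it up, and note that the property name really is spelled "xcievers"):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8000</value>
    </property>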
Re: Control over max map/reduce tasks per job
This sounds good enough for a JIRA ticket to me. -Bryan On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote: Chris, For my specific use cases, it would be best to be able to set N mappers/reducers per job per node (so I can explicitly say, run at most 2 at a time of this CPU bound task on any given node). However, the other way would work as well (on 10 node system, would set job to max 20 tasks at a time globally), but opens up the possibility that a node could be assigned more than 2 of that task. I would work with whatever is easiest to implement as either would be a vast improvement for me (can run high numbers of network latency bound tasks without fear of cpu bound tasks killing the cluster). JG -Original Message- From: Chris K Wensel [mailto:ch...@wensel.net] Sent: Tuesday, February 03, 2009 11:34 AM To: core-user@hadoop.apache.org Subject: Re: Control over max map/reduce tasks per job Hey Jonathan Are you looking to limit the total number of concurrent mapper/ reducers a single job can consume cluster wide, or limit the number per node? That is, you have X mappers/reducers, but only can allow N mappers/ reducers to run at a time globally, for a given job. Or, you are cool with all X running concurrently globally, but want to guarantee that no node can run more than N tasks from that job? Or both? just reconciling the conversation we had last week with this thread. ckw On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote: All, I have a few relatively small clusters (5-20 nodes) and am having trouble keeping them loaded with my MR jobs. The primary issue is that I have different jobs that have drastically different patterns. I have jobs that read/write to/from HBase or Hadoop with minimal logic (network throughput bound or io bound), others that perform crawling (network latency bound), and one huge parsing streaming job (very CPU bound, each task eats a core). I'd like to launch very large numbers of tasks for network latency bound jobs, however the large CPU bound job means I have to keep the max maps allowed per node low enough as to not starve the Datanode and Regionserver. I'm an HBase dev but not familiar enough with Hadoop MR code to even know what would be involved with implementing this. However, in talking with other users, it seems like this would be a well-received option. I wanted to ping the list before filing an issue because it seems like someone may have thought about this in the past. Thanks. Jonathan Gray -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/
Re: Question about HDFS capacity and remaining
Hm, very interesting. Didn't know about that. What's the purpose of the reservation? Just to give root preference or leave wiggle room? If it's not strictly necessary it seems like it would make sense to reduce it to essentially 0%.

-Bryan

On Jan 29, 2009, at 6:18 PM, Doug Cutting wrote:

Ext2 by default reserves 5% of the drive for use by root only. That'd be about 45GB of your 907GB capacity, which would account for most of the discrepancy. You can adjust this with tune2fs.

Doug

Bryan Duxbury wrote:

There are no non-dfs files on the partitions in question. df -h indicates that there is 907GB capacity, but only 853GB remaining, with 200M used. The only thing I can think of is the filesystem overhead.

-Bryan

On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote:

It's taken by non-dfs files.

Hairong

On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:

Hey all, I'm currently installing a new cluster, and noticed something a little confusing. My DFS is *completely* empty - 0 files in DFS. However, in the namenode web interface, the reported capacity is 3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There are literally zero other files on the partitions hosting the DFS data directories. Where am I losing 240GB?

-Bryan
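The adjustment Doug describes is a one-liner with tune2fs. A sketch, using a placeholder device name; -m sets the percentage of blocks reserved for root:

    # Show the current reservation for the partition:
    tune2fs -l /dev/sdb1 | grep -i "reserved block count"
    # Shrink it to 1% (or 0, per the discussion above):
    tune2fs -m 1 /dev/sdb1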
Re: Question about HDFS capacity and remaining
Did you publish those results anywhere? On Jan 30, 2009, at 9:56 AM, Brian Bockelman wrote: For what it's worth, our organization did extensive tests on many filesystems benchmarking their performance when they are 90 - 95% full. Only XFS retained most of its performance when it was mostly full (ext4 was not tested)... so, if you are thinking of pushing things to the limits, that might be something worth considering. Brian On Jan 30, 2009, at 11:18 AM, stephen mulcahy wrote: Bryan Duxbury wrote: Hm, very interesting. Didn't know about that. What's the purpose of the reservation? Just to give root preference or leave wiggle room? If it's not strictly necessary it seems like it would make sense to reduce it to essentially 0%. AFAIK It is needed for defragmentation / fsck to work properly and your filesystem performance will degrade a lot if you reduce this to 0% (but I'd love to hear otherwise :) -stephen
Question about HDFS capacity and remaining
Hey all, I'm currently installing a new cluster, and noticed something a little confusing. My DFS is *completely* empty - 0 files in DFS. However, in the namenode web interface, the reported capacity is 3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There are literally zero other files on the partitions hosting the DFS data directories. Where am I losing 240GB? -Bryan
Re: Question about HDFS capacity and remaining
There are no non-dfs files on the partitions in question. df -h indicates that there is 907GB capacity, but only 853GB remaining, with 200M used. The only thing I can think of is the filesystem overhead. -Bryan On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote: It's taken by non-dfs files. Hairong On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote: Hey all, I'm currently installing a new cluster, and noticed something a little confusing. My DFS is *completely* empty - 0 files in DFS. However, in the namenode web interface, the reported capacity is 3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There are literally zero other files on the partitions hosting the DFS data directories. Where am I losing 240GB? -Bryan
Re: Q about storage architecture
If you are considering using it as a conventional filesystem from a few clients, then it most resembles NAS. However, I don't think it makes sense to try and classify it as SAN or NAS. HDFS is a distributed filesystem designed to be consumed in a massively distributed fashion, so it does fall into its own category. -Bryan On Dec 6, 2008, at 9:06 PM, Sirisha Akkala wrote: Hi I would like to know if Hadoop architecture more resembles SAN or NAS? -I'm guessing it is NAS. Or does it fall under a totally different category? If so, can you please email brief information? thanks,sirisha.
Re: Filesystem closed errors
My app isn't a map/reduce job. On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote: Do you have speculative execution enabled? I've seen error messages like this caused by speculative execution. David Bryan Duxbury wrote: I have an app that runs for a long time with no problems, but when I signal it to shut down, I get errors like this: java.io.IOException: Filesystem closed at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196) at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502) at org.apache.hadoop.dfs.DistributedFileSystem.rename (DistributedFileSystem.java:176) The problems occur when I am trying to close open HDFS files. Any ideas why I might be seeing this? I though it was because I was abruptly shutting down without giving the streams a chance to get closed, but after some refactoring, that's not the case. -Bryan
Re: Filesystem closed errors
I'm fairly certain that I'm not closing the Filesystem anywhere. That said, the issue you pointed at could somehow be connected. On Nov 25, 2008, at 9:11 PM, Hong Tang wrote: Does your code ever call fs.close()? If so, https:// issues.apache.org/jira/browse/HADOOP-4655 might be relevant to your problem. On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote: Do you have speculative execution enabled? I've seen error messages like this caused by speculative execution. David Bryan Duxbury wrote: I have an app that runs for a long time with no problems, but when I signal it to shut down, I get errors like this: java.io.IOException: Filesystem closed at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196) at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502) at org.apache.hadoop.dfs.DistributedFileSystem.rename (DistributedFileSystem.java:176) The problems occur when I am trying to close open HDFS files. Any ideas why I might be seeing this? I though it was because I was abruptly shutting down without giving the streams a chance to get closed, but after some refactoring, that's not the case. -Bryan
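For what it's worth, HADOOP-4655 is about the fact that FileSystem.get() hands back a shared, cached instance per filesystem URI, so a close() anywhere (including Hadoop's own shutdown hook running while application threads are still flushing files) closes it for every caller, which then surfaces as "Filesystem closed" from DFSClient.checkOpen. A small sketch of the aliasing, assuming that cache behavior applies to the version in use here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SharedFsDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem a = FileSystem.get(conf);
        FileSystem b = FileSystem.get(conf);
        System.out.println(a == b);  // true: both are the same cached object
        a.close();
        // Anything still holding "b" is now working against a closed client;
        // against HDFS that shows up as "java.io.IOException: Filesystem closed".
      }
    }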
Filesystem closed errors
I have an app that runs for a long time with no problems, but when I signal it to shut down, I get errors like this:

java.io.IOException: Filesystem closed
        at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
        at org.apache.hadoop.dfs.DFSClient.rename(DFSClient.java:502)
        at org.apache.hadoop.dfs.DistributedFileSystem.rename(DistributedFileSystem.java:176)

The problems occur when I am trying to close open HDFS files. Any ideas why I might be seeing this? I thought it was because I was abruptly shutting down without giving the streams a chance to get closed, but after some refactoring, that's not the case.

-Bryan
Re: Hadoop Design Question
Comments inline. On Nov 6, 2008, at 9:29 AM, Ricky Ho wrote: Hi, While exploring how Hadoop fits in our usage scenarios, there are 4 recurring issues keep popping up. I don't know if they are real issues or just our misunderstanding of Hadoop. Can any expert shed some light here ? Disk I/O overhead == - The output of a Map task is written to a local disk and then later on upload to the Reduce task. While this enable a simple recovery strategy when the map task failed, it incur additional disk I/O overhead. - For example, in our popular Hadoop example of calculating the approximation of Pi, there isn't any input data. The map tasks in this example, should just directly feed its output to the reduce task. So I am wondering if there is an option to bypassing the step of writing the map result to the local disk. In most data-intensive map/reduce jobs, you have to spill your map output to disk at some point because you will run out of memory otherwise. Additionally, Pi calculation is a really bad example, because you could always start reducing any pairs together arbitrarily. This is because pi calculation is commutative and associative. We have a special construct for situations like that called a combiner, which is basically a map-side reducer. Pipelining between Map Reduce phases is not possible === - In the current setting, it sounds like no reduce task will be started before all map tasks have completed. In case if there are a few slow running map tasks, the whole job will be delayed. - The overall job execution can be shortened if the reduce tasks can starts its processing as soon as some map results are available rather than waiting for all the map tasks to complete. You can't start reducing until all map tasks are complete because until all map tasks complete, you can't do an accurate sort of all intermediate key/value pairs. That is, if you just started reducing the results of a single map task immediately, you might have other values for some keys that come from different map tasks, and your reduce would be inaccurate. In theory if you know that each map task produces keys only in a certain range, you could start reducing immediately after the map task finishes, but that seems like an unlikely case. Pipelining between jobs - In many cases, we've found the parallel computation doesn't involve just one single map/reduce job, but multiple inter- dependent map/reduce jobs then work together in some coordinating fashion. - Again, I haven't seen any mechanism available for 2 MapReduce jobs to directly interact with each other. Job1 must write its output to HDFS for Job2 to pickup. On the other hand, once the map phase of a Job2 has started, all its input HDFS files has to be freezed (in other words, Job1 cannot append more records into the HDFS files) - Therefore it is impossible for the reduce phase of Job1 to stream its output data to a file while the map phase of Job2 start reading the same file. Job2 can only start after ALL REDUCE TASKS of Job1 is completed, which makes pipelining between jobs impossible. Certainly, many transformations take more than one map/reduce job. However, very few could actually be pipelined such that the output of one fed directly into another without an intermediate stop in a file. If the first job does any grouping or sorting, then the reduce is necessary and it will have to write out to a file before anything else can go on. If the second job also does grouping or sorting, then you definitely need two jobs. 
If the second job doesn't do grouping or sorting, then it can probably be collapsed into either the map or reduce of the first job. No parallelism of reduce task with one key === - Parallelism only happens in the map phase, as well as reduce phase (on different keys). But there is no parallelism within a reduce process of a particular key - This means the partitioning function has to be chosen carefully to make sure the workload of the reduce processes is balanced. (maybe not a big deal) - Is there any thoughts of running a pool of reduce tasks on the same key and have they combine their results later ? I think you will find very few situations where you have only one key on reduce. If you do, it's probably a scenario where you can use a combiner and eliminate the problem. Basically all map/reduce jobs I've worked on have a large number of keys going into the reduce phase. Rgds, Ricky
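The combiner mentioned above is wired in through JobConf. A small sketch of the relevant bit: for a commutative and associative reduce (like summing word counts), the reducer class itself can double as the combiner, so partial results get collapsed on the map side before anything is spilled or shuffled.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reducer;

    public class CombinerSetup {
      // Reuse the job's reducer as a map-side combiner.
      public static JobConf useReducerAsCombiner(JobConf conf,
                                                 Class<? extends Reducer> reduceClass) {
        conf.setReducerClass(reduceClass);
        conf.setCombinerClass(reduceClass);
        return conf;
      }
    }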
Re: Can anyone recommend me a inter-language data file format?
Agree, we use Thrift at Rapleaf for this purpose. It's trivial to make a ThriftWritable if you want to be crafty, but you can also just use byte[]s and do the serialization and deserialization yourself. -Bryan On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote: Take a look at Thrift: http://developers.facebook.com/thrift/ Alex On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing [EMAIL PROTECTED] wrote: The project I focused on has many modules written in different languages (several modules are hadoop jobs). So I'd like to utilize a common record based data file format for data exchange. XML is not efficient for appending new records. SequenceFile seems not having API of other languages except Java. Protocol Buffers' hadoop API seems under development. any recommendation for this? Thanks
Re: Lazily deserializing Writables
We do this with some of our Thrift-serialized types. We account for this behavior explicitly in the ThriftWritable class and make it so that we can read the serialized version off the wire completely by prepending the size. Then, we can read in the raw bytes and hang on to them for later as we see fit. I would think that leaving the bytes on the DataInput would break things in a very impressive way.

-Bryan

On Oct 2, 2008, at 2:48 PM, Jimmy Lin wrote:

Hi everyone, I'm wondering if it's possible to lazily deserialize a Writable. That is, when my custom Writable is handed a DataInput from readFields, can I simply hang on to the reference and read from it later? This would be useful if the Writable is a complex data structure that may be expensive to deserialize, so I'd only want to do it on-demand. Or does the runtime mutate the underlying stream, leaving the Writable with a reference to something completely different later? I'm wondering about both present behavior, and the implicit contract provided by the Hadoop API. Thanks!

-Jimmy
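A rough sketch of the length-prefixing idea described above (illustrative only, not the actual Rapleaf ThriftWritable; the Thrift serialization of the wrapped object is left to the caller):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class LengthPrefixedWritable implements Writable {
      private byte[] bytes = new byte[0];

      public void set(byte[] serialized) { this.bytes = serialized; }
      public byte[] get() { return bytes; }

      public void write(DataOutput out) throws IOException {
        out.writeInt(bytes.length);  // prepend the size...
        out.write(bytes);            // ...then the raw serialized record
      }

      public void readFields(DataInput in) throws IOException {
        // The size tells us exactly how many bytes to pull off the stream, so
        // nothing is left behind on the DataInput; actual deserialization can
        // happen later, on demand.
        bytes = new byte[in.readInt()];
        in.readFully(bytes);
      }
    }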
rename return values
Hey all, Why is it that FileSystem.rename returns true or false instead of throwing an exception? It seems incredibly inconvenient to get a false result and then have to go poring over the namenode logs looking for the actual error message. I had this case recently where I'd forgotten to create the parent directories, but I had no idea it was failing since there were no exceptions. -Bryan
Re: rename return values
It's very interesting that the Java File API doesn't return exceptions, but that doesn't mean it's a good interface. The fact that there IS further exceptional information somewhere in the system but that it is currently ignored is sort of troubling. Perhaps, at least, we could add an overload that WILL throw an exception if there is one to report. -Bryan On Sep 30, 2008, at 1:53 PM, Chris Douglas wrote: FileSystem::rename doesn't always have the cause, per java.io.File::renameTo: http://java.sun.com/javase/6/docs/api/java/io/File.html#renameTo (java.io.File) Even if it did, it's not clear to FileSystem that the failure to rename is fatal/exceptional to the application. -C On Sep 30, 2008, at 1:37 PM, Bryan Duxbury wrote: Hey all, Why is it that FileSystem.rename returns true or false instead of throwing an exception? It seems incredibly inconvenient to get a false result and then have to go poring over the namenode logs looking for the actual error message. I had this case recently where I'd forgotten to create the parent directories, but I had no idea it was failing since there were no exceptions. -Bryan
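The overload suggested above could be as simple as a wrapper that turns the boolean into an exception. The real cause still has to be dug out of the namenode log, but at least the failure isn't silently ignored. A sketch:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsUtil {
      public static void renameOrThrow(FileSystem fs, Path src, Path dst) throws IOException {
        if (!fs.rename(src, dst)) {
          throw new IOException("rename failed: " + src + " -> " + dst);
        }
      }
    }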
Re: Could not get block locations. Aborting... exception
Ok, so, what might I do next to try and diagnose this? Does it sound like it might be an HDFS/mapreduce bug, or should I pore over my own code first? Also, did any of the other exceptions look interesting? -Bryan On Sep 29, 2008, at 10:40 AM, Raghu Angadi wrote: Raghu Angadi wrote: Doug Cutting wrote: Raghu Angadi wrote: For the current implementation, you need around 3x fds. 1024 is too low for Hadoop. The Hadoop requirement will come down, but 1024 would be too low anyway. 1024 is the default on many systems. Shouldn't we try to make the default configuration work well there? How can 1024 work well for different kinds of loads? oops! 1024 should work for anyone working with just one file for any load. I didn't notice that. My comment can be ignored. Raghu.
Could not get block locations. Aborting... exception
Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some job, for instance a distcp, and it'll be moving along quite nicely, until all of the sudden, it sort of freezes up. It takes a while, and then we'll get an error like this one: attempt_200809261607_0003_m_02_0: Exception closing file /tmp/ dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile attempt_200809261607_0003_m_02_0: java.io.IOException: Could not get block locations. Aborting... attempt_200809261607_0003_m_02_0: at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError (DFSClient.java:2143) attempt_200809261607_0003_m_02_0: at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 (DFSClient.java:1735) attempt_200809261607_0003_m_02_0: at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run (DFSClient.java:1889) At approximately the same time, we start seeing lots of these errors in the namenode log: 2008-09-26 16:19:26,502 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /tmp/dustin/input/ input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 because current leaseholder is trying to recreate file. 2008-09-26 16:19:26,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 7276, call create(/tmp/dustin/input/input_dataunits/ _distcp_tmp_1dk90o/part-01897.bucketfile, rwxr-xr-x, DFSClient_attempt_200809261607_0003_m_02_1, true, 3, 67108864) from 10.100.11.83:60056: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/ part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 because current leaseholder is trying to recreate file. org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/ part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 because current leaseholder is trying to recreate file. at org.apache.hadoop.dfs.FSNamesystem.startFileInternal (FSNamesystem.java:952) at org.apache.hadoop.dfs.FSNamesystem.startFile (FSNamesystem.java:903) at org.apache.hadoop.dfs.NameNode.create(NameNode.java:284) at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888) Eventually, the job fails because of these errors. Subsequent job runs also experience this problem and fail. The only way we've been able to recover is to restart the DFS. It doesn't happen every time, but it does happen often enough that I'm worried. Does anyone have any ideas as to why this might be happening? I thought that https://issues.apache.org/jira/browse/HADOOP-2669 might be the culprit, but today we upgraded to hadoop 0.18.1 and the problem still happens. Thanks, Bryan
Re: Could not get block locations. Aborting... exception
Well, I did find some more errors in the datanode log. Here's a sampling: 2008-09-26 10:43:57,287 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Block blk_-3923611845661840838_176295 is not valid. at org.apache.hadoop.dfs.FSDataset.getBlockFile (FSDataset.java:716) at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java: 704) at org.apache.hadoop.dfs.DataNode$BlockSender.init (DataNode.java:1678) at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock (DataNode.java:1101) at org.apache.hadoop.dfs.DataNode$DataXceiver.run (DataNode.java:1037) 2008-09-26 10:56:19,325 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.EOFException: while trying to read 65557 bytes at org.apache.hadoop.dfs.DataNode$BlockReceiver.readToBuf (DataNode.java:2464) at org.apache.hadoop.dfs.DataNode $BlockReceiver.readNextPacket(DataNode.java:2508) at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket (DataNode.java:2572) at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock (DataNode.java:2698) at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock (DataNode.java:1283) 2008-09-26 10:56:19,779 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:298) at org.apache.hadoop.dfs.DataNode$DataXceiver.run (DataNode.java:1021) at java.lang.Thread.run(Thread.java:619) 2008-09-26 10:56:21,816 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Could not read from stream at org.apache.hadoop.net.SocketInputStream.read (SocketInputStream.java:119) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.hadoop.io.WritableUtils.readVLong (WritableUtils.java:324) at org.apache.hadoop.io.WritableUtils.readVInt (WritableUtils.java:345) at org.apache.hadoop.io.Text.readString(Text.java:410) 2008-09-26 10:56:28,380 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233) at sun.nio.ch.IOUtil.read(IOUtil.java:206) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java: 236) 2008-09-26 10:56:52,387 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-1784982905-10.100.11.115-50010-1221785192226, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Too many open files at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method) at sun.nio.ch.EPollArrayWrapper.init (EPollArrayWrapper.java:59) at sun.nio.ch.EPollSelectorImpl.init (EPollSelectorImpl.java:52) at sun.nio.ch.EPollSelectorProvider.openSelector (EPollSelectorProvider.java:18) at sun.nio.ch.Util.getTemporarySelector(Util.java:123) The most interesting one in my eyes is the too many open files one. 
My ulimit is 1024. How much should it be? I don't think that I have that many files open in my mappers. They should only be operating on a single file at a time. I can try to run the job again and get an lsof if it would be interesting. Thanks for taking the time to reply, by the way. -Bryan On Sep 26, 2008, at 4:48 PM, Hairong Kuang wrote: Does your failed map task open a lot of files to write? Could you please check the log of the datanode running at the machine where the map tasks failed? Do you see any error message containing exceeds the limit of concurrent xcievers? Hairong From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Fri 9/26/2008 4:36 PM To: core-user@hadoop.apache.org Subject: Could not get block locations. Aborting... exception Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some job, for instance a distcp, and it'll be moving along quite nicely, until all of the sudden, it sort of freezes up. It takes a while, and then we'll get an error like this one: attempt_200809261607_0003_m_02_0: Exception
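For reference, the usual way to raise the limit is per-user in /etc/security/limits.conf. The numbers below are just a common starting point, not a tested recommendation, and the account name is a placeholder; the daemons have to be restarted under the new limit:

    # check the limit the daemon is actually running with
    ulimit -n
    # /etc/security/limits.conf entries for the account running the datanode:
    #   hadoop  soft  nofile  16384
    #   hadoop  hard  nofile  16384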
Hadoop job scheduling issue
I encountered an interesting situation today. I'm running Hadoop 0.17.1. What happened was that 3 jobs started simultaneously, which is expected in my workflow, but then resources got very mixed up. One of the jobs grabbed all the available reducers (5) and got one map task in before the second job started taking all the map tasks. This means the first job (the one with the reducers) was holding the reducers and doing absolutely no work. The other job was mapping, but was using resources suboptimally, since it wasn't shuffling at the same time as it mapped. (The third job was doing nothing at all.)

Does Hadoop not schedule jobs first-come, first-served? I'm pretty confident that the jobs all had identical priority, since I haven't set it differently anywhere. If it doesn't schedule jobs in this manner, is there a reason why it doesn't? It seems like this problem will decrease total throughput significantly in some situations.

-Bryan
Re: Serialization format for structured data
On May 23, 2008, at 9:51 AM, Ted Dunning wrote: Relative to thrift, JSON has the advantage of not requiring a schema as well as the disadvantage of not having a schema. The advantage is that the data is more fluid and I don't have to generate code to handle the records. The disadvantage is that I lose some data completeness and typing guarantees. On balance, I would like to use JSON-like data quite a bit in ad hoc data streams and in logs where the producer and consumer of the data are not visible to parts of the data processing chain. That about sums it up. If you want schema, Thrift is your friend. If you don't, JSON probably will do pretty well for you. -Bryan
Re: Trouble hooking up my app to HDFS
Nobody has any ideas about this? -Bryan On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote: I'm trying to create a java application that writes to HDFS. I have it set up such that hadoop-0.16.3 is on my machine, and the env variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct respective directories. My app lives elsewhere, but generates it's classpath by looking in those environment variables. Here's what my generated classpath looks like: /Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/ hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/ hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/ lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/ Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/ bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/ hdfs_collector/build/hdfs_collector.jar The problem I have is that when I go to get a FileSystem object for my file:/// files (for testing locally), I'm getting errors like this: [jtestr] java.io.IOException: No FileSystem for scheme: file [jtestr] org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem' [jtestr] org/apache/hadoop/fs/FileSystem.java:55:in `access $300' [jtestr] org/apache/hadoop/fs/FileSystem.java:1193:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:150:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:124:in `getNamed' [jtestr] org/apache/hadoop/fs/FileSystem.java:96:in `get' I saw an archived message that suggested this was a problem with the classpath, but there was no resolution that I saw. Does anyone have any ideas why this might be occurring? Thanks, Bryan
Re: Trouble hooking up my app to HDFS
And the conf dir (/Users/bryanduxbury/hadoop-0.16.3/conf) I hope it is the similar as the one you are using for your hadoop installation. I'm not sure I understand this. It isn't similar, it's the same as my hadoop installation. I'm only operating on localhost at the moment. I'm just trying to get a LocalFileSystem up and running so I can run some tests. On May 14, 2008, at 8:24 PM, lohit wrote: You could do this. open up hadoop (its a shell script). The last line is the one which executes the corresponding class of hadoop, instead of exec, make it echo and see what all is present in your classpath. Make sure your generated class path matches the same. And the conf dir (/ Users/bryanduxbury/hadoop-0.16.3/conf) I hope it is the similar as the one you are using for your hadoop installation. Thanks, lohit - Original Message From: Bryan Duxbury [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, May 14, 2008 7:30:26 PM Subject: Re: Trouble hooking up my app to HDFS Nobody has any ideas about this? -Bryan On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote: I'm trying to create a java application that writes to HDFS. I have it set up such that hadoop-0.16.3 is on my machine, and the env variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct respective directories. My app lives elsewhere, but generates it's classpath by looking in those environment variables. Here's what my generated classpath looks like: /Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/ hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/ hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/ lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/ Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/ bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/ hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/ hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/ hdfs_collector/build/hdfs_collector.jar The problem I have is that when I go to get a FileSystem object for my file:/// files (for testing locally), I'm getting errors like this: [jtestr] java.io.IOException: No FileSystem for scheme: file [jtestr] org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem' [jtestr] org/apache/hadoop/fs/FileSystem.java:55:in `access $300' [jtestr] org/apache/hadoop/fs/FileSystem.java:1193:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:150:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:124:in `getNamed' [jtestr] org/apache/hadoop/fs/FileSystem.java:96:in `get' I saw an archived message that suggested this was a problem with the classpath, but there was no resolution that I saw. 
Does anyone have any ideas why this might be occurring? Thanks, Bryan
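For anyone hitting the same "No FileSystem for scheme: file" error, here is a minimal sketch of the kind of standalone check lohit's suggestion boils down to. It assumes the Hadoop 0.16-era API and that the scheme-to-class mapping lives in the fs.file.impl property of hadoop-default.xml (bundled in the core jar); if that property comes back null, the classpath the app builds is not actually picking up the Hadoop config.

[code]
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalFsCheck {
    public static void main(String[] args) throws IOException {
        // Configuration reads hadoop-default.xml and hadoop-site.xml off the
        // classpath; if neither is visible, fs.file.impl stays unset and
        // FileSystem lookups fail with "No FileSystem for scheme: file".
        Configuration conf = new Configuration();
        System.out.println("fs.file.impl = " + conf.get("fs.file.impl"));

        // Ask for the local filesystem directly instead of going through
        // fs.default.name, then poke it to confirm it actually works.
        FileSystem fs = FileSystem.getLocal(conf);
        System.out.println("filesystem class = " + fs.getClass().getName());
        System.out.println("/tmp exists: " + fs.exists(new Path("/tmp")));
    }
}
[/code]

Running this against the same generated classpath as the failing app, and comparing that classpath with the one the hadoop script echoes, usually shows which entry is missing.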
Trouble hooking up my app to HDFS
I'm trying to create a java application that writes to HDFS. I have it set up such that hadoop-0.16.3 is on my machine, and the env variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct respective directories. My app lives elsewhere, but generates its classpath by looking in those environment variables. Here's what my generated classpath looks like: /Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/hdfs_collector/build/hdfs_collector.jar The problem I have is that when I go to get a FileSystem object for my file:/// files (for testing locally), I'm getting errors like this: [jtestr] java.io.IOException: No FileSystem for scheme: file [jtestr] org/apache/hadoop/fs/FileSystem.java:1179:in `createFileSystem' [jtestr] org/apache/hadoop/fs/FileSystem.java:55:in `access$300' [jtestr] org/apache/hadoop/fs/FileSystem.java:1193:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:150:in `get' [jtestr] org/apache/hadoop/fs/FileSystem.java:124:in `getNamed' [jtestr] org/apache/hadoop/fs/FileSystem.java:96:in `get' I saw an archived message that suggested this was a problem with the classpath, but there was no resolution that I saw. Does anyone have any ideas why this might be occurring? Thanks, Bryan
Re: Hadoop and retrieving data from HDFS
I think what you're saying is that you are mostly interested in data locality. I don't think it's done yet, but it would be pretty easy to make HBase provide start keys as well as region locations for splits for a MapReduce job. In theory, that would give you all the pieces you need to run locality-aware processing. -Bryan On Apr 24, 2008, at 10:16 AM, Leon Mergen wrote: Hello, I'm sorry if a question like this has been asked before, but I was unable to find an answer for this anywhere on google; if it is off-topic, I apologize in advance. I'm trying to look a bit into the future and predict scalability problems for the company I work for: we're using PostgreSQL, and processing many writes/second (access logs, currently around 250, but this will only increase significantly in the future). Furthermore, we perform data mining on this data, and ideally need to have this data stored in a structured form (the data is searched in various ways). In other words: a very interesting problem. Now, I'm trying to understand a bit of the hadoop/hbase architecture: as I understand it, HDFS, MapReduce and HBase are sufficiently decoupled that the use case I was hoping for is not available; however, I'm still going to ask: Is it possible to store this data in hbase, and thus have all access logs distributed amongst many different servers, and start MapReduce jobs on those actual servers, which process all the data on those servers? In other words, the data never leaves the actual servers? If this isn't possible, is this because someone simply never took the time to implement such a thing, or is it hard to fit into the design (for example, that the JobTracker needs to be aware of the physical locations of all the data, since you don't want to analyze the same (replicated) data twice)? From what I understand by playing with hadoop for the past few days, the idea is that you fetch your MapReduce data from HDFS rather than BigTable, or am I mistaken? Thanks for your time! Regards, Leon Mergen
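To make Bryan's suggestion concrete, here is a hypothetical sketch of a locality-aware split: an InputSplit that carries a region's key range plus the host serving it, so the JobTracker can prefer scheduling each map task on the machine that already holds the data. The class and field names are invented for illustration (this is not an existing HBase class), and it assumes the old org.apache.hadoop.mapred.InputSplit interface of that era.

[code]
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;

// Hypothetical split: one split per region, located at that region's server.
public class RegionSplit implements InputSplit {
    private Text startKey = new Text();
    private Text endKey = new Text();
    private String regionServerHost = "";

    public RegionSplit() {}

    public RegionSplit(Text startKey, Text endKey, String regionServerHost) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.regionServerHost = regionServerHost;
    }

    public Text getStartKey() { return startKey; }
    public Text getEndKey() { return endKey; }

    // The JobTracker uses this to prefer running the map task on the region's
    // host, which is what keeps the scan local to the machine storing the data.
    public String[] getLocations() {
        return new String[] { regionServerHost };
    }

    // A key range doesn't map cleanly to a byte length; a rough value is
    // acceptable since it's only used as a scheduling hint.
    public long getLength() {
        return 0;
    }

    public void write(DataOutput out) throws IOException {
        startKey.write(out);
        endKey.write(out);
        Text.writeString(out, regionServerHost);
    }

    public void readFields(DataInput in) throws IOException {
        startKey.readFields(in);
        endKey.readFields(in);
        regionServerHost = Text.readString(in);
    }
}
[/code]

An input format built on top of this would emit one such split per region, and the record reader for a split would simply scan that region's key range.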
Re: ID Service with HBase?
HBASE-493 was created, and seems similar. It's a write-if-not-modified-since. I would guess that you probably don't want to use HBase to maintain a distributed auto-increment. You need to think of some other approach that produces unique ids across concurrent access, like a hash or GUID or something like that. -Bryan On Apr 16, 2008, at 2:18 PM, Jim Kellerman wrote: Row locks do not apply to reads, only updates. They prevent two applications from updating the same row simultaneously. There is no other locking mechanism in HBase. (It follows Bigtable in this regard. See http://labs.google.com/papers/bigtable.html ) There has been some discussion about adding a conditional write (i.e. one that only completes successfully if the current value of the cell being updated has value x), but no one has thought it important enough to enter an enhancement request on the HBase Jira: https://issues.apache.org/jira/browse/HBASE By the way, you will get a more timely response to HBase questions if you address them to the hbase mailing list: hbase-[EMAIL PROTECTED] --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Thomas Thevis [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 16, 2008 4:22 AM To: core-user@hadoop.apache.org Subject: ID Service with HBase? Hello list readers, I'd like to perform mass data operations resulting in several output files with cross-references between lines in different files. For this purpose, I want to use a kind of ID service, and I wonder whether I could use HBase for this task. However, until now I was not able to use the HBase locking mechanism in a way that makes newly created IDs unique. The setup: each Mapper has its own instance of an IDService implementation, and each IDService instance has its own reference to the ID table in HBase. The code snippet which is used to return and update IDs:
[code]
final String columnName = this.config.get(ID_COLUMN_ID);
final Text column = new Text(columnName);
final String tableName = this.config.get(ID_SERVICE_TABLE_ID);
final HTable table = new HTable(this.config, new Text(tableName));
final Text rowName = new Text(namespace);
final long startValue;
final long lockid = table.startUpdate(rowName);
final byte[] bytes = table.get(rowName, column);
if (bytes == null) {
    startValue = 0;
} else {
    final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    final LongWritable longWritable = new LongWritable();
    longWritable.readFields(new DataInputStream(byteArrayInputStream));
    startValue = longWritable.get();
}
final long stopValue = startValue + size;
table.put(lockid, column, new LongWritable(stopValue));
table.commit(lockid);
[/code]
As stated above, the resulting IDs are not unique; about a quarter of all created IDs appear several times. Now my question: Do I use the locking mechanism the wrong way, or is my approach of using HBase locking and synchronizing for this task completely wrong? Thanks, Thomas
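For completeness, here is a sketch of the coordination-free alternatives Bryan mentions. The class and method names are made up for illustration, and the per-task prefix is assumed to be something already unique per mapper, such as the map task attempt id.

[code]
import java.util.UUID;

public class IdSource {
    private final String prefix;
    private long counter = 0;

    // prefix is assumed to be unique per mapper (e.g. the map task attempt
    // id); with that, ids never collide across concurrent tasks and need no
    // coordination through HBase at all.
    public IdSource(String prefix) {
        this.prefix = prefix;
    }

    // Task-scoped sequential id: unique as long as the prefix is unique.
    public String nextId() {
        return prefix + "-" + (counter++);
    }

    // The other option Bryan mentions: a random GUID, unique with
    // overwhelming probability and requiring no shared state whatsoever.
    public static String randomId() {
        return UUID.randomUUID().toString();
    }
}
[/code]

The trade-off is that ids produced this way are unique but not dense or ordered, which is usually acceptable when they only serve as cross-reference keys between output files.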
Re: how to connect to hbase using php
To connect to HBase from PHP, you should use either the REST or Thrift integration. -Bryan On Mar 11, 2008, at 4:20 AM, Ved Prakash wrote: I have seen examples of connecting to hbase using php which mention hshellconnect.class.php. I would like to know where I can download this file, or whether there is an alternative way to connect to hbase using php. Thanks
Re: loading data into hbase table
Ved, At the moment you're stuck loading the data via one of the APIs (Java, REST or Thrift) yourself. We would like to have import tools for HBase, but we haven't gotten around to it yet. Also, there's now a separate HBase mailing list at hbase-[EMAIL PROTECTED]. Your questions about HBase are better asked there in the future. -Bryan On Mar 11, 2008, at 2:41 AM, Ved Prakash wrote: Hi friends, I have a table dump in csv format, and I wanted to load this data into my hbase table instead of typing it as inserts. I did a web search and also looked into the hbase documentation but couldn't find anything. Can someone tell me how to load a file from local disk into an hbase table on hdfs? Thanks
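In the meantime, here is a rough sketch of what a hand-rolled CSV import through the Java API could look like, written against the same vintage of HTable calls that appear in the ID-service thread above. The table name, column name, and CSV layout are invented for illustration, and the exact HTable and HBaseConfiguration signatures vary between HBase releases, so treat this as a starting point rather than a supported tool.

[code]
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

public class CsvImport {
    public static void main(String[] args) throws IOException {
        // args[0] is the local CSV file; the first field of each line is used
        // as the row key and the second field as the cell value.
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, new Text("mytable"));  // hypothetical table name
        Text column = new Text("data:value");                  // hypothetical column family:qualifier

        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",");
            // Same start/put/commit pattern as the snippet in the ID-service thread.
            long lockid = table.startUpdate(new Text(fields[0]));
            table.put(lockid, column, new Text(fields[1]));
            table.commit(lockid);
        }
        reader.close();
    }
}
[/code]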
Re: Hbase Matrix Package for Map/Reduce-based Parallel Matrix Computations
There's nothing stopping you from storing doubles in HBase. All you have to do is convert your double into a byte array. -Bryan On Jan 30, 2008, at 4:31 PM, Chanwit Kaewkasi wrote: Hi Edward, On 29/01/2008, edward yoon [EMAIL PROTECTED] wrote: Did you mean the MATLAB-like scientific language interface for matrix computations? If so, you are welcome any time. Yes, it is going to be a language for matrix computation. Groovy allows overriding methods and has very good Java integration. With these, it's my choice for wrapping low-level libraries like what you're doing, as well as Linpack and BLAS. I've already written some wrapper classes for them using the MTJ library. I made a matrix page on the hadoop wiki. http://wiki.apache.org/hadoop/Matrix Feel free to add your profile and your plans. I will do it. BTW, I'd like to get my hands dirty adding double data type support to HBase, as I consider it one of my important issues. Do you have any suggestions on where to start modifying it? Thanks, Chanwit -- Chanwit Kaewkasi PhD Student, Centre for Novel Computing School of Computer Science The University of Manchester Oxford Road Manchester M13 9PL, UK
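A minimal sketch of the conversion Bryan describes, using only java.nio; how the resulting byte[] gets handed to the HBase client (as raw bytes or wrapped in a Writable) depends on the HBase version, so only the encode/decode step is shown.

[code]
import java.nio.ByteBuffer;

public class DoubleBytes {
    // Encode a double as an 8-byte big-endian array, suitable for use as a cell value.
    public static byte[] toBytes(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    // Decode the 8-byte array back into a double.
    public static double fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(3.14159);
        System.out.println(fromBytes(encoded)); // prints 3.14159
    }
}
[/code]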