Re: FYI, Large-scale graph computing at Google
Patterson, Josh wrote: Steve, I'm a little lost here; is this a replacement for M/R, or is it some new code that sits on top of M/R and runs an iteration over some sort of graph's vertices? My quick scan of Google's article didn't seem to yield a distinction. Either way, I'd say for our data a graph processing lib for M/R would be interesting. I'm thinking of graph algorithms that get implemented as MR jobs and work with HDFS, HBase, etc.
Re: FYI, Large-scale graph computing at Google
Edward J. Yoon wrote: I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug -- Let's discuss the graph computing framework named Hambrug. OK, first question: why "Hambrug"? To me that's just "Hamburg" typed wrong, which is going to cause lots of confusion. What about something more graphy, like Descartes?
Re: Hadoop0.20 - Class Not Found exception
Amandeep Khurana wrote: I'm getting the following error while starting an MR job:
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
 at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:297)
 ... 21 more
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
 at java.net.URLClassLoader$1.run(Unknown Source)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClass(Unknown Source)
 at java.lang.ClassLoader.loadClassInternal(Unknown Source)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Unknown Source)
 at org.apache.hadoop.mapred.lib.db.DBConfiguration.getConnection(DBConfiguration.java:123)
 at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:292)
 ... 21 more
Interestingly, the relevant jar is bundled into the MR job jar, and it's also there in the $HADOOP_HOME/lib directory. Exactly the same thing worked with 0.19. Not sure what changed, or what I broke, to cause this error... It could be the classloader hierarchy; the JDBC driver needs to be at the right level. Try preheating the driver by loading it in your own code first; the jdbc: URLs might then work. And take it out of the MR job JAR.
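A minimal sketch of the "preheat the driver" suggestion. The class and method names here (DriverPreload, preload) are hypothetical; only the driver class name oracle.jdbc.driver.OracleDriver comes from the stack trace above. The idea is just to force the driver class through your own classloader before DBInputFormat asks for it:

```java
// Hypothetical helper: load the JDBC driver class through the job's own
// classloader before DBInputFormat.configure() tries Class.forName() on it.
public class DriverPreload {

    /** Try to load a JDBC driver class; return true if it was found. */
    public static boolean preload(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In a real job driver you would call something like:
        //   preload("oracle.jdbc.driver.OracleDriver");
        // before submitting the job. Here we just show the mechanism with
        // a class that is always on the JDK classpath:
        System.out.println(preload("java.sql.DriverManager"));
    }
}
```

If preload() returns false in the task JVM, the driver jar genuinely isn't visible at that classloader level, which narrows the problem down.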
Re: Hadoop 0.20.0, xml parsing related error
Ram Kulbak wrote: Hi, The exception is a result of having Xerces in the classpath. To resolve, make sure you are using Java 6 and set the following system property: -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl This can also be resolved in the Configuration class (line 1045) by making sure it loads the DocumentBuilderFactory bundled with the JVM and not a 'random' classpath-dependent factory. Hope this helps, Ram Lovely; I've noted this in the comments of the bug report.
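The same pinning can be done programmatically, which is a quick way to check what the property actually does. This is a sketch, not Hadoop's Configuration code; it just sets the property Ram names and asks JAXP which factory it now resolves:

```java
// Demonstrates the workaround from the post: pin the JAXP factory lookup to
// the JDK's bundled Xerces so a stray xerces.jar on the classpath can't
// hijack XML parsing.
import javax.xml.parsers.DocumentBuilderFactory;

public class JaxpPin {

    /** Set the factory property, then report which class JAXP resolves. */
    public static String pinnedFactoryClass() {
        System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(pinnedFactoryClass());
    }
}
```

With the property set, newInstance() should return the JVM-internal implementation rather than whatever happens to be first on the classpath.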
Re: FYI, Large-scale graph computing at Google
Edward J. Yoon wrote: What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon edwardy...@apache.org wrote: http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html -- Pregel sounds like a computing framework based on dynamic programming for graph operations. I guess they may have removed the file communications/intermediate files during iterations. Anyway, what do you think? I have a colleague (Paolo) who would be interested in adding a set of graph algorithms on top of the MR engine
Re: FYI, Large-scale graph computing at Google
mike anderson wrote: This would be really useful for my current projects. I'd be more than happy to help out if needed. Well, the first bit of code to play with then is this: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/ The standalone.xml file is the one you want to build and run with; the other would require you to check out and build two levels up, but gives you the ability to bring up local or remote clusters to test. Call run-local to run it locally, which should give you some stats like this:
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Counters: 11
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: File Systems
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Local bytes read=209445683448
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Local bytes written=173943642259
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Map-Reduce Framework
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Reduce input groups=9985124
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Combine output records=34
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Map input records=24383448
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Reduce output records=16494967
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Map output bytes=1243216870
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Map input bytes=1528854187
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Combine input records=4528655
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Map output records=41958636
[java] 09/06/25 17:09:22 INFO citerank.CiteRankTool: Reduce input records=37430015
== Exiting project citerank ==
BUILD SUCCESSFUL - at 25/06/09 17:09
Total time: 9 minutes 1 second
-- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
Re: Name Node HA (HADOOP-4539)
Andrew Wharton wrote: https://issues.apache.org/jira/browse/HADOOP-4539 I am curious about the state of this fix. It is listed as Incompatible, but is resolved and committed (according to the comments). Is the backup name node going to make it into 0.21? Will it remove the SPOF for HDFS? And if so, what is the proposed release timeline for 0.21? The way to deal with HA (which the BackupNode doesn't promise) is to get involved in developing and testing the leading-edge source tree. The 0.21 cutoff is approaching; BackupNode is in there, but it needs a lot more tests. If you want to aid the development, helping to get more automated BackupNode tests in there (indeed, tests that simulate more complex NN failures, like a corrupt EditLog) would go a long way. -steve
Re: Too many open files error, which gets resolved after some time
jason hadoop wrote: Yes. Otherwise the file descriptors will flow away like water. I also strongly suggest having at least 64k file descriptors as the open file limit. On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. Thanks for the advice. So you advise explicitly closing each and every file handle that I receive from HDFS? Regards. I must disagree somewhat. If you use FileSystem.get() to get your client filesystem class, then that instance is shared by all threads/classes that use it. Call close() on that, and any other thread or class holding a reference is in trouble; you have to wait for the finalizers for them to get cleaned up. If you use FileSystem.newInstance(), which came in fairly recently (0.20? 0.21?), then you can call close() safely. So: it depends on how you get your handle. See: https://issues.apache.org/jira/browse/HADOOP-5933 Also: the "too many open files" problem can be caused in the NN; you need to set up the kernel to have lots more file handles around. Lots.
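This is not Hadoop code, but a stdlib-only sketch of the hazard being described: a handle cached by key (the FileSystem.get() pattern) is shared by every caller, so one caller's close() breaks all the others, while a fresh instance (the FileSystem.newInstance() pattern) is the caller's to close. All names here are made up for illustration:

```java
// Illustration of shared-vs-owned handles, mimicking the semantics of
// FileSystem.get() (cached, shared) and FileSystem.newInstance() (private).
import java.io.Closeable;
import java.util.HashMap;
import java.util.Map;

public class SharedHandleDemo {

    static class Handle implements Closeable {
        private boolean open = true;
        public void use() {
            if (!open) throw new IllegalStateException("Filesystem closed");
        }
        public void close() { open = false; }
    }

    private static final Map<String, Handle> CACHE = new HashMap<>();

    /** Like FileSystem.get(): everyone with the same key shares one handle. */
    static Handle get(String key) {
        return CACHE.computeIfAbsent(key, k -> new Handle());
    }

    /** Like FileSystem.newInstance(): a fresh handle the caller owns. */
    static Handle newInstance(String key) {
        return new Handle();
    }

    public static void main(String[] args) {
        Handle shared = get("hdfs://nn/");
        get("hdfs://nn/").close();  // some other code closes "its" handle...
        try {
            shared.use();           // ...and every other holder now fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Filesystem closed"
        }
        newInstance("hdfs://nn/").close(); // safe: nobody else holds this one
    }
}
```

The "Filesystem closed" message in the main() mirrors the exact error seen in the "Every time the mapping phase finishes" thread below, which is the same failure mode.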
Re: Too many open files error, which gets resolved after some time
Scott Carey wrote: Furthermore, if for some reason it is required to dispose of any objects after others are GC'd, weak references and a weak reference queue will perform significantly better in throughput and latency - orders of magnitude better - than finalizers. Good point. It would make sense for the FileSystem cache to be weakly referenced, so that on long-lived processes the client references get cleaned up without waiting for app termination
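A small sketch of the mechanism Scott is pointing at, using only the JDK: a WeakReference registered with a ReferenceQueue tells you an object has died without the cost and latency of finalize(). The class name and the GC-nudging loop are illustrative, not a prescription:

```java
// WeakReference + ReferenceQueue: find out an object was collected, then do
// the cleanup yourself, instead of relying on finalize().
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakCleanupDemo {

    /** Drop the last strong ref, nudge GC, and wait for the queue. */
    public static boolean clearedAfterGc() throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        WeakReference<Object> ref = new WeakReference<>(resource, queue);
        resource = null;                       // no strong refs remain
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();                       // a hint, not a guarantee
            Thread.sleep(10);
        }
        // Once collected, the reference is enqueued; cleanup would go here.
        return queue.remove(1000) == ref;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(clearedAfterGc());
    }
}
```

A weakly-referenced FileSystem cache would work on this pattern: entries vanish when no client holds them, and a drain of the queue closes the underlying connections.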
Re: Too many open files error, which gets resolved after some time
Raghu Angadi wrote: Is this before 0.20.0? Assuming you have closed these streams, it is mostly https://issues.apache.org/jira/browse/HADOOP-4346 It is the JDK internal implementation that depends on GC to free up its cache of selectors. HADOOP-4346 avoids this by using Hadoop's own cache. Yes, and it's that change that led to my stack traces :( http://jira.smartfrog.org/jira/browse/SFOS-1208
Re: Too many open files error, which gets resolved after some time
Stas Oskin wrote: Hi. So what would be the recommended approach to the pre-0.20.x series? To ensure each file is used only by one thread, and then it is safe to close the handle in that thread? Regards. Good question; I'm not sure. For anything you get with FileSystem.get(), it's now dangerous to close, so try just setting the reference to null and hoping that GC will do the finalize() when needed
Re: Running Hadoop/Hbase in a OSGi container
Ninad Raut wrote: OSGi provides navigability to your components and creates a life cycle for each of those components, viz. install, start, stop, undeploy, etc. This is the reason why we are thinking of creating components using OSGi. The problem we are facing is that our components use MapReduce and HDFS, and the OSGi container cannot detect the Hadoop mapred engine or HDFS. I have searched through the net, and it looks like people are working on, or have achieved success in, running Hadoop in an OSGi container. Ninad 1. I am doing work on a simple lifecycle for the services, start/stop/ping, which is not OSGi (OSGi worries a lot about classloading and versioning); check out HADOOP-3628 for this. 2. You can run it under OSGi systems, such as the OSGi branch of SmartFrog: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/branches/core-branch-osgi/, or under non-OSGi tools. Either way, these tools are left dealing with classloading and the like. 3. Any container is going to have to deal with the problem that there are bits of all the services that call System.exit(), by running under a security manager, trapping the call, raising an exception, etc. 4. Any container is then going to have to deal with the fact that from 0.20 onwards, Hadoop does things with security policy that are incompatible with normal Java security managers: whatever security manager you have for trapping system exits can't extend the default one. 5. Any container also has to deal with the fact that every service (namenode, job tracker, etc.) makes a lot of assumptions about singletons: that they have exclusive use of filesystem objects retrieved through FileSystem.get(), and the like. While OSGi can do that with its classloading work, it's still fairly complex. 6.
There are also lots of JVM memory/thread management issues; see the various Hadoop bugs. If you look at the slides of what I've been up to, you can see that it can be done: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/doc/dynamic_hadoop_clusters.ppt However,
* you really need to run every service in its own process, for memory and reliability alone
* it's pretty leading edge
* you will have to invest the time and effort to get it working
If you want to do the work, start with what I've been doing and bring it up under the OSGi container of your choice. You can come and play with our tooling; I'm cutting a release today of this week's Hadoop trunk merged with my branch. It is of course experimental, as even the trunk is a bit up-and-down on feature stability. -steve
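Point 3 in the list above (trapping System.exit() under a security manager) can be sketched with the JDK's SecurityManager hook. Note this API is deprecated in recent JDKs, and installing it via System.setSecurityManager() needs -Djava.security.manager=allow on JDK 18+; the class below just shows the trap itself:

```java
// Sketch of how a container historically trapped System.exit(): override
// checkExit to throw instead of letting the hosted service kill the JVM.
import java.security.Permission;

public class NoExitSecurityManager extends SecurityManager {

    @Override
    public void checkExit(int status) {
        throw new SecurityException("System.exit(" + status + ") trapped");
    }

    @Override
    public void checkPermission(Permission perm) {
        // allow everything else; a real container would be more selective
    }

    public static void main(String[] args) {
        try {
            new NoExitSecurityManager().checkExit(1);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Point 4 is exactly why this is fragile: if Hadoop's own security policy work conflicts with a manager like this, the container has to reconcile the two itself.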
Re: Multiple NIC Cards
John Martyniak wrote: Does hadoop cache the server names anywhere? Because I changed to using DNS for name resolution, but when I go to the nodes view, it is trying to view with the old name. And I changed the hadoop-site.xml file so that it no longer has any of those values. In SVN head, we try and get Java to tell us what is going on: http://svn.apache.org/viewvc/hadoop/core/trunk/src/core/org/apache/hadoop/net/DNS.java This uses InetAddress.getLocalHost().getCanonicalHostName() to get the value, which is cached for the life of the process. I don't know of anything else, but wouldn't be surprised; the NameNode has to remember the machines where stuff was stored.
Re: Multiple NIC Cards
John Martyniak wrote: David, For Option #1, I just changed the names to the IP addresses, and it still comes up as the external name and IP address in the log files and on the job tracker screen. So option 1 is a no-go. When I change the dfs.datanode.dns.interface values, it doesn't seem to do anything. When I was searching the archived mail, this seemed to be the approach to change the NIC card being used for resolution. But when I change it nothing happens; I even put in bogus values and still no issues. -John I've been having similar but different fun with Hadoop-on-VMs; there's a lot of assumption in the code that DNS and rDNS all work consistently. Do you have separate internal and external hostnames? In which case, can you bring up the job tracker as jt.internal, the namenode as nn.internal (so the full HDFS URL is something like hdfs://nn.internal/ ), etc.?
Re: Multiple NIC Cards
John Martyniak wrote: My original names were huey-direct and duey-direct, both names in the /etc/hosts file on both machines. Are nn.internal and jt.internal special names? No, just examples: on a multihomed network your external names could be something completely different. What does /sbin/ifconfig say on each of the hosts?
Re: Multiple NIC Cards
John Martyniak wrote: I am running Mac OS X. So en0 points to the external address and en1 points to the internal address on both machines. Here are the internal results from duey:
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
 inet6 fe80::21e:52ff:fef4:65%en1 prefixlen 64 scopeid 0x5
 inet 192.168.1.102 netmask 0xffffff00 broadcast 192.168.1.255
 ether 00:1e:52:f4:00:65
 media: autoselect (1000baseT full-duplex) status: active
 lladdr 00:23:32:ff:fe:1a:20:66
 media: autoselect full-duplex status: inactive
 supported media: autoselect full-duplex
Here are the internal results from huey:
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 inet6 fe80::21e:52ff:fef3:f489%en1 prefixlen 64 scopeid 0x5
 inet 192.168.1.103 netmask 0xffffff00 broadcast 192.168.1.255
What do nslookup 192.168.1.103 and nslookup 192.168.1.102 say? There really ought to be different names for them. I have some other applications running on these machines that communicate across the internal network, and they work perfectly. I admire their strength; multihomed systems cause us trouble. That, and machines that don't quite know who they are: http://jira.smartfrog.org/jira/browse/SFOS-5 https://issues.apache.org/jira/browse/HADOOP-3612 https://issues.apache.org/jira/browse/HADOOP-3426 https://issues.apache.org/jira/browse/HADOOP-3613 https://issues.apache.org/jira/browse/HADOOP-5339 One thing to consider is that some of the various services of Hadoop are bound to 0:0:0:0, which means every IPv4 address. You really want to bring up everything, including the Jetty services, on the internal (en1) network adapter, by binding them to 192.168.1.102; this will cause anyone trying to talk to them over the other network to fail, which at least finds the problem sooner rather than later
Re: Multiple NIC Cards
John Martyniak wrote: When I run either of those on either of the two machines, it is trying to resolve against the DNS servers configured for the external addresses for the box. Here is the result: Server: xxx.xxx.xxx.69 Address: xxx.xxx.xxx.69#53 OK. In an ideal world, each NIC has a different hostname. Now, that confuses code that assumes a host has exactly one hostname, not zero or two, and I'm not sure how well Hadoop handles the 2+ situation (I know it doesn't like 0, but hey, it's a distributed application). With separate hostnames, you set Hadoop up to work on the inner addresses and give out the inner hostnames of the jobtracker and namenode. As a result, all traffic to the master nodes should be routed over the internal network
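The "work on the inner addresses" advice might look like the following hadoop-site.xml fragment. The hostnames (nn.internal, jt.internal) and the en1 interface are examples carried over from this thread, and the property names are the 0.18-0.20 era ones mentioned above; treat this as a sketch, not a verified config:

```xml
<!-- Sketch: pin Hadoop to the internal network. Hostnames are examples. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn.internal:9000/</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jt.internal:9001</value>
</property>
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>en1</value>
</property>
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>en1</value>
</property>
```

With this, the daemons report themselves by their internal names, and any client that can only reach the external network fails fast instead of half-working.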
Re: Hadoop scheduling question
Aaron Kimball wrote: Finally, there's a third scheduler called the Capacity scheduler. It's similar to the fair scheduler, in that it allows guarantees of minimum availability for different pools. I don't know how it apportions additional extra resources though -- this is the one I'm least familiar with. Someone else will have to chime in here. There's a dynamic priority scheduler in the patch queue, which I've promised to commit this week. It's the one with a notion of currency: you pay for your work/priority, and at peak times work costs more. https://issues.apache.org/jira/browse/HADOOP-4768
Re: Every time the mapping phase finishes I see this
Mayuran Yogarajah wrote: There are always a few 'Failed/Killed Task Attempts', and when I view the logs for these I see: - some that are empty, i.e. stdout/stderr/syslog logs are all blank - several that say:
2009-06-06 20:47:15,309 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: Filesystem closed
 at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:195)
 at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:59)
 at org.apache.hadoop.dfs.DFSClient$DFSInputStream.close(DFSClient.java:1359)
 at java.io.FilterInputStream.close(FilterInputStream.java:159)
 at org.apache.hadoop.mapred.LineRecordReader$LineReader.close(LineRecordReader.java:103)
 at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:301)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:173)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:231)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Any idea why this happens? I don't understand why I'd be seeing these only as the mappers get to 100%. I've seen this when something in the same process got a FileSystem reference via FileSystem.get() and then called close() on it; that closes the client for every thread/class that has a reference to the same object. We're planning on adding more diagnostics, by tracking who closed the filesystem: https://issues.apache.org/jira/browse/HADOOP-5933
Re: Monitoring hadoop?
Matt Massie wrote: Anthony- The ganglia web site is at http://ganglia.info/ with documentation in a wiki at http://ganglia.wiki.sourceforge.net/. There is also a good wiki page at IBM as well http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia . Ganglia packages are available for most distributions to help with installation, so make sure to grep for ganglia with your favorite package manager (e.g. aptitude, yum, etc). Ganglia will give you more information about your cluster than just Hadoop metrics. You'll get CPU, load, memory, disk and network monitoring as well for free. You can see live demos of ganglia at http://ganglia.info/?page_id=69. Good luck. -Matt Out of modesty, Matt neglects to mention that Ganglia is one of his projects, so not only does it work well with Hadoop today, I would expect the integration to only get better over time. Anthony, don't forget to feed those stats back into your DFS for later analysis...
Re: Hadoop ReInitialization.
b wrote: But after formatting and starting DFS I need to wait some time (sleep 60) before putting data into HDFS. Otherwise I will receive a NotReplicatedYetException. That means the namenode is up, but there aren't yet enough live datanodes to replicate the blocks to.
Re: question about when shuffle/sort start working
Todd Lipcon wrote: Hi Jianmin, This is not (currently) supported by Hadoop (or Google's MapReduce either, afaik). What you're looking for sounds like something more like Microsoft's Dryad. One thing that is supported in versions of Hadoop after 0.19 is JVM reuse. If you enable this feature, task trackers will persist JVMs between jobs. You can then persist some state in static variables. I'd caution you, however, against making too much use of this fact as anything but an optimization. The reason that Hadoop is limited to MR (or M+RM* as you said) is that simplicity and reliability often go hand in hand. If you start maintaining important state in RAM on the tasktracker JVMs, and one of them goes down, you may need to restart your entire job sequence from the top. In typical MapReduce, you may need to rerun a mapper or a reducer, but the state is all on disk ready to go. -Todd I'd have thought the question is not necessarily one of maintaining state, but of chaining the output of one job into another, where the number of iterations depends on the outcome of the previous set. Funnily enough, this is what you (apparently) end up having to do when implementing PageRank-like ranking as MR jobs: http://skillsmatter.com/podcast/cloud-grid/having-fun-with-pagerank-and-mapreduce
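A stdlib-only sketch of that "chain jobs until converged" driver pattern. Here each call to runPass() plays the role of one MR job over HDFS, and the driver loop decides from the previous output whether to submit another. The graph, damping factor, and all names are made up; dangling nodes are ignored for simplicity:

```java
// Iterative driver pattern: run "jobs" until the output stops changing.
// A toy in-memory PageRank stands in for the chained MR jobs.
import java.util.Arrays;

public class IterativeRankDriver {

    /** One "job": distribute each node's rank along its out-edges. */
    static double[] runPass(int[][] outLinks, double[] ranks, double damping) {
        int n = ranks.length;
        double[] next = new double[n];
        Arrays.fill(next, (1.0 - damping) / n);
        for (int u = 0; u < n; u++) {
            if (outLinks[u].length == 0) continue; // skip dangling nodes
            for (int v : outLinks[u]) {
                next[v] += damping * ranks[u] / outLinks[u].length;
            }
        }
        return next;
    }

    /** The driver loop: the iteration count depends on the previous output. */
    static double[] rankUntilConverged(int[][] outLinks, double epsilon) {
        int n = outLinks.length;
        double[] ranks = new double[n];
        Arrays.fill(ranks, 1.0 / n);
        double delta;
        do {
            double[] next = runPass(outLinks, ranks, 0.85);
            delta = 0;
            for (int i = 0; i < n; i++) delta += Math.abs(next[i] - ranks[i]);
            ranks = next;
        } while (delta > epsilon);
        return ranks;
    }

    public static void main(String[] args) {
        int[][] g = { {1, 2}, {2}, {0} }; // toy 3-node graph: 0->1,2; 1->2; 2->0
        System.out.println(Arrays.toString(rankUntilConverged(g, 1e-9)));
    }
}
```

In a real Hadoop driver, each pass would write its output to a directory that the next pass reads, with the convergence check reading back a counter or a small summary file; this is essentially what the citerank code linked in the earlier thread does.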
Re: org.apache.hadoop.ipc.client : trying connect to server failed
ashish pareek wrote: Yes, I am able to ping and ssh between the two virtual machines, and I have even set the IP addresses of both virtual machines in their respective /etc/hosts files... Thanks for the reply. If you can suggest something else I could have missed, or any remedy... Regards, Ashish Pareek. VMs? VMware? Xen? Something else? I've encountered problems on virtual networks where the machines aren't locatable via DNS, and can't be sure who they say they are. 1. Start the machines individually, instead of using the start-all script that needs to have SSH working too. 2. Check with netstat -a to see what ports/interfaces they are listening on. -steve
Re: hadoop hardware configuration
Patrick Angeles wrote: Sorry for cross-posting, I realized I sent the following to the hbase list when it's really more a Hadoop question. This is an interesting question. Obviously as an HP employee you must assume that I'm biased when I say HP DL160 servers are good value for the workers, though our blade systems are very good for a high physical density, provided you have the power to fill up the rack.
2 x Hadoop Master (and Secondary NameNode)
- 2 x 2.3Ghz Quad Core (Low Power Opteron -- 2376 HE @ 55W)
- 16GB DDR2-800 Registered ECC Memory
- 4 x 1TB 7200rpm SATA II Drives
- Hardware RAID controller
- Redundant Power Supply
- Approx. 390W power draw (1.9amps 208V)
- Approx. $4000 per unit
I do not know what the advantages of that many cores are on a NN. Someone needs to do some experiments. I do know you need enough RAM to hold the index in memory, and you may want to go for a bigger block size to keep the index size down.
6 x Hadoop Task Nodes
- 1 x 2.3Ghz Quad Core (Opteron 1356)
- 8GB DDR2-800 Registered ECC Memory
- 4 x 1TB 7200rpm SATA II Drives
- No RAID (JBOD)
- Non-Redundant Power Supply
- Approx. 210W power draw (1.0amps 208V)
- Approx. $2000 per unit
I had some specific questions regarding this configuration... 1. Is hardware RAID necessary for the master node? You need a good story to ensure that loss of a disk on the master doesn't lose the filesystem. I like RAID there, but the alternative is to push the stuff out over the network to other storage you trust. That could be NFS-mounted RAID storage, or it could be NFS-mounted JBOD. Whatever your chosen design, test that it works before you go live by running the cluster and then simulating different failures, to see how well the hardware/ops team handles it. Keep an eye on where that data goes, because when the NN runs out of file storage, the consequences can be pretty dramatic (i.e. the cluster doesn't come up unless you edit the edit log by hand) 2.
What is a good processor-to-storage ratio for a task node with 4TB of raw storage? (The config above has 1 core per 1TB of raw storage.) That really depends on the work you are doing: the bytes in/out relative to the CPU work, and the size of any memory structures that are built up over the run. With 1 core per physical disk, you get the bandwidth of a single disk per CPU; for some IO-intensive work you can make the case for two disks per CPU (one in, one out), but then you are using more power, and if/when you want to add more storage, you have to pull out the disks to stick in new ones. If you go for more CPUs, you will probably need more RAM to go with them. 3. Am I better off using dual quads for a task node, with a higher power draw? A dual quad task node with 16GB RAM and 4TB storage costs roughly $3200, but draws almost 2x as much power. The tradeoffs are: 1. I will get more CPU per dollar and per watt. 2. I will only be able to fit 1/2 as many dual quad machines into a rack. 3. I will get 1/2 the storage capacity per watt. 4. I will get less I/O throughput overall (fewer spindles per core). First there is the algorithm itself, and whether you are IO- or CPU-bound. Most MR jobs that I've encountered are fairly IO-bound; without indexes, every lookup has to stream through all the data, so it's power-inefficient and IO-limited. But if you are trying to do higher-level stuff than just lookup, then you will be doing more CPU work. Then there is the question of where your electricity comes from, what the limits for the room are, whether you are billed on power drawn or quoted PSU draw, what the HVAC limits are, what the maximum allowed weight per rack is, etc. I'm a fan of low-Joule work, though we don't have any benchmarks yet of the power efficiency of different clusters: the number of MJ used to do a terasort. I'm debating doing some single-CPU tests for this on my laptop, as the battery knows how much gets used up by some work. 4.
In planning storage capacity, how much spare disk space should I take into account for 'scratch'? For now, I'm assuming 1x the input data size. That you should probably be able to determine from experimental work on smaller datasets. Some maps can throw out a lot of data; most reduces do actually reduce the final amount. -Steve (Disclaimer: I'm not making any official recommendations for hardware here, just making my opinions known. If you do want an official recommendation from HP, talk to your reseller or account manager; someone will look at your problem in more detail and make some suggestions. If you have any code/data that could be shared for benchmarking, that would help validate those suggestions)
Re: ssh issues
hmar...@umbc.edu wrote: Steve, Security through obscurity is always a good practice from a development standpoint and one of the reasons why tricking you out is an easy task. :) My most recent presentation on HDFS clusters is now online; notice how it doesn't gloss over the security: http://www.slideshare.net/steve_l/hdfs-issues Please, keep hiding relevant details from people in order to keep everyone smiling. HDFS is as secure as NFS: you are trusted to be who you say you are. This means that you have to run it on a secured subnet (access restricted to trusted hosts and/or one or two front-end servers) or accept that your dataset is readable and writeable by anyone on the network. There is user identification going in; it is currently at the level where it will stop someone accidentally deleting the entire filesystem if they lack the rights. Which has been known to happen. If the team looking after the cluster demands separate SSH keys/logins for every machine, then not only are they making their operations costs high; once you have got the HDFS cluster and MR engine live, it's moot. You can push out work to the JobTracker, which then runs it on the machines, under whatever userid the TaskTrackers are running as. Now, 0.20+ will run it under the identity of the user who claimed to be submitting the job, but without that, your MR jobs get the access rights to the filesystem of the user that is running the TT. It's fairly straightforward to create a modified Hadoop client JAR that doesn't call whoami to get the userid, and instead spoofs being anyone. Which means that even if you lock down the filesystem (no out-of-datacentre access), if I can run my Java code as MR jobs in your cluster, I can have unrestricted access to the filesystem by way of the task tracker server. But Hal, if you are running Ant for your build, I'm running my code on your machines anyway, so you had better be glad that I'm not malicious. -Steve
Re: ssh issues
Pankil Doshi wrote: Well, I made ssh with passphrases, as the systems into which I need to log in require ssh with passphrases, and those systems have to be part of my cluster. So I need a way to specify -i path/to/key and the passphrase to hadoop beforehand. Pankil Well, you are trying to manage a system whose security policy is incompatible with Hadoop's current shell scripts. If you push out the configs and manage the lifecycle using other tools, this becomes a non-issue. Don't raise the topic of HDFS security with your ops team though, as they will probably be unhappy about what is currently on offer. -steve
Re: Username in Hadoop cluster
Pankil Doshi wrote: Hello everyone, Till now I was using the same username on all my Hadoop cluster machines. But now I am building my new cluster and face a situation in which I have different usernames for different machines. So what changes will I have to make in configuring Hadoop? Using the same username, ssh was easy; will it now be a problem that I have different usernames? Are you building these machines up by hand? How many? Why the different usernames? Can't you just create a new user and group hadoop on all the boxes?
Re: Optimal Filesystem (and Settings) for HDFS
Bryan Duxbury wrote: We use XFS for our data drives, and we've had somewhat mixed results. Thanks for that. I've just created a wiki page to put some of these notes up; extensions and some hard data would be welcome: http://wiki.apache.org/hadoop/DiskSetup One problem we have for hard data is that we need some different benchmarks for MR jobs. Terasort is good for measuring IO and MR framework performance, but for more CPU-intensive algorithms, or things that need to seek around a bit more, you can't be sure that terasort benchmarks are a good predictor of what's right for you in terms of hardware, filesystem, etc. Contributions in this area would be welcome. I'd like to measure the power consumed on a run too, which is actually possible as far as my laptop is concerned, because you can ask its battery what happened. -steve
Re: Suspend or scale back hadoop instance
John Clarke wrote: Hi, I am working on a project that is suited to Hadoop, and so want to create a small cluster (only 5 machines!) on our servers. The servers are however used during the day and (mostly) idle at night. So, I want Hadoop to run at full throttle at night and either scale back or suspend itself during certain times. You could add/remove task trackers on the idle systems, but
* you don't want to take away datanodes, as there's a risk that data will become unavailable.
* there's nothing in the scheduler to warn that machines will go away at a certain time
If you only want to run the cluster at night, I'd just configure the entire cluster to go up and down
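The "add/remove task trackers on idle systems" option could be as simple as cron entries on each office server that start a TaskTracker in the evening and stop it in the morning, leaving the DataNodes (and hence the data) alone. The paths and times below are examples only:

```
# crontab sketch (paths and times are examples): lend the server's CPUs to
# the cluster overnight; DataNodes are never touched, so no data goes away.
0 20 * * *  /opt/hadoop/bin/hadoop-daemon.sh start tasktracker
0 6  * * *  /opt/hadoop/bin/hadoop-daemon.sh stop tasktracker
```

The caveat from the reply still applies: the scheduler has no idea the 6am stop is coming, so tasks running at that moment fail and get rescheduled elsewhere.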
Re: Beware sun's jvm version 1.6.0_05-b13 on linux
Allen Wittenauer wrote: On 5/15/09 11:38 AM, Owen O'Malley o...@yahoo-inc.com wrote: We have observed that the default jvm on RedHat 5 I'm sure some people are scratching their heads at this. The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly incapable of running Hadoop. We [and, really, this is my doing... ^.^ ] replace it with the JVM from the JPackage folks. So while this isn't the default JVM that comes from RHEL, the warning should still be heeded. Presumably it's one of those hard-to-reproduce race conditions that only surfaces under load on a big cluster, so is hard to replicate in a unit test, right?
Re: Is there any performance issue with Jrockit JVM for Hadoop
Grace wrote: To follow up on this question, I have also asked for help on the JRockit forum. They kindly offered some useful and detailed suggestions based on the JRA results. After updating the option list, the performance did become better to some extent, but it is still not comparable with the Sun JVM. Maybe it is due to the use case with short duration, and the different implementation at the JVM layer between Sun and JRockit. I will go back to using the Sun JVM for now. Thanks all for your time and help. What about flipping the switch that says run tasks in the TT's own JVM? That should handle startup costs and reduce the memory footprint
Re: Is there any performance issue with Jrockit JVM for Hadoop
Tom White wrote: On Mon, May 18, 2009 at 11:44 AM, Steve Loughran ste...@apache.org wrote: Grace wrote: To follow up this question, I have also asked for help on the JRockit forum. They kindly offered some useful and detailed suggestions based on the JRA results. After updating the option list, the performance did become better to some extent, but it is still not comparable with the Sun JVM. Maybe it is due to the short duration of the use case, and differing implementations at the JVM layer between Sun and JRockit. I will go back to the Sun JVM for now. Thanks all for your time and help. What about flipping the switch that says "run tasks in the TT's own JVM"? That should handle startup costs, and reduce the memory footprint. The property mapred.job.reuse.jvm.num.tasks allows you to set how many tasks the JVM may be reused for (within a job), but it always runs in a separate JVM to the tasktracker. (BTW, https://issues.apache.org/jira/browse/HADOOP-3675 has some discussion about running tasks in the tasktracker's JVM.) Tom Tom, that's why you are writing a book on Hadoop and I'm not... you know the answers and I have some vague misunderstandings, -steve (returning to the svn book)
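The property Tom mentions is set per job; a sketch of the relevant stanza (a value of -1 reuses the JVM for an unlimited number of tasks within the job, 1 is the default one-JVM-per-task behaviour):

```xml
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```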
Re: public IP for datanode on EC2
Tom White wrote: Hi Joydeep, The problem you are hitting may be because port 50001 isn't open, whereas from within the cluster any node may talk to any other node (because the security groups are set up to do this). However I'm not sure this is a good approach. Configuring Hadoop to use public IP addresses everywhere should work, but you have to pay for all data transfer between nodes (see http://aws.amazon.com/ec2/, Public and Elastic IP Data Transfer). This is going to get expensive fast! So to get this to work well, we would have to make changes to Hadoop so it was aware of both public and private addresses, and use the appropriate one: clients would use the public address, while daemons would use the private address. I haven't looked at what it would take to do this or how invasive it would be. I thought that AWS had stopped you being able to talk to things within the cluster using the public IP addresses -- stopping you from using DynDNS as your way of bootstrapping discovery. Here's what may work:
- bring up the EC2 cluster using the local names
- open up the ports
- have the clients talk using the public IP addresses
The problem will arise when the namenode checks the fs name used and it doesn't match its expectations; there were some recent patches in the code to handle the case where someone talks to the namenode using the IP address instead of the hostname, and they may work for this situation too. Personally, I wouldn't trust the NN in the EC2 datacentres to be secure against external callers, but that problem already exists within their datacentres anyway.
Re: Winning a sixty second dash with a yellow elephant
Arun C Murthy wrote: ... oh, and getting it to run a marathon too! http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html Owen Arun Lovely. I will now stick up the pic of you getting the first results in on your laptop at apachecon
Re: Huge DataNode Virtual Memory Usage
Stefan Will wrote: Raghu, I don't actually have exact numbers from jmap, although I do remember that jmap -histo reported something less than 256MB for this process (before I restarted it). I just looked at another DFS process that is currently running and has a VM size of 1.5GB (~600 resident). Here jmap reports a total object heap usage of 120MB. The memory block list reported by jmap pid doesn't actually seem to contain the heap at all, since the largest block in that list is 10MB in size (/usr/java/jdk1.6.0_10/jre/lib/amd64/server/libjvm.so). However, pmap reports a total usage of 1.56GB. -- Stefan You know, if you could get the TaskTracker to include stats on real and virtual memory use, I'm sure that others would welcome those reports; knowing that a job was slower and its VM size was 2x physical memory would give you a good hint as to the root cause.
Re: How to do load control of MapReduce
zsongbo wrote: Hi Stefan, Yes, the 'nice' cannot resolve this problem. Now, in my cluster, there are 8GB of RAM. My java heap configuration is: HDFS DataNode : 1GB HBase-RegionServer: 1.5GB MR-TaskTracker: 1GB MR-child: 512MB (max child task is 6, 4 map task + 2 reduce task) But the memory usage is still tight. does TT need to be so big if you are running all your work in external VMs?
Re: How to do load control of MapReduce
Stefan Will wrote: Yes, I think the JVM uses way more memory than just its heap. Now some of it might be just reserved memory, but not actually used (not sure how to tell the difference). There are also things like thread stacks, JIT compiler cache, direct NIO byte buffers etc. that take up process space outside of the Java heap. But none of that should IMHO add up to gigabytes... There's a good article on this: http://www.ibm.com/developerworks/linux/library/j-nativememory-linux/
Re: Re-Addressing a cluster
jason hadoop wrote: You should be able to relocate the cluster's IP space by stopping the cluster, modifying the configuration files, resetting the DNS and starting the cluster. It's best to verify connectivity with the new IP addresses before starting the cluster. To the best of my knowledge the namenode doesn't care about the IP addresses of the datanodes, only what blocks they report as having. The namenode does care about losing contact with a connected datanode, and will replicate the blocks that are now under-replicated. I prefer IP addresses in my configuration files, but that is a personal preference, not a requirement. I do deployments onto virtual clusters without fully functional reverse DNS, and things do work badly in that situation. Hadoop assumes that if a machine looks up its hostname, it can pass that to peers and they can resolve it -- the "well-managed network infrastructure" assumption.
Re: datanode replication
Jeff Hammerbacher wrote: Hey Vishal, Check out the chooseTarget() method(s) of ReplicationTargetChooser.java in the org.apache.hadoop.hdfs.server.namenode package: http://svn.apache.org/viewvc/hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java?view=markup . In words: assuming you're using the default replication level (3), the default strategy will put one block on the local node, one on a node in a remote rack, and another on that same remote rack. Note that HADOOP-3799 (http://issues.apache.org/jira/browse/HADOOP-3799) proposes making this strategy pluggable. Yes, there are some good reasons for having different placement algorithms for different datacentres, and I could even imagine different MR sequences providing hints about where they want data, depending on what they want to do afterwards.
Re: Re-Addressing a cluster
jason hadoop wrote: Now that I think about it, the reverse lookups in my clusters work. and you have made sure that IPv6 is turned off, right?
Re: Is there any performance issue with Jrockit JVM for Hadoop
Grace wrote: Thanks all for your replies. I have run several times with different Java options for Map/Reduce tasks. However there is not much difference. Following is an example of my test settings: Test A: -Xmx1024m -server -XXlazyUnlocking -XlargePages -XgcPrio:deterministic -XXallocPrefetch -XXallocRedoPrefetch Test B: -Xmx1024m Test C: -Xmx1024m -XXaggressive Is there any trick or special setting for the JRockit VM on Hadoop? The Hadoop Quick Start guide says Java(TM) 1.6.x, preferably from Sun. Is there any known concern about JRockit performance? The main thing is that all the big clusters are running (as far as I know) Linux (probably RedHat) and Sun Java. This is where the performance and scale testing is done. If you are willing to spend time doing the experiments and tuning, then I'm sure we can update those guides to say "JRockit works, here are some options" -steve
Re: Is there any performance issue with Jrockit JVM for Hadoop
Chris Collins wrote: A couple of years back we did a lot of experimentation between Sun's VM and JRockit. We had initially assumed that JRockit was going to scream, since that's what the press were saying. In short, what we discovered was that certain JDK library usage was a little bit faster with JRockit, but for core VM performance such as synchronization and primitive operations, the Sun VM outperformed it. We were not taking account of startup time, just raw code execution. As I said, this was a couple of years back, so things may have changed. C I run JRockit as it's what some of our key customers use, and we need to test on the same JVM. One lovely feature: tests time out before the stack runs out on a recursive operation; clearly different stack management at work. Another: no PermGen heap space to fiddle with.
* I have to turn debug logging off in hadoop test runs, or there are problems.
* It uses short pointers (32 bits long) for near memory on a 64-bit JVM, so your memory footprint on sub-4GB VM images is better. Java 7 promises this, and with the merger, who knows what we will see. This is unimportant on 32-bit boxes.
* Debug single-stepping doesn't work. That's OK, I use functional tests instead :)
I haven't looked at outright performance.
Re: move tasks to another machine on the fly
Tom White wrote: Hi David, The MapReduce framework will attempt to rerun failed tasks automatically. However, if a task is running out of memory on one machine, it's likely to run out of memory on another, isn't it? Have a look at the mapred.child.java.opts configuration property for the amount of memory that each task VM is given (200MB by default). You can also control the memory that each daemon gets using the HADOOP_HEAPSIZE variable in hadoop-env.sh. Or you can specify it on a per-daemon basis using the HADOOP_DAEMON_NAME_OPTS variables in the same file. Tom This looks like not so much a VM out-of-memory problem as OS thread provisioning. ulimit may be useful, as is the java -Xss option. http://candrews.integralblue.com/2009/01/preventing-outofmemoryerror-native-thread/ On Wed, May 6, 2009 at 1:28 AM, David Batista dsbati...@gmail.com wrote: I get this error when running Reduce tasks on a machine: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2591) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:387) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:117) at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44) at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99) at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410) is it possible to move a reduce task to another machine in the cluster on the fly? -- ./david
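One way to apply the -Xss suggestion to the task JVMs is through the child options; a hedged sketch, where the heap and stack values are illustrative rather than recommendations (a smaller per-thread stack lets more native threads fit in the same address space):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Xss256k</value>
</property>
```

Also check the per-user process/thread limit (`ulimit -u`) for the account the tasktracker runs as.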
Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....
Bradford Stephens wrote: Hey all, I'm going to be speaking at OSCON about my company's experiences with Hadoop and Friends, but I'm having a hard time coming up with a name for the entire software ecosystem. I'm thinking of calling it the Apache CloudStack. Does this sound legit to you all? :) Is there something more 'official'? We've been using Apache Cloud Computing Edition for this, to emphasise that this is the successor to Java Enterprise Edition, and that it is cross-language and being built at Apache. If you use the same term, even if you put up a different stack outline than us, it gives the idea more legitimacy. The slides that Andrew linked to are all in SVN under http://svn.apache.org/repos/asf/labs/clouds/ We have a space in the Apache labs for Apache Clouds, where we want to do more work integrating things, and bringing the idea of deploy-and-test on someone else's infrastructure mainstream across all the Apache products. We would welcome your involvement -- and if you send a draft of your slides out, we will happily review them -steve
Re: I need help
Razen Alharbi wrote: Thanks everybody, The issue was that hadoop writes all the outputs to stderr instead of stdout and I don't know why. I would really love to know why the usual hadoop job progress is written to stderr. Because there is a line in log4j.properties telling it to do just that: log4j.appender.console.target=System.err -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
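If you really wanted progress on stdout instead, the appender target can be flipped in log4j.properties; a sketch of the relevant lines:

```properties
# send console logging (including job progress) to stdout instead of stderr
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
```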
Re: programming java ee and hadoop at the same time
Bill Habermaas wrote: George, I haven't used the Hadoop perspective in Eclipse so I can't help with that specifically, but map/reduce is a batch process (and can be long running). In my experience, I've written servlets that write to HDFS and then have a background process perform the map/reduce. They can both run in the background under Eclipse but are not tightly coupled. I've discussed this recently: having a good binding from webapps to the back end, where the back end consists of HDFS, MapReduce queues, and things around them https://svn.apache.org/repos/asf/labs/clouds/src/doc/apache_cloud_computing_edition_oxford.odp If people are willing to help with this, we have an apache lab project, Apache Clouds, ready for your code, tests and ideas
Re: Can i make a node just an HDFS client to put/get data into hadoop
Usman Waheed wrote: Hi All, Is it possible to make a node just a hadoop client so that it can put/get files into HDFS but not act as a namenode or datanode? I already have a master node and 3 datanodes but need to execute puts/gets into hadoop in parallel using more than just one machine other than the master. Anything on the LAN can be a client of the filesystem, you just need appropriate hadoop configuration files to talk to the namenode and job tracker. I don't know how well the (custom) IPC works over long distances, and you have to keep the versions in sync for everything to work reliably.
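A client-only node runs no daemons at all; it just needs configuration pointing at the master. A sketch of the client-side stanzas -- the hostnames and ports below are placeholders for your own cluster's:

```xml
<!-- hadoop-site.xml on the client machine -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:8020</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:8021</value>
</property>
```

With that in place, `hadoop fs -put` / `-get` from that machine will talk straight to the namenode and datanodes.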
Re: I need help
Razen Al Harbi wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. For clarification I will go through the simple code snippet: Process p = rt.exec("hadoop jar GraphClean args"); BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream())); String line = null; boolean check = true; while(check){ line = reader.readLine(); if(line != null){ // I know this will not finish; it's only for testing. System.out.println(line); } } If I run this code nothing shows up. But if I execute the command (hadoop jar GraphClean args) from the command line it works fine. I am using hadoop 0.19.0. Why not just invoke the Hadoop job submission calls yourself? No need to exec anything. Look at org.apache.hadoop.util.RunJar to see what you need to do. Avoid calling RunJar.main() directly as - it calls System.exit() when it wants to exit with an error - it adds shutdown hooks -steve
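Note also that, as the earlier thread established, hadoop's job progress goes to stderr, so reading only getInputStream() will show nothing. If you do stick with forking, merging the two streams avoids both the silence and the risk of the child blocking on a full stderr pipe. A sketch -- the "sh -c" command below is just a portable stand-in for the poster's "hadoop jar GraphClean args":

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class StreamDemo {
    /** Run a command with stderr merged into stdout and return everything printed. */
    public static String runMerged(String... cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);   // merge stderr into stdout so nothing is missed
        Process p = pb.start();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        p.waitFor();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // In the original post this would be runMerged("hadoop", "jar", "GraphClean", ...)
        System.out.print(runMerged("sh", "-c", "echo to-stdout; echo to-stderr 1>&2"));
    }
}
```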
Re: Storing data-node content to other machine
Vishal Ghawate wrote: Hi, I want to store the contents of all the client machines (datanodes) of the hadoop cluster on a centralized machine with high storage capacity, so that the tasktracker will be on the client machine but the contents are stored on the centralized machine. Can anybody help me with this please? Set the datanode to point at the (mounted) filesystem with the dfs.data.dir parameter.
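The stanza in question, with a hypothetical mount point standing in for wherever the central storage is mounted on each datanode:

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/central-storage/dfs/data</value>
</property>
```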
Re: Processing High CPU Memory intensive tasks on Hadoop - Architecture question
Aaron Kimball wrote: I'm not aware of any documentation about this particular use case for Hadoop. I think your best bet is to look into the JNI documentation about loading native libraries, and go from there. - Aaron You could also try:
1. Starting the main processing app as a process on the machines -- and leaving it running.
2. Having your mapper (somehow) talk to that running process, passing in parameters (including local filesystem filenames) to read and write.
You can use RMI or other IPC mechanisms to talk to the long-lived process.
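A sketch of that long-lived-helper pattern over a plain local socket (one of the IPC options alongside RMI). The "helper" here is faked with an in-process echo thread; in real use it would be your CPU/memory-heavy native application listening on a known port. All names, the port handling, and the line-per-request protocol are illustrative, not any Hadoop API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SidecarDemo {
    /** Stand-in for the long-lived heavy process: reads one line, replies "DONE <line>". */
    public static void startFakeHelper(ServerSocket ss) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    try (Socket s = ss.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream()));
                         PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                        out.println("DONE " + in.readLine());
                    }
                }
            } catch (IOException closed) { /* server socket shut down */ }
        });
        t.setDaemon(true);
        t.start();
    }

    /** What a mapper would do: hand a local filename to the helper and wait for the result. */
    public static String submit(int port, String localFile) throws IOException {
        try (Socket s = new Socket("127.0.0.1", port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println(localFile);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket ss = new ServerSocket(0);   // ephemeral port, for the demo only
        startFakeHelper(ss);
        System.out.println(submit(ss.getLocalPort(), "/local/scratch/part-00000"));
    }
}
```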
Re: No route to host prevents from storing files to HDFS
Stas Oskin wrote: Hi. 2009/4/23 Matt Massie m...@cloudera.com Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is fs.default.name set to in your hadoop-site.xml? This machine has OpenVZ installed indeed, but all the applications run within the host node, meaning all Java processes are running within the same machine. Maybe, but there will still be at least one virtual network adapter on the host. Try turning them off. The fs.default.name is: hdfs://192.168.253.20:8020 What happens if you switch to hostnames instead of IP addresses?
Re: RPM spec file for 0.19.1
Ian Soboroff wrote: Steve Loughran ste...@apache.org writes: I think from your perspective it makes sense, as it stops anyone getting itchy fingers and doing their own RPMs. Um, what's wrong with that? It's really hard to do good RPM spec files. If Cloudera are willing to pay Matt to do it, not only do I welcome it, but I will see if I can help him with some of the automated test setup they'll need. One thing which would be useful would be to package up all of the hadoop functional tests that need a live cluster as their own JAR, so the test suite could be run against an RPM installation on different virtual OS/JVM combos. I've just hit this problem with my own RPMs on Java 6 (java security related), so I know that having the ability to run the entire existing test suite against an RPM installation would be beneficial (both in my case and for hadoop RPMs).
Re: How many people is using Hadoop Streaming ?
Tim Wintle wrote: On Fri, 2009-04-03 at 09:42 -0700, Ricky Ho wrote: 1) I can pick the language that offers a different programming paradigm (e.g. I may choose a functional language, or logic programming if they suit the problem better). In fact, I can even choose Erlang for the map() and Prolog for the reduce(). Mix and match can optimize me more. Agreed (as someone who has written mappers/reducers in Python, Perl, shell script and Scheme before). Sounds like a good argument for adding scripting support for in-JVM MR jobs: use the Java 6 scripting APIs and use any of the supported languages -- JavaScript out of the box, other languages (Jython, Scala) with the right JARs.
Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks
Andrew Newman wrote: They are comparing an indexed system with one that isn't. Why is Hadoop faster at loading than the others? Surely no one would be surprised that it would be slower - I'm surprised at how well Hadoop does. Who wants to write a paper for next year, grep vs reverse index? 2009/4/15 Guilherme Germoglio germog...@gmail.com: (Hadoop is used in the benchmarks) http://database.cs.brown.edu/sigmod09/ I think it is interesting, though it misses the point that the reason that few datasets are 1PB today is that nobody could afford to store or process the data. With Hadoop, the cost is somewhat high (learn to patch the source to fix your cluster's problems) but it scales well with the number of nodes. Commodity storage costs (my own home now has 2TB of storage) and commodity software costs are compatible. Some other things to look at:
- power efficiency. I actually think the DBs could come out better.
- ease of writing applications by skilled developers. Pig vs SQL.
- performance under different workloads (take a set of log files growing continually, mine it in near-real time. I think the last.fm use case would be a good one).
One of the great ironies of SQL is that most developers don't go near it, as it is a detail handled by the O/R mapping engine, except when building SQL selects for web pages. If Pig makes M/R easy, would it be used -- and if so, does that show that we developers prefer procedural thinking? -steve
Re: Error reading task output
Cam Macdonell wrote: Well, for future googlers, I'll answer my own post. Watch out for the hostname at the end of localhost lines on slaves. One of my slaves was registering itself as localhost.localdomain with the jobtracker. Is there a way that Hadoop could be made less dependent on /etc/hosts, and more on dynamic hostname resolution? DNS is trouble in Java; there are some (outstanding) bug reports/hadoop patches on the topic, mostly showing up on a machine of mine with a bad hosts entry. I also encountered some fun last month with Ubuntu Linux adding the local hostname to /etc/hosts alongside the 127.0.0.1 entry, which is precisely what you don't want for a cluster of VMs with no DNS at all. This sounds like your problem too, in which case I have shared your pain http://www.1060.org/blogxter/entry?publicid=121ED68BB21DB8C060FE88607222EB52
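For reference, what a healthy slave's /etc/hosts looks like for a small static cluster -- names and addresses below are made up. The key point is that the machine's own hostname must appear against its real address, never on the 127.0.0.1 line:

```
127.0.0.1     localhost localhost.localdomain
192.168.1.10  node1.cluster.internal node1
192.168.1.11  node2.cluster.internal node2
```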
Re: getting DiskErrorException during map
Jim Twensky wrote: Yes, here is how it looks: <property> <name>hadoop.tmp.dir</name> <value>/scratch/local/jim/hadoop-${user.name}</value> </property> so I don't know why it still writes to /tmp. As a temporary workaround, I created a symbolic link from /tmp/hadoop-jim to /scratch/... and it works fine now, but if you think this might be considered a bug, I can report it. I've encountered this somewhere too; it could be that something is using the java temp file API, which is not what you want. Try setting java.io.tmpdir to /scratch/local/tmp just to see if that makes it go away.
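One way to try the java.io.tmpdir suggestion for the task JVMs is via the child options; a sketch, with illustrative heap size and path:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Djava.io.tmpdir=/scratch/local/tmp</value>
</property>
```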
Re: Error reading task output
Aaron Kimball wrote: Cam, This isn't Hadoop-specific, it's how Linux treats its network configuration. If you look at /etc/host.conf, you'll probably see a line that says order hosts, bind -- this is telling Linux's DNS resolution library to first read your /etc/hosts file, then check an external DNS server. You could probably disable local hostfile checking, but that means that every time a program on your system queries the authoritative hostname for localhost, it'll go out to the network. You'll probably see a big performance hit. The better solution, I think, is to get your nodes' /etc/hosts files squared away. I agree. You only need to do so once :) No, you need to detect whenever the Linux networking stack has decided to add new entries to resolv.conf or /etc/hosts, and detect when they are inappropriate. Which is a tricky thing to do, as there are some cases where you may actually be grateful that someone in the Debian codebase decided that adding the local hostname as 127.0.0.1 is actually a feature. I ended up writing a new SmartFrog component that can be configured to fail to start if the network is a mess, which is something worth pushing out. As part of hadoop diagnostics, this test would be one of the things to deal with and at least warn on: your hostname is local, so you will not be visible over the network. -steve
Re: RPM spec file for 0.19.1
Ian Soboroff wrote: I created a JIRA (https://issues.apache.org/jira/browse/HADOOP-5615) with a spec file for building a 0.19.1 RPM. I like the idea of Cloudera's RPM file very much. In particular, it has nifty /etc/init.d scripts and RPM is nice for managing updates. However, it's for an older, patched version of Hadoop. This spec file is actually just Cloudera's, with suitable edits. The spec file does not contain an explicit license... if Cloudera have strong feelings about it, let me know and I'll pull the JIRA attachment. The JIRA includes instructions on how to roll the RPMs yourself. I would have attached the SRPM but they're too big for JIRA. I can offer noarch RPMs built with this spec file if someone wants to host them. Ian
- RPM and deb packaging would be nice.
- The .spec file should be driven by ant properties to get dependencies from the ivy files.
- The JDK requirements are too harsh, as it should run on OpenJDK's JRE or JRockit; no need for Sun only. Too bad the only way to say that is to leave off all JDK dependencies.
- I worry about how they patch the rc.d files. I can see why, but wonder what that does with the RPM ownership.
As someone whose software does get released as RPMs (and tar files containing everything needed to create your own), I can state with experience that RPMs are very hard to get right, and very hard to test. The hardest thing to get right (and to test) is live update of the RPMs while the app is running. I am happy for the Cloudera team to have taken on this problem.
Re: Using HDFS to serve www requests
Snehal Nagmote wrote: Can you please explain what exactly adding an NIO bridge means and how it can be done, and what the advantages would be in this case? NIO: Java non-blocking IO. It's a standard API to talk to different filesystems; support has been discussed in JIRA. If the DFS APIs were accessible under an NIO front end, then applications written for the NIO APIs would work with the supported filesystems, with no need to code specifically for hadoop's not-yet-stable APIs. Steve Loughran wrote: Edward Capriolo wrote: It is a little more natural to connect to HDFS from apache tomcat. This will allow you to skip the FUSE mounts and just use the HDFS API. I have modified this code to run inside tomcat. http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample I will not testify to how well this setup will perform under internet traffic, but it does work. If someone adds an NIO bridge to hadoop filesystems then it would be easier, leaving you only with the performance issues.
Re: Amazon Elastic MapReduce
Brian Bockelman wrote: On Apr 2, 2009, at 3:13 AM, zhang jianfeng wrote: Seems like I should pay additional money, so why not configure a hadoop cluster in EC2 by myself? This has already been automated using scripts. Not everyone has a support team or an operations team or enough time to learn how to do it themselves. You're basically paying for the fact that the only thing you need to know to use Hadoop is: 1) Be able to write the Java classes. 2) Press the go button on a webpage somewhere. You could use Hadoop with little-to-zero systems knowledge (and without institutional support), which would always make some researchers happy. Brian True, but this way nobody gets the opportunity to learn how to do it themselves, which can be a tactical error one comes to regret further down the line. By learning the pain of cluster management today, you get to keep it under control as your data grows. I am curious what bug patches AWS will supply, for they have been very silent on their hadoop work to date.
Re: Typical hardware configurations
Scott Carey wrote: On 3/30/09 4:41 AM, Steve Loughran ste...@apache.org wrote: Ryan Rawson wrote: You should also be getting 64-bit systems and running a 64 bit distro on it and a jvm that has -d64 available. For the namenode yes. For the others, you will take a fairly big memory hit (1.5X object size) due to the longer pointers. JRockit has special compressed pointers, so will JDK 7, apparently. Sun Java 6 update 14 has "Ordinary Object Pointer" compression as well: -XX:+UseCompressedOops. I've been testing out the pre-release of that with great success. Nice. Have you tried Hadoop with it yet? JRockit has virtually no 64-bit overhead up to 4GB, Sun Java 6u14 has small overhead up to 32GB with the new compression scheme. IBM's VM also has some sort of pointer compression, but I don't have experience with it myself. I use the JRockit JVM as it is what our customers use and we need to test on the same JVM. It is interesting in that recursive calls don't ever seem to run out; the way it does stack doesn't have separate memory spaces for stack, permanent generation heap space and the like. That doesn't mean apps are light: a freshly started IDE consumes more physical memory than a VMware image running XP and Outlook. But it is fairly responsive, which is good for a UI: 2295m 650m 22m S 2 10.9 0:43.80 java 855m 543m 530m S 11 9.1 4:40.40 vmware-vmx http://wikis.sun.com/display/HotSpotInternals/CompressedOops http://blog.juma.me.uk/tag/compressed-oops/ With pointer compression, there may be gains to be had with running 64-bit JVMs smaller than 4GB on x86, since then the runtime has access to native 64-bit integer operations and registers (as well as 2x the register count). It will be highly use-case dependent. That would certainly benefit atomic operations on longs; for floating point math it would be less useful, as JVMs have long made use of the SSE register set for FP work. 64-bit registers would make it easier to move stuff in and out of those registers.
I will try and set up a hudson server with this update and see how well it behaves.
Re: JNI and calling Hadoop jar files
jason hadoop wrote: The exception reference to *org.apache.hadoop.hdfs.DistributedFileSystem*, implies strongly that a hadoop-default.xml file, or at least a job.xml file is present. Since hadoop-default.xml is bundled into the hadoop-0.X.Y-core.jar, the assumption is that the core jar is available. Given the class-not-found exception, the implication is that the hadoop-0.X.Y-core.jar is not available to JNI. Given the above constraints, the two likely possibilities are that the -core jar is unavailable or damaged, or that somehow the classloader being used does not have access to the -core jar. A possible reason for the jar not being available is that the application is running on a different machine, or as a different user, and the jar is not actually present, or perhaps not readable, in the expected location. Which way is your JNI: a java application calling into a native shared library, or a native application calling into a JVM that it instantiates via libjvm calls? Could you dump the classpath that is in effect before your failing JNI call? System.getProperty("java.class.path"), and for that matter "java.library.path", or getenv("CLASSPATH"), and provide an ls -l of the core jar from the classpath, run as the user that owns the process, on the machine that the process is running on. Or something bad is happening with a dependent library of the filesystem that is causing the reflection-based load to fail and die, with the root cause being lost in the process. Sometimes putting an explicit reference to the class you are trying to load is a good way to force the problem to surface earlier, and fail with better error messages.
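A diagnostic sketch along the lines Jason suggests: dump the classpath the JVM actually sees, then force an explicit load of the suspect class so the failure surfaces early with a clear message. The hadoop class name is the one from the original thread; everything else is a generic check, not a hadoop API:

```java
public class ClasspathCheck {
    public static String classpath() {
        return System.getProperty("java.class.path");
    }

    /** Explicitly try to load a class, rather than waiting for reflection to fail later. */
    public static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path   = " + classpath());
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        String suspect = "org.apache.hadoop.hdfs.DistributedFileSystem";
        System.out.println(suspect + " loadable? " + canLoad(suspect));
    }
}
```

Run this from the exact environment (user, machine, working directory) where the JNI call fails; a missing -core jar shows up immediately in the printed classpath.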
Re: virtualization with hadoop
Oliver Fischer wrote: Hello Vishal, I did the same some weeks ago. The most important fact is that it works. But it is horribly slow if you don't have enough RAM and multiple disks, since all I/O operations go to the same disk. They may go to separate disks underneath, but performance is bad, as what the virtual OS thinks is a raw hard disk could be a badly fragmented bit of storage on the container OS. Memory is another point of conflict; your VMs will swap out or block other VMs.
0. Keep different VM virtual disks on different physical disks. Fast disks at that.
1. Pre-allocate your virtual disks.
2. Defragment at both the VM and host OS levels.
3. Crank back the schedulers so that the VMs aren't competing too much for CPU time. One core for the host OS, one for each VM.
4. You can keep an eye on performance by looking at the clocks of the various machines: if they pause and get jittery then they are being swapped out.
Using multiple VMs on a single host is OK for testing, but not for hard work. You can use VM images to do work, but you need to have enough physical cores and RAM to match that of the VMs. -steve
Re: Typical hardware configurations
Ryan Rawson wrote: You should also be getting 64-bit systems and running a 64 bit distro on it and a jvm that has -d64 available. For the namenode yes. For the others, you will take a fairly big memory hit (1.5X object size) due to the longer pointers. JRockit has special compressed pointers, so will JDK 7, apparently.
Re: Using HDFS to serve www requests
Edward Capriolo wrote: It is a little more natural to connect to HDFS from Apache Tomcat. This will allow you to skip the FUSE mounts and just use the HDFS API. I have modified this code to run inside Tomcat. http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample I will not testify to how well this setup will perform under internet traffic, but it does work. If someone adds an NIO bridge to Hadoop filesystems then it would be easier, leaving you only with the performance issues.
Re: Persistent HDFS On EC2
Kris Jirapinyo wrote: Why would you lose the locality of storage-per-machine if one EBS volume is mounted to each machine instance? When that machine goes down, you can just restart the instance and re-mount the exact same volume. I've tried this idea before successfully on a 10-node cluster on EC2, and didn't see any adverse performance effects. I was thinking more of the S3 filesystem, which is remote-ish and has measurable write times; Amazon actually claims that EBS I/O should be even better than the instance stores. Assuming the transient filesystems are virtual disks (and not physical disks that get scrubbed, formatted and mounted on every VM instantiation), and also assuming that EBS disks are on a SAN in the same datacentre, this is probably true. Disk I/O performance in virtual disks is currently pretty slow, as you are navigating through one filesystem and potentially seeking a lot, even for something that appears unfragmented at the VM level. The only concerns I see are that you need to pay for EBS storage regardless of whether you use that storage or not. So, if you have 10 EBS volumes of 1 TB each, and you're just starting out with your cluster so you're using only 50GB on each EBS volume so far for the month, you'd still have to pay for 10TB worth of EBS volumes, and that could be a hefty price each month. Also, currently EBS volumes need to be created in the same availability zone as your instances, so you need to make sure that they are created correctly, as there is no direct migration of EBS to different availability zones. View EBS as renting space in a SAN and it starts to make sense. -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
Re: Extending ClusterMapReduceTestCase
jason hadoop wrote: I am having trouble reproducing this one. It happened in a very specific environment that pulled in an alternate SAX parser. The bottom line is that Jetty expects a parser with particular capabilities and if it doesn't get one, odd things happen. In a day or so I will hopefully have worked out the details, but it has been half a year since I dealt with this last. Unless you are forking to run your JUnit tests, Ant won't let you change the classpath for your unit tests - much chaos will ensue. Even if you fork, unless you set includeantruntime=false you get Ant's classpath, as the JUnit test listeners are in ant-optional-junit.jar and you'd better pull them in somehow. I can see why AElfred would cause problems for Jetty; it needs to handle web.xml and suchlike, and probably validates them against the schema to reduce support calls.
Re: using virtual slave machines
Karthikeyan V wrote: There is no specific procedure for configuring virtual machine slaves. Make sure the following things are done. I've used these as the beginning of a page on this http://wiki.apache.org/hadoop/VirtualCluster
Re: Persistent HDFS On EC2
Malcolm Matalka wrote: If this is not the correct place to ask Hadoop + EC2 questions please let me know. I am trying to get a handle on how to use Hadoop on EC2 before committing any money to it. My question is, how do I maintain a persistent HDFS between restarts of instances? Most of the tutorials I have found involve the cluster being wiped once all the instances are shut down, but in my particular case I will be feeding the output of a previous day's run as the input of the current day's run, and this data will get large over time. I see I can use S3 as the file system; would I just create an EBS volume for each instance? What are my options? EBS would cost you more; you'd lose the locality of storage-per-machine. If you stick the output of some runs back into S3 then the next jobs have no locality and higher startup overhead to pull the data down, but you don't pay for that download (just the time it takes).
Re: master trying fetch data from slave using localhost hostname :)
pavelkolo...@gmail.com wrote: On Fri, 06 Mar 2009 14:41:57 -, jason hadoop jason.had...@gmail.com wrote: I see that when the host name of the node is also on the localhost line in /etc/hosts I erased all records with localhost from all /etc/hosts files and all is fine now :) Thank you :) what does /etc/hosts look like now? I hit some problems with Ubuntu and localhost last week; the hostname was set up in /etc/hosts not just to point to the loopback address, but to a different loopback address (127.0.1.1) from the normal value (127.0.0.1), so breaking everything. http://www.1060.org/blogxter/entry?publicid=121ED68BB21DB8C060FE88607222EB52
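For reference, the shape of an /etc/hosts that avoids that trap: the real hostname sits on a routable address, never on a 127.x line (names and addresses here are illustrative):

```
127.0.0.1    localhost
# do NOT put the node's hostname on a 127.x line (Ubuntu puts it on 127.0.1.1)
192.168.1.10 master.example.com  master
192.168.1.11 slave1.example.com  slave1
```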
Re: [ANNOUNCE] Hadoop release 0.19.1 available
Aviad sela wrote: Nigel, thanks, I have extracted the new project. However, I am having problems building the project. I am using Eclipse 3.4 and Ant 1.7. I receive an error compiling core classes: *compile-core-classes*: BUILD FAILED <jsp-compile uriroot="${src.webapps}/task" outputdir="${build.src}" package="org.apache.hadoop.mapred" webxml="${build.webapps}/task/WEB-INF/web.xml"></jsp-compile> * D:\Work\AviadWork\workspace\cur\WSAD\Hadoop_Core_19_1\Hadoop\build.xml:302: java.lang.ExceptionInInitializerError * it points to the webxml tag. Try an ant -verbose and post the full log; we may be able to look at the problem more. Also, run an ant -diagnostics and include what it prints. -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
Re: the question about the common pc?
Tim Wintle wrote: On Fri, 2009-02-20 at 13:07 +, Steve Loughran wrote: I've been doing MapReduce work over small in-memory datasets using Erlang, which works very well in such a context. I've got some (mainly python) scripts (that will probably be run with hadoop streaming eventually) that I run over multiple cpus/cores on a single machine by opening the appropriate number of named pipes and using tee and awk to split the workload, something like: mkfifo mypipe1; mkfifo mypipe2; awk '0 == NR % 2' mypipe1 | ./mapper | sort > map_out_1; awk '0 == (NR+1) % 2' mypipe2 | ./mapper | sort > map_out_2; ./get_lots_of_data | tee mypipe1 > mypipe2 (wait until it's done... or send a signal from the get_lots_of_data process on completion if it's a cronjob) sort -m map_out* | ./reducer > reduce_out This works around the global interpreter lock in python quite nicely and doesn't need the people that write the scripts (who may not be programmers) to understand multiple processes etc, just stdin and stdout. Dumbo provides py support under Hadoop: http://wiki.github.com/klbostee/dumbo https://issues.apache.org/jira/browse/HADOOP-4304 as well as that, given Hadoop is java1.6+, there's no reason why it couldn't support the javax.script engine, with JavaScript working without extra JAR files, groovy and jython once their JARs were stuck on the classpath. Some work would probably be needed to make it easier to use these languages, and then there are the tests...
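For comparison, the fifo/awk round-robin split can be sketched in plain Java: two worker threads each take alternate lines (the NR % 2 split), run a stand-in "mapper" (uppercase here), sort their partition, and the partitions are merged at the end like sort -m. Class and method names are illustrative:

```java
import java.util.*;
import java.util.concurrent.*;

public class RoundRobinMap {
    // Split input round-robin across 'workers' threads (like the awk
    // 'NR % 2' filters), map each line (toUpperCase stands in for
    // ./mapper), sort each partition, then merge, like 'sort -m'.
    public static List<String> run(final List<String> input, final int workers)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<List<String>>> parts = new ArrayList<Future<List<String>>>();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            parts.add(pool.submit(new Callable<List<String>>() {
                public List<String> call() {
                    List<String> out = new ArrayList<String>();
                    for (int i = id; i < input.size(); i += workers) {
                        out.add(input.get(i).toUpperCase()); // the "mapper"
                    }
                    Collections.sort(out);                   // per-partition sort
                    return out;
                }
            }));
        }
        List<String> merged = new ArrayList<String>();
        for (Future<List<String>> f : parts) {
            merged.addAll(f.get());
        }
        pool.shutdown();
        Collections.sort(merged);                            // the 'sort -m' step
        return merged;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList("delta", "alpha", "charlie", "bravo"), 2));
    }
}
```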
Re: GenericOptionsParser warning
Rasit OZDAS wrote: Hi, There is a JIRA issue about this problem, if I understand it correctly: https://issues.apache.org/jira/browse/HADOOP-3743 Strange, that I searched all the source code, but there exists only this control in 2 places: if (!(job.getBoolean("mapred.used.genericoptionsparser", false))) { LOG.warn("Use GenericOptionsParser for parsing the arguments. " + "Applications should implement Tool for the same."); } Just an if block for logging, no extra controls. Am I missing something? If your class implements Tool, then there shouldn't be a warning. OK, for my automated submission code I'll just set that switch and I won't get told off.
Re: the question about the common pc?
?? wrote: Actually, there's a widespread misunderstanding of this Common PC. Common PC doesn't mean PCs which are used daily; it means the performance of each node can be measured by a common PC's computing power. As a matter of fact, we don't use Gb Ethernet for daily PCs' communication, we don't use Linux for our document processing, and most importantly, Hadoop cannot run effectively on those daily PCs. Hadoop is designed for high performance computing equipment, but claimed to be fit for daily PCs. Hadoop for PCs? what a joke. Hadoop is designed to build a high-throughput data-processing infrastructure from commodity PC parts. SATA not RAID or SAN, x86+linux not supercomputer hardware and OS. You can bring it up on lighter weight systems, but it has a minimum overhead that is quite steep for small datasets. I've been doing MapReduce work over small in-memory datasets using Erlang, which works very well in such a context. -you need a good network, with DNS working (fast), good backbone and switches -the faster your disks, the better your throughput -ECC memory makes a lot of sense -you need a good cluster management setup unless you like SSH-ing to 20 boxes to find out which one is playing up
Re: How to use Hadoop API to submit job?
Wu Wei wrote: Hi, I used to submit Hadoop jobs with the utility RunJar.main() on hadoop 0.18. On hadoop 0.19, because the commandLineConfig of JobClient was null, I got a NullPointerException error when RunJar.main() calls GenericOptionsParser to get libJars (0.18 didn't do this call). I also tried the class JobShell to submit jobs, but it catches all exceptions and sends them to stderr, so that I can't handle the exceptions myself. I noticed that if I could call JobClient's setCommandLineConfig method, everything would go easily. But this method has default package accessibility; I can't see the method outside package org.apache.hadoop.mapred. Any advice on using the Java APIs to submit jobs? Wei Looking at my code, the line that does the work is JobClient jc = new JobClient(jobConf); runningJob = jc.submitJob(jobConf); My full (LGPL) code is here: http://tinyurl.com/djk6vj there's more work with validating input and output directories, pulling back the results, handling timeouts if the job doesn't complete, etc., but that's feature creep
Re: GenericOptionsParser warning
Sandhya E wrote: Hi All, I prepare my JobConf object in a Java class, by calling various set APIs on the JobConf object. When I submit the JobConf object using JobClient.runJob(conf), I'm seeing the warning: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. From the hadoop sources it looks like setting mapred.used.genericoptionsparser will prevent this warning. But if I set this flag to true, will it have some other side effects? Thanks Sandhya I've seen this message too, and it annoys me; I've not tracked it down
Re: HDFS architecture based on GFS?
Amr Awadallah wrote: I didn't understand the usage of malicious here, but any process using the HDFS API should first ask the NameNode where the Rasit, Matei is referring to the fact that a malicious piece of code can bypass the NameNode and connect to any datanode directly, or probe all datanodes for that matter. There is no strong authentication for RPC at this layer of HDFS, which is one of the current shortcomings that will be addressed in hadoop 1.0. -- amr This shouldn't be a problem in a locked-down datacentre. Where it is a risk is when you host on EC2 or another VM-hosting service that doesn't set up private VPNs
Re: datanode not being started
Sandy wrote: Since I last used this machine, Parallels Desktop was installed by the admin. I am currently suspecting that somehow this is interfering with the function of Hadoop (though JAVA_HOME still seems to be ok). Has anyone had any experience with this being a cause of interference? It could have added a virtual network adapter, and hadoop is starting on the wrong adapter. I don't think Hadoop handles this situation that well (yet), as you -need to be able to specify the adapter for every node -get rid of the notion of "I have a hostname" and move to "every network adapter has its own hostname" I haven't done enough experiments to be sure. I do know that if you start a datanode with IP addresses for the filesystem, it works out the hostname and then complains if anyone tries to talk to it using the same IP address URL it booted with. -steve
Re: HADOOP-2536 supports Oracle too?
sandhiya wrote: Hi, I'm using PostgreSQL and the driver is not getting detected. How do you run it in the first place? I just typed bin/hadoop jar /root/sandy/netbeans/TableAccess/dist/TableAccess.jar at the terminal without the quotes. I didn't copy any files from my local drives into the Hadoop file system. I get an error like this: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.postgresql.Driver and then the complete stack trace. Am I doing something wrong? I downloaded a jar file for PostgreSQL JDBC support and included it in my Libraries folder (I'm using NetBeans). please help JDBC drivers need to be (somehow) loaded before you can resolve the relevant jdbc: URLs; somehow your code needs to call Class.forName(jdbcdrivername), where that string is set to the relevant JDBC driver classname
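The preheating is only a few lines; a sketch, using the postgres driver class name from the stack trace (the helper class and method names are illustrative):

```java
// Preload a JDBC driver so a missing jar fails fast with a clear message,
// instead of surfacing later as a ClassNotFoundException deep inside
// DBInputFormat.configure(). org.postgresql.Driver is the class from the
// stack trace; substitute your own driver's class name.
public class DriverPreload {
    public static void preload(String driverClass) {
        try {
            Class.forName(driverClass);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("JDBC driver " + driverClass
                    + " is not on the classpath of this process", e);
        }
    }
    // Once preload("org.postgresql.Driver") succeeds, jdbc:postgresql://
    // URLs should resolve via DriverManager.getConnection(...).
}
```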
Re: stable version
Anum Ali wrote: The parser problem is related to jar files; it can be resolved, not a bug. Forwarding link for its solution http://www.jroller.com/navanee/entry/unsupportedoperationexception_this_parser_does_not this site is down; can't see it It is a bug, because I view all operations problems as defects to be opened in the bug tracker, stack traces stuck in, the problem resolved. That's software or hardware -because that issue DB is your searchable history of what went wrong. Given on my system I was seeing a ClassNotFoundException for loading FSConstants, there was no easy way to work out what went wrong, and it's cost me a couple of days' work. Furthermore, in the OSS world, every person who can't get your app to work is either going to walk away unhappy (=lost customer, lost developer and risk they compete with you), or they are going to get on the email list and ask questions, questions which may get answered, but it will cost them time. Hence * happyaxis.jsp: axis' diagnostics page, prints out useful stuff and warns if it knows it is unwell (and returns a 500 error code so your monitoring tools can recognise this) * ant -diagnostics: detailed look at your ant system including xml parser experiments. Good open source tools have to be easy for people to get started with, and that means helpful error messages. If we left the code alone, knowing that the cause of a ClassNotFoundException was the fault of the user sticking the wrong XML parser on the classpath -and yet refusing to add the four lines of code needed to handle this- then we are letting down the users On 2/13/09, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: This only occurs in Linux; in Windows it's fine.
do a java -version for me, and an ant -diagnostics, stick both on the bugrep https://issues.apache.org/jira/browse/HADOOP-5254 It may be that XInclude only went live in java1.6u5; I'm running a JRockit JVM which predates that and I'm seeing it (linux again); I will also try sticking xerces on the classpath to see what happens next
Re: Namenode not listening for remote connections to port 9000
Michael Lynch wrote: Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only listening for connections from localhost. In my case, the namenode is called centos1, and the datanode is called centos2. They are CentOS 5.1 servers with an unmodified Sun Java 6 runtime. fs.default.name takes a URL to the filesystem, such as hdfs://centos1:9000/ If the machine is only binding to localhost, that may mean DNS fun. Try a fully qualified name instead
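For reference, a minimal hadoop-site.xml fragment for that setup; the hostname is the one from Michael's description, and it must resolve to a non-loopback address on centos1 for the namenode to listen on the right interface:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://centos1:9000/</value>
</property>
```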
Re: stable version
Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: I am working on Hadoop SVN version 0.21.0-dev. Having some problems regarding running its examples/files from Eclipse. It gives an error for Exception in thread main java.lang.UnsupportedOperationException: This parser does not support specification null version null at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:590) Can anyone resolve or give some idea about it. You are using Java6, correct? I'm seeing this too, filed the bug https://issues.apache.org/jira/browse/HADOOP-5254 Any stack traces you can add on will help. Probable cause is https://issues.apache.org/jira/browse/HADOOP-4944
Re: stable version
Anum Ali wrote: This only occurs in Linux; in Windows it's fine. do a java -version for me, and an ant -diagnostics, stick both on the bugrep https://issues.apache.org/jira/browse/HADOOP-5254 It may be that XInclude only went live in java1.6u5; I'm running a JRockit JVM which predates that and I'm seeing it (linux again); I will also try sticking xerces on the classpath to see what happens next
Re: stable version
Anum Ali wrote: I am working on Hadoop SVN version 0.21.0-dev. Having some problems regarding running its examples/files from Eclipse. It gives an error for Exception in thread main java.lang.UnsupportedOperationException: This parser does not support specification null version null at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:590) Can anyone resolve or give some idea about it. You are using Java6, correct?
Re: Best practices on spliltting an input line?
Stefan Podkowinski wrote: I'm currently using OpenCSV, which can be found at http://opencsv.sourceforge.net/ but haven't done any performance tests on it yet. In my case simply splitting strings would not work anyway, since I need to handle quotes and separators within quoted values, e.g. "a,a",b,c. I've used it in the past; found it pretty reliable. Again, no perf tests, just reading in CSV files exported from spreadsheets
Re: stable version
Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran ste...@apache.org wrote: Anum Ali wrote: I am working on Hadoop SVN version 0.21.0-dev. Having some problems regarding running its examples/files from Eclipse. It gives an error for Exception in thread main java.lang.UnsupportedOperationException: This parser does not support specification null version null at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:590) Can anyone resolve or give some idea about it. You are using Java6, correct? well, in that case something being passed down to setXIncludeAware may be picked up as invalid. More of a stack trace may help. Otherwise, now is your chance to learn your way around the hadoop codebase, and ensure that when the next version ships, your most pressing bugs have been fixed
Re: anybody knows an apache-license-compatible impl of Integer.parseInt?
Zheng Shao wrote: We need to implement a version of Integer.parseInt/atoi from byte[] instead of String to avoid the high cost of creating a String object. I wanted to take the open jdk code but the license is GPL: http://www.docjar.com/html/api/java/lang/Integer.java.html Does anybody know an implementation that I can use for hive (apache license)? I also need to do it for Byte, Short, Long, and Double. Just don't want to go over all the corner cases. Use the Apache Harmony code http://svn.apache.org/viewvc/harmony/enhanced/classlib/branches/java6/modules/
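A minimal sketch of such a parser over a byte[] slice; the sign handling and overflow checks are exactly the corner cases Zheng mentions, and the Harmony code covers more of them (digits-only decimal here; class and method names are illustrative):

```java
public class ByteArrayParse {
    // Parse a decimal int from bytes[start, start+length) without building
    // a String. Throws NumberFormatException on bad input or overflow.
    public static int parseInt(byte[] bytes, int start, int length) {
        if (length <= 0) throw new NumberFormatException("empty input");
        int i = start;
        boolean negative = bytes[i] == '-';
        if (negative || bytes[i] == '+') i++;
        if (i == start + length) throw new NumberFormatException("no digits");
        long result = 0;   // accumulate in a long so overflow is detectable
        for (; i < start + length; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9)
                throw new NumberFormatException("bad digit at offset " + i);
            result = result * 10 + digit;
            // allow MAX_VALUE+1 through the loop: it is legal as -2147483648
            if (result > (long) Integer.MAX_VALUE + 1)
                throw new NumberFormatException("overflow");
        }
        result = negative ? -result : result;
        if (result > Integer.MAX_VALUE || result < Integer.MIN_VALUE)
            throw new NumberFormatException("overflow");
        return (int) result;
    }
}
```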
Re: File Transfer Rates
Brian Bockelman wrote: Just to toss out some numbers (and because our users are making interesting numbers right now) Here's our external network router: http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets Here's the application-level transfer graph: http://t2.unl.edu/phedex/graphs/quantity_rates?link=srcno_mss=trueto_node=Nebraska In a squeeze, we can move 20-50TB / day to/from other heterogeneous sites. Usually, we run out of free space before we can find the upper limit for a 24-hour period. We use a protocol called GridFTP to move data back and forth between external (non-HDFS) clusters. The other sites we transfer with use niche software you probably haven't heard of (Castor, DPM, and dCache) because, well, it's niche software. I have no available data on HDFS-S3 systems, but I'd again claim it's mostly a function of the amount of hardware you throw at it and the size of your network pipes. There are currently 182 datanodes; 180 are traditional ones of 3TB and 2 are big honking RAID arrays of 40TB. Transfers are load-balanced amongst ~7 GridFTP servers which each have a 1Gbps connection. GridFTP is optimised for high-bandwidth network connections, with negotiated packet sizes and multiple TCP connections, so when TCP's congestion control backs off after a dropped packet, only a fraction of the transfer's bandwidth is lost. It is probably best-in-class for long-haul transfers over the big university backbones where someone else pays for your traffic. You would be very hard pressed to get even close to that on any other protocol. I have no data on S3 xfers other than hearsay * write time to S3 can be slow as it doesn't return until the data is persisted somewhere. That's a better guarantee than a posix write operation. * you have to rely on other people on your rack not wanting all the traffic for themselves.
That's an EC2 API issue: you don't get to request/buy bandwidth to/from S3 One thing to remember is that if you bring up a Hadoop cluster on any virtual server farm, disk IO is going to be way below physical IO rates. Even when the data is in HDFS, it will be slower to get at than dedicated high-RPM SCSI or SATA storage.
Re: settin JAVA_HOME...
haizhou zhao wrote: hi Sandy, Every time I change the conf, I have to do the following two things: 1. kill all hadoop processes 2. manually delete all the files under hadoop.tmp.dir to make sure hadoop runs correctly, otherwise it won't work. Is this caused by my using a JDK instead of sun java? No, you need to do that to get configuration changes picked up. There are scripts in hadoop/bin to help you and what do you mean by sun-java, please? Sandy means * sun-java6-jdk: Sun's released JDK * default-jdk: what ubuntu chooses. On 8.10, it is open-jdk * openjdk-6-jdk: the full open source version of the JDK. Worse font rendering code, but comes with more source Others * Oracle JRockit: good 64-bit memory management, based on the sun JDK. unsupported * IBM JVM: unsupported. Based on the sun JDK * Apache Harmony: clean-room rewrite of everything. unsupported * Kaffe: unsupported * Gcj: unsupported type java -version to get your java version Sun: java version 1.6.0_10 Java(TM) SE Runtime Environment (build 1.6.0_10-b33) Java HotSpot(TM) Server VM (build 11.0-b14, mixed mode) JRockit: java version 1.6.0_02 Java(TM) SE Runtime Environment (build 1.6.0_02-b05) BEA JRockit(R) (build R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64, compiled mode) 2009/1/31 Sandy snickerdoodl...@gmail.com Hi Zander, Do not use jdk. Horrific things happen. You must use sun java in order to use hadoop.
Re: tools for scrubbing HDFS data nodes?
Sriram Rao wrote: Does this read every block of every file from all replicas and verify that the checksums are good? Sriram The DataBlockScanner thread on every datanode does this for you automatically. You can tune the rate at which it reads, but it reads in all local blocks, verifies their checksums, and deals with failures by reporting a list of them to the namenode after the scan. After that, it's the namenode's problem how to deal with the corrupt block. In an ideal system, at least one non-corrupt copy of the block is still live. The configuration attribute dfs.datanode.scan.period.hours can tune the scan rate
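For reference, the shape of that setting in the datanode's configuration (the value here is illustrative; the unit is hours):

```xml
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>336</value>
</property>
```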
Re: Cannot run program chmod: error=12, Not enough space
Andy Liu wrote: I'm running Hadoop 0.19.0 on Solaris (SunOS 5.10 on x86) and many jobs are failing with this exception: This isnt disk space, this is a RAM/swap problem related to fork() . Search for that error string on hadoop JIRA; I think it's been seen before on linux, though there the namenode was at fault Error initializing attempt_200901281655_0004_m_25_0: java.io.IOException: Cannot run program chmod: error=12, Not enough space at java.lang.ProcessBuilder.start(ProcessBuilder.java:459) at org.apache.hadoop.util.Shell.runCommand(Shell.java:149) at org.apache.hadoop.util.Shell.run(Shell.java:134) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) at org.apache.hadoop.util.Shell.execCommand(Shell.java:338) at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:532) at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:274) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:468) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:375) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:367) at org.apache.hadoop.mapred.MapTask.localizeConfiguration(MapTask.java:107) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1803) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:1884) at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:784) at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:778) at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636) at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102) at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602) Caused by: 
java.io.IOException: error=12, Not enough space at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.(UNIXProcess.java:53) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) ... 20 more However, all the disks have plenty of disk space left (over 800 gigs). Can somebody point me in the right direction? Thanks, Andy -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
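The Shell.execCommand path in that trace boils down to a ProcessBuilder fork/exec, sketched here with a harmless command (class and method names are illustrative). On Solaris the fork() momentarily needs enough swap to back the whole JVM address space, so a big tasktracker heap plus a small swap partition fails with errno 12 ("Not enough space") no matter how much disk is free:

```java
import java.io.IOException;

public class ForkDemo {
    // ProcessBuilder.start() is where fork()+exec() happens - the same call
    // that fails with errno 12 (ENOMEM, printed as "Not enough space") when
    // there isn't enough swap to duplicate the JVM's address space.
    public static int runCommand(String... cmd)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).start();
        return p.waitFor();   // exit status of the child
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit code: " + runCommand("echo", "hello"));
    }
}
```

So the fix is usually more swap or a smaller JVM heap, not more disk.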
Re: Zeroconf for hadoop
Edward Capriolo wrote: Zeroconf is more focused on simplicity than security. One of the original problems, which may have been fixed, is that any program can announce any service, e.g. my laptop can announce that it is the DNS for google.com etc. -1 to zeroconf as it is way too chatty. Every DNS lookup is mcast; in a busy network a lot of CPU time is spent discarding requests. Nor does it handle failure that well. It's OK on a home LAN to find a music player, but not what you want for a HA infrastructure in the datacentre. Our LAN discovery tool -Anubis- uses mcast only to do the initial discovery, then they have voting and things to select a nominated server that everyone just unicasts to at that point; failure of that node/network partition triggers a rebinding. See: http://wiki.smartfrog.org/wiki/display/sf/Anubis ; the paper discusses some of the fun you have, though that paper doesn't also include the clock drift issues you can encounter when running Xen or VMWare-hosted nodes. I want to mention a related topic to the list. People are approaching auto-discovery in a number of ways in JIRA. There are a few ways I can think of to discover hadoop. A very simple way might be to publish the configuration over a web interface. I use a network storage system called gluster-fs. Gluster can be configured so the server holds the configuration for each client. If the hadoop namenode held the entire configuration for all the nodes, each node would only need to be aware of the namenode and could retrieve its configuration from it. Having a central configuration management or a discovery system would be very useful. HOD is what I think to be the closest thing, though it is more of a top-down deployment system. Allen is a fan of a well-managed cluster; he pushes out Hadoop as RPMs via PXE and Kickstart and uses LDAP as the central CM tool.
I am currently exploring bringing up virtual clusters by * putting the relevant RPMs out to all nodes; same files/conf for every node, * having custom configs for Namenode and job tracker; everything else becomes a Datanode with a task tracker bound to the masters. I will start worrying about discovery afterwards, because without the ability for the Job Tracker or Namenode to do failover to a fallback Job Tracker or Namenode, you don't really need so much in the way of dynamic cluster binding. -steve
Re: HDFS - millions of files in one directory?
Philip (flip) Kromer wrote: I ran into this problem, hard, and I can vouch that this is not a Windows-only problem. ReiserFS, ext3 and OSX's HFS+ become cripplingly slow with more than a few hundred thousand files in the same directory. (The operation to correct this mistake took a week to run.) That is one of several hard lessons I learned: don't write your scraper to replicate the path structure of each document as a file on disk. I've seen a fair few machines (one of the network store programs) top out at 65K files/dir; which shows it is good to test your assumptions before you go live.
Re: running hadoop on heterogeneous hardware
Bill Au wrote: Is hadoop designed to run on homogeneous hardware only, or does it work just as well on heterogeneous hardware? If the datanodes have different disk capacities, does HDFS still spread the data blocks equally among all the datanodes, or will the datanodes with higher disk capacity end up storing more data blocks? Similarly, if the tasktrackers have different numbers of CPUs, is there a way to configure hadoop to run more tasks on those tasktrackers that have more CPUs? Is that simply a matter of setting mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum differently on the tasktrackers? Bill Life is simpler on homogeneous boxes; by setting the maximum tasks differently for the different machines, you do limit the amount of work that gets pushed out to those boxes. More troublesome are slower CPUs/HDDs; they aren't picked up directly, though speculative execution can handle some of this One interesting bit of research would be something adaptive; something to monitor throughput and tune those values based on performance; that would detect variations in a cluster and work with it, rather than requiring you to know the capabilities of every machine. -steve
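For reference, the shape of the per-tasktracker override: these go in the configuration on the beefier node, and the values here are purely illustrative:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```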
Re: Why does Hadoop need ssh access to master and slaves?
Amit k. Saha wrote: On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi all, we've made our first steps in evaluating hadoop. The setup of 2 VMs as a hadoop grid was very easy and works fine. Now our operations team wonders why hadoop has to be able to connect to the master and slaves via password-less ssh?! Can anyone give us an answer to this question? 1. There has to be a way to connect to the remote hosts - slaves and a secondary master - and SSH is the secure way to do it 2. It has to be password-less to enable automatic logins SSH is *a* secure way to do it, but not the only way. Other management tools can bring up hadoop clusters. Hadoop ships with scripted support for SSH as it is standard with Linux distros and generally the best way to bring up a remote console. Matthias, your ops team should not be worrying about the SSH security, as long as they keep their keys under control. (a) Key-based SSH is more secure than passworded SSH, as man-in-the-middle attacks are prevented. Passphrase-protected SSH keys on external USB keys are even better. (b) once the cluster is up, that filesystem is pretty vulnerable to anything on the LAN. You do need to lock down your datacentre, or set up the firewall/routing of the servers so that only trusted hosts can talk to the FS. SSH becomes a detail at that point.
Re: AW: Why does Hadoop need ssh access to master and slaves?
Matthias Scherer wrote: Hi Steve and Amit, thanks for your answers. I agree with you that key-based ssh is nothing to worry about. But I'm wondering what exactly - that is, which grid administration tasks - hadoop does via ssh. Does it restart crashed datanodes or tasktrackers on the slaves? Or does it transfer data over the grid with ssh access? How can I find a short description of what exactly hadoop needs ssh for? The documentation says only that I have to configure it. Thanks, Regards Matthias SSH is used by the various scripts in bin/ to start and stop clusters; slaves.sh does the work, and the other ones (like hadoop-daemons.sh) use it to run stuff on the machines. The EC2 scripts use SSH to talk to the machines brought up there; when you ask Amazon for machines, you give it a public key to be added to root's list of authorized keys, and you use that to ssh in and run code. There is currently no liveness/restarting built into the scripts; you need other things to do that. I am working on this, with HADOOP-3628, https://issues.apache.org/jira/browse/HADOOP-3628 I will be showing some other management options at ApacheCon EU 2009, which, being on the same continent and timezone, is something you may want to consider attending; lots of Hadoop people will be there, with some all-day sessions on it. http://eu.apachecon.com/c/aceu2009/sessions/227 One big problem with cluster management is not just recognising failed nodes, it's handling them. The actions you take are different with a VM cluster like EC2 (fix: reboot, then kill that AMI and create a new one), from that of a VMware/Xen-managed cluster, to that of physical systems (Y!: phone Allen, us: email paolo). Once we have the health monitoring in there, different people will need to apply their own policies. -steve -- Steve Loughran http://www.1060.org/blogxter/publish/5
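The scripts' use of SSH boils down to a loop like this (a simplified sketch, not the real slaves.sh; the actual script also honors HADOOP_SLAVES and HADOOP_SSH_OPTS and runs the commands in the background):

```shell
# simplified sketch of what bin/slaves.sh does: run one command on every slave
SLAVES="host1 host2"                        # normally read from conf/slaves
CMD="bin/hadoop-daemon.sh start datanode"   # what start-dfs.sh would pass in
LAUNCHED=""
for slave in $SLAVES; do
  echo ssh "$slave" "$CMD"                  # echoed rather than executed, for illustration
  LAUNCHED="$LAUNCHED $slave"
done
```

So SSH is only a cluster start/stop convenience; all data transfer and heartbeating between daemons goes over Hadoop's own RPC and HTTP ports, not ssh.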
Re: Java RMI and Hadoop RecordIO
David Alves wrote: Hi, I've been testing some different serialization techniques to go along with a research project. I know the motivation behind hadoop's serialization mechanism (e.g. Writable), and that the enhancement of this feature through record I/O is not only performance, but also control of the input/output. Still, I've been running some simple tests and I've found that plain RMI beats Hadoop RecordIO almost every time (14-16% faster). In my test I have a simple java class that has 14 int fields and 1 long field, and I'm serializing around 35000 instances. Am I doing anything wrong? Are there ways to improve performance in RecordIO? Have I got the use case wrong? Regards David Alves - Any speedups are welcome; people are looking at ProtocolBuffers and Thrift. - Are you also measuring packet size and deserialization costs? - Add a string or two, and references to other instances, then try pushing a few million round the network using the same serialization stream instance. I do use RMI a lot at work; once you come up with a plan to deal with its brittleness against change (we keep the code in the cluster up to date and make no guarantees about compatibility across versions), it is easy to use. But it has so many, many problems, and if you hit one, as the code is deep in the JVM, it is very hard to deal with. One example: RMI tries to send a graph over, so it likes to make sure it hasn't pushed a copy over earlier. The longer you keep a serialization stream up, the slower it gets. -steve
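Steve's last point, that a long-lived Java serialization stream caches every object graph it has seen, is easy to demonstrate with plain java.io serialization (the class name here is made up for the demo):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Shows the back-reference behaviour described above: a long-lived
// ObjectOutputStream remembers every object it has written, so a repeat
// write sends only a tiny handle, and the handle table keeps growing
// unless reset() is called.
public class StreamResetDemo implements Serializable {
    int a, b;

    static int[] sizes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        StreamResetDemo obj = new StreamResetDemo();

        out.writeObject(obj);          // full write: class descriptor + data
        out.flush();
        int first = buf.size();

        out.writeObject(obj);          // same object: only a back-reference handle
        out.flush();
        int second = buf.size() - first;

        out.reset();                   // clear the handle table
        out.writeObject(obj);          // full write again
        out.flush();
        int third = buf.size() - first - second;
        return new int[] { first, second, third };
    }

    public static void main(String[] args) throws IOException {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1] + " " + s[2]);
    }
}
```

The second write is a few bytes, the post-reset write is large again; this is also why updated field values never reach the far end of an RMI connection unless the stream is reset between calls.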
Re: RAID vs. JBOD
Runping Qi wrote: Hi, we at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD and RAID0. We found that under heavy loads (such as gridmix), the JBOD cluster performed better. Gridmix tests: Load: gridmix2. Cluster size: 190 nodes. Test results: RAID0: 75 minutes; JBOD: 67 minutes; Difference: 10%. Tests on HDFS write performance: We ran map-only jobs writing data to dfs concurrently on different clusters. The overall dfs write throughputs on the JBOD cluster were 30% (with a 58-node cluster) and 50% (with an 18-node cluster) better than those on the RAID0 cluster, respectively. To understand why, we did some file-level benchmarking on both clusters. We found that the file write throughput on a JBOD machine is 30% higher than that on a comparable machine with RAID0. This performance difference may be explained by the fact that the throughputs of different disks can vary 30% to 50%. With such variations, the overall throughput of a RAID0 system may be bottlenecked by the slowest disk. -- Runping This is really interesting. Thank you for sharing these results! Presumably the servers were all set up with nominally homogeneous hardware? And yet still the variations existed. That would be something to experiment with on new versus old clusters, to see if it gets worse over time. Here we have a batch of desktop workstations all bought at the same time, to the same spec, but one of them, 'lucky', is more prone to race conditions than any of the others. We don't know why, and assume it's to do with the (multiple) Xeon CPU chips being at different ends of the bell curve or something. All we know is: test on that box before shipping, to find race conditions early. -steve
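A quick way to see that per-disk variation on your own nodes is to time a synced write against each data directory (a rough sketch using GNU dd; the paths are illustrative, and on a real datanode you would point this at the dfs.data.dir mount points):

```shell
# write a file to each disk with an fsync and let dd report the throughput
for d in /tmp/disk1 /tmp/disk2; do          # substitute your JBOD mount points
  mkdir -p "$d"
  dd if=/dev/zero of="$d/ddtest" bs=1M count=8 conv=fsync 2>&1 | tail -n 1
  rm -f "$d/ddtest"
done
```

If the reported rates differ by 30-50% across disks, the RAID0-bottlenecked-by-the-slowest-disk explanation above applies directly to your hardware.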
Re: Indexed Hashtables
Sean Shanny wrote: Delip, so far we have had pretty good luck with memcached. We are building a hadoop-based solution for data warehouse ETL on XML-based log files that represent click stream data on steroids. We process about 34 million records, or about 70 GB of data, a day. We have to process dimensional data in our warehouse and then load the surrogate key/value pairs into memcached so we can traverse the XML files once again to perform the substitutions. We are using the memcached solution because it scales out just like hadoop. We will have code that allows us to fall back to the DB if the memcached lookup fails, but that should not happen too often. LinkedIn have just opened up something they run internally, Project Voldemort: http://highscalability.com/product-project-voldemort-distributed-database http://project-voldemort.com/ It's a DHT, Java-based. I haven't played with it yet, but it looks like a good part of the portfolio.
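The cache-first, database-on-miss pattern Sean describes can be sketched like this (a ConcurrentHashMap stands in for the real memcached client, and loadFromDb is a made-up placeholder for the dimension-table query; both are assumptions, not his actual code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the "memcached first, fall back to the DB on a miss" lookup used
// to map natural keys to warehouse surrogate keys during the second XML pass.
public class SurrogateKeyLookup {
    // stand-in for a memcached client; same get/put shape, no network
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // placeholder for the real dimension-table query
    private String loadFromDb(String naturalKey) {
        return "surrogate-for-" + naturalKey;
    }

    public String lookup(String naturalKey) {
        String surrogate = cache.get(naturalKey);
        if (surrogate == null) {                  // cache miss: fall back to the DB
            surrogate = loadFromDb(naturalKey);
            cache.put(naturalKey, surrogate);     // repopulate so later lookups hit
        }
        return surrogate;
    }

    public static void main(String[] args) {
        SurrogateKeyLookup lookup = new SurrogateKeyLookup();
        System.out.println(lookup.lookup("user-42"));
    }
}
```

Because a miss repopulates the cache, the DB only sees the first lookup of each key, which is why the fallback "should not happen too often" once the cache is warm.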
Re: issues with hadoop in AIX
Allen Wittenauer wrote: On 12/27/08 12:18 AM, Arun Venugopal arunvenugopa...@gmail.com wrote: Yes, I was able to run this on AIX as well with a minor change to the DF.java code. But this was more of a proof of concept than on a production system. There are lots of places where Hadoop (esp. in contrib) interprets the output of Unix command-line utilities. Changes like this are likely going to be required for AIX and other Unix systems that aren't being used by a committer. :( ...that aren't being used in the test process, equally importantly. I think Hudson runs on Solaris, doesn't it?