Re: Problems getting Eclipse Hadoop plugin to work.
Erik, did you correctly place the ports in the properties window? Port 9000 goes under Map/Reduce Master on the left, 9001 under DFS Master on the right.

2009/2/19 Erik Holstad erikhols...@gmail.com:
Thanks guys! Running Linux, and the remote cluster is also Linux. I have the properties set up like that already on my remote cluster, but I'm not sure where to input this info into Eclipse. And when changing the ports to 9000 and 9001 I get:

    Error: java.io.IOException: Unknown protocol to job tracker: org.apache.hadoop.dfs.ClientProtocol

Regards Erik

-- M. Raşit ÖZDAŞ
Re: How to use Hadoop API to submit job?
You should implement the Tool interface and submit jobs that way. For an example, see org.apache.hadoop.examples.WordCount. -Amareshwari

Wu Wei wrote:
Hi, I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was null, I got a NullPointerException when RunJar.main() called GenericOptionsParser to get libJars (0.18 didn't make this call). I also tried the class JobShell to submit the job, but it catches all exceptions and sends them to stderr, so I can't handle the exceptions myself. I noticed that if I could call JobClient's setCommandLineConfig method everything would go easily, but this method has default (package) accessibility, so I can't see it outside the package org.apache.hadoop.mapred. Any advice on using the Java APIs to submit a job?

Wei
Re: empty log file...
Zander, I've looked at my datanode logs on the slaves, but they are all quite small, although we've run many jobs on them. Running two new jobs also didn't add anything to them. (As I understand from the contents of the logs, Hadoop mainly logs operations such as DFS performance tests there.) Cheers, Rasit

2009/2/20 zander1013 zander1...@gmail.com:
Hi, I am setting up Hadoop for the first time on a multi-node cluster. Right now I have two nodes. The two-node cluster consists of two laptops connected via an ad-hoc wifi network; they do not have access to the internet. I formatted the datanodes on both machines prior to startup. The output from the commands /usr/local/hadoop/bin/start-all.sh, jps (on both machines), and /usr/local/hadoop/bin/stop-all.sh all appears normal. However, the file /usr/local/hadoop/logs/hadoop-hadoop-datanode-node1.log (on the slave node) is empty. The same file on the master node shows the startup and shutdown events as normal and without error. Is it okay that the log file on the slave is empty? zander

-- View this message in context: http://www.nabble.com/empty-log-file...-tp22113398p22113398.html Sent from the Hadoop core-user mailing list archive at Nabble.com.

-- M. Raşit ÖZDAŞ
Re: Problems getting Eclipse Hadoop plugin to work.
This thread helped me fix a similar problem: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e

In my case, I had the ports specified in hadoop-site.xml for the name node and job tracker switched in the Map/Reduce location's configuration. Iman. P.S. I sent this reply to the wrong thread before.

Erik Holstad wrote:
Thanks guys! Running Linux, and the remote cluster is also Linux. I have the properties set up like that already on my remote cluster, but I'm not sure where to input this info into Eclipse. And when changing the ports to 9000 and 9001 I get:

    Error: java.io.IOException: Unknown protocol to job tracker: org.apache.hadoop.dfs.ClientProtocol

Regards Erik
Re: Map/Reduce Job done locally?
Philipp, I have no problem running jobs locally with Eclipse (via the Hadoop plugin) and observing them from the browser. (Please note that the jobtracker page doesn't refresh automatically; you need to refresh it manually.) Cheers, Rasit

2009/2/19 Philipp Dobrigkeit pdobrigk...@gmx.de:
When I start my job from Eclipse it gets processed and the output is generated, but it never shows up in my JobTracker, which is opened in my browser. Why is this happening?

-- M. Raşit ÖZDAŞ
Re: GenericOptionsParser warning
Rasit OZDAS wrote:
Hi, there is a JIRA issue about this problem, if I understand it correctly: https://issues.apache.org/jira/browse/HADOOP-3743

Strange that I searched all the source code, but there is only this check, in two places:

    if (!(job.getBoolean("mapred.used.genericoptionsparser", false))) {
      LOG.warn("Use GenericOptionsParser for parsing the arguments. "
          + "Applications should implement Tool for the same.");
    }

Just an if block for logging, no extra checks. Am I missing something?

If your class implements Tool, then there shouldn't be a warning. OK, for my automated submission code I'll just set that switch and I won't get told off.
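For reference, a minimal sketch of the Tool/ToolRunner pattern that keeps the warning away; the job class, paths, and lack of an explicit mapper are placeholders, not something taken from HADOOP-3743.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // getConf() carries whatever GenericOptionsParser already parsed
        JobConf conf = new JobConf(getConf(), MyJob.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner drives GenericOptionsParser for you, which is exactly
        // what the check in JobClient is looking for.
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
    }

For a submission path that cannot go through ToolRunner, setting the flag named in the check above (conf.setBoolean("mapred.used.genericoptionsparser", true)) should silence the message, since that if block appears to be the only place it is read.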
Re: How to use Hadoop API to submit job?
My application implements the Tool interface and I can submit the job using the shell script: hadoop jar xxx.jar app. But here I don't want to use this script. Instead, I want to catch errors in my Java code and do some further processing. -Wei

Amareshwari Sriramadasu wrote:
You should implement the Tool interface and submit jobs that way. For an example, see org.apache.hadoop.examples.WordCount. -Amareshwari

Wu Wei wrote:
Hi, I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was null, I got a NullPointerException when RunJar.main() called GenericOptionsParser to get libJars (0.18 didn't make this call). I also tried the class JobShell to submit the job, but it catches all exceptions and sends them to stderr, so I can't handle the exceptions myself. I noticed that if I could call JobClient's setCommandLineConfig method everything would go easily, but this method has default (package) accessibility, so I can't see it outside the package org.apache.hadoop.mapred. Any advice on using the Java APIs to submit a job? Wei
Re: the question about the common pc?
?? wrote:
Actually, there's a widespread misunderstanding of this "common PC". "Common PC" doesn't mean PCs that are used daily; it means that the performance of each node can be measured in terms of a common PC's computing power. In fact, we don't use Gb ethernet for our daily PCs' communication, we don't use Linux for our document processing, and most importantly, Hadoop cannot run effectively on those daily PCs. Hadoop is designed for high-performance computing equipment, but is claimed to be fit for daily PCs. Hadoop for PCs? What a joke.

Hadoop is designed to build a high-throughput data-processing infrastructure from commodity PC parts: SATA rather than RAID or SAN, x86+Linux rather than supercomputer hardware and OS. You can bring it up on lighter-weight systems, but it has a minimum overhead that is quite steep for small datasets. I've been doing MapReduce work over small in-memory datasets using Erlang, which works very well in such a context.

- You need a good network, with DNS working (fast), good backbone and switches.
- The faster your disks, the better your throughput.
- ECC memory makes a lot of sense.
- You need a good cluster management setup unless you like SSH-ing to 20 boxes to find out which one is playing up.
Re: How to use Hadoop API to submit job?
Wu Wei wrote:
Hi, I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was null, I got a NullPointerException when RunJar.main() called GenericOptionsParser to get libJars (0.18 didn't make this call). I also tried the class JobShell to submit the job, but it catches all exceptions and sends them to stderr, so I can't handle the exceptions myself. I noticed that if I could call JobClient's setCommandLineConfig method everything would go easily, but this method has default (package) accessibility, so I can't see it outside the package org.apache.hadoop.mapred. Any advice on using the Java APIs to submit a job? Wei

Looking at my code, the lines that do the work are:

    JobClient jc = new JobClient(jobConf);
    runningJob = jc.submitJob(jobConf);

My full (LGPL) code is here: http://tinyurl.com/djk6vj

There's more work with validating input and output directories, pulling back the results, handling timeouts if the job doesn't complete, etc., but that's feature creep.
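To address Wei's point about catching errors in code rather than on stderr, a rough sketch built around those same two lines; the wrapper class and the way failures are handled are illustrative, not part of the linked code.

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitAndWatch {
      public static boolean submit(JobConf jobConf) {
        try {
          JobClient jc = new JobClient(jobConf);
          RunningJob runningJob = jc.submitJob(jobConf);
          runningJob.waitForCompletion();     // block until the job finishes
          return runningJob.isSuccessful();   // caller decides what failure means
        } catch (IOException e) {
          // submission or status polling failed; handle it here instead of
          // letting a shell wrapper print to stderr
          e.printStackTrace();
          return false;
        }
      }
    }

Because submitJob returns a RunningJob handle rather than blocking like JobClient.runJob, the caller stays in control of waiting, retrying, or killing the job.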
Re: the question about the common pc?
On Fri, 2009-02-20 at 13:07 +, Steve Loughran wrote:
I've been doing MapReduce work over small in-memory datasets using Erlang, which works very well in such a context.

I've got some (mainly Python) scripts (that will probably be run with Hadoop Streaming eventually) that I run over multiple CPUs/cores on a single machine by opening the appropriate number of named pipes and using tee and awk to split the workload, something like:

    mkfifo mypipe1
    mkfifo mypipe2
    # start one mapper pipeline per pipe in the background
    awk '0 == NR % 2' mypipe1 | ./mapper | sort > map_out_1 &
    awk '0 == (NR+1) % 2' mypipe2 | ./mapper | sort > map_out_2 &
    # fan the input out to both pipes
    ./get_lots_of_data | tee mypipe1 > mypipe2

(wait until it's done... or send a signal from the get_lots_of_data process on completion if it's a cron job)

    sort -m map_out* | ./reducer > reduce_out

This works around the global interpreter lock in Python quite nicely and doesn't need the people who write the scripts (who may not be programmers) to understand multiple processes etc., just stdin and stdout. Tim Wintle
Re: Problems getting Eclipse Hadoop plugin to work.
Hi guys! Thanks for your help, but still no luck. I did try to set it up on a different machine with Eclipse 3.2.2 and the IBM plugin instead of the Hadoop one; with that one I only needed to fill out the install directory and the host, and it worked just fine. I have filled out the ports correctly, and the cluster is up and running and works just fine. Regards Erik
Hadoop build error
Hi, while trying to compile the Hadoop source (ant -Djavac.args="-Xlint -Xmaxwarns 1000" tar) I get the error below. Kindly let me know how to fix this issue. I have Java 6 installed.

    [javadoc] Standard Doclet version 1.6.0_07
    [javadoc] Building tree for all the packages and classes...
    [javadoc] Building index for all the packages and classes...
    [javadoc] Building index for all classes...

    java5.check:

    BUILD FAILED
    /home/raghu/src-hadoop/trunk/build.xml:890: 'java5.home' is not defined. Forrest requires Java5. Please pass -Djava5.home=<base of Java 5 distribution> to Ant on the command-line.

Thanks, Raghu
Hadoop JMX
I am working on graphing the Hadoop JMX variables. http://hadoop.apache.org/core/docs/r0.17.0/api/org/apache/hadoop/dfs/namenode/metrics/NameNodeStatistics.html

I have two nodes, one running 0.17 and the other running 0.19. The NameNode JMX objects and attributes seem to be working well. I am graphing Capacity, NumberOfBlocks, and NumberOfFiles, as well as the operations numLiveDataNodes() and numDeadDataNodes(). It seems like the DataNode JMX objects are mostly 0 or -1. I do not have heavy load on these systems, so telling whether a counter is implemented is tricky.

My questions:
1) If a JMX attribute is added, is it generally added as a placeholder to be implemented later, or is it added already implemented?
2) Is there a target version to have all these attributes implemented, or are these all being handled via separate JIRAs?
3) Can I set up TaskTrackers to be monitored as I can for DataNodes and NameNodes?
4) Tips, tricks, gotchas?

Thank you
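Not an answer to the questions above, but for the graphing side, a bare-bones read over the standard javax.management client API may help. The service URL is a placeholder for wherever the NameNode's JMX port was exposed, and the bean/attribute names are deliberately not hard-coded: listing the registered beans first is the safe way to find the exact NameNodeStatistics name for a given version.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class NameNodeJmxProbe {
      public static void main(String[] args) throws Exception {
        // Host/port are whatever com.sun.management.jmxremote.port was set to
        // when the NameNode JVM was started (placeholder values here).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://namenode-host:8004/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
          MBeanServerConnection mbs = connector.getMBeanServerConnection();
          // List everything registered to find the real NameNodeStatistics name.
          Set<ObjectName> names = mbs.queryNames(null, null);
          for (ObjectName name : names) {
            System.out.println(name);
          }
          // Then read individual attributes with
          //   mbs.getAttribute(someName, "SomeAttribute");
          // once the bean and attribute names are known.
        } finally {
          connector.close();
        }
      }
    }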
Re: Limit number of records or total size in combiner input using jobconf?
So here are my questions: (1) Is there a jobconf hint to limit the number of records in kviter? I can (and have) made a fix to my code that processes the values in the combiner step in batches (i.e., takes N at a go, processes them, and repeats), but I was wondering if I could just set an option.

Approximately and indirectly, yes. You can limit the amount of memory allocated to storing serialized records in memory (io.sort.mb) and the percentage of that space reserved for storing record metadata (io.sort.record.percent, IIRC). That can be used to limit the number of records in each spill, though you may also need to disable the combiner during the merge, where you may run into the same problem. You're almost certainly better off designing your combiner to scale well (as you have), since you'll hit this in the reduce, too.

Since this occurred in the MapContext, changing the number of reducers won't help. (2) How does changing the number of reducers help at all? I have 7 machines, so I feel 11 (a prime close to 7; why a prime?) is good enough (some machines are 16GB, others 32GB).

Your combiner will look at all the records for a partition, and only those records in a partition. If your partitioner distributes your records evenly in a particular spill, then increasing the total number of partitions will decrease the number of records your combiner considers in each call. For most partitioners, whether the number of reducers is prime should be irrelevant. -C
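As a concrete illustration of those knobs, a small sketch; the numbers are placeholders to show where the settings go, not recommendations.

    import org.apache.hadoop.mapred.JobConf;

    public class SpillTuning {
      public static JobConf tunedConf() {
        JobConf conf = new JobConf(SpillTuning.class);
        conf.setInt("io.sort.mb", 100);                  // MB of buffer for serialized map output
        conf.setFloat("io.sort.record.percent", 0.10f);  // share of that buffer kept for record metadata
        conf.setNumReduceTasks(11);                      // more partitions => fewer records per combiner call
        return conf;
      }
    }

A smaller io.sort.mb (or a larger record-metadata share) means each spill holds fewer records, which is the indirect cap on what a single combiner call sees, as described above.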
Re: [ANNOUNCE] Apache ZooKeeper 3.1.0
Hi Bill, I'm sorry, I missed this message initially. Below is a table that gives you throughput figures for BookKeeper. The rows correspond to distinct BookKeeper configurations (ensemble size, quorum size, entry type), and the columns to different values for the length of an entry in bytes. The throughput values correspond to one client writing 400K records (we call them entries) asynchronously to a ledger. The table shows write throughput in thousands of operations per second.

            128     1024    8192
    3-2-V   32.80   26.45   5.89
    4-2-V   41.72   31.53   6.55
    5-2-V   46.89   32.45   6.61
    4-3-G   28.02   21.61   4.37
    5-3-G   34.91   28.22   4.60
    6-3-G   41.22   31.70   4.55

Let me know if you have more questions; I appreciate your interest. Thanks, -Flavio

On Feb 14, 2009, at 2:56 PM, Bill de hOra wrote:
Patrick Hunt wrote:
A bit about BookKeeper: a system to reliably log streams of records. In BookKeeper, servers are bookies, log streams are ledgers, and each unit of a log (aka record) is a ledger entry. BookKeeper is designed to be reliable; bookies, the servers that store ledgers, can be byzantine, which means that some subset of the bookies can fail, corrupt data, or discard data, but as long as there are enough correctly behaving servers the service as a whole behaves correctly. The metadata for BookKeeper is stored in ZooKeeper.

Hi Patrick, this sounds cool. Are there any figures on throughput, i.e. how many records BookKeeper can process per second? Bill
Re: [ANNOUNCE] Apache ZooKeeper 3.1.0
Also, you may consider checking a graph that we posted comparing the performance of BookKeeper with that of HDFS using a local file system and local+NFS, in JIRA issue HADOOP-5189 (https://issues.apache.org/jira/browse/HADOOP-5189). -Flavio

On Feb 20, 2009, at 10:05 AM, Flavio Junqueira wrote:
Hi Bill, I'm sorry, I missed this message initially. Below is a table that gives you throughput figures for BookKeeper. The rows correspond to distinct BookKeeper configurations (ensemble size, quorum size, entry type), and the columns to different values for the length of an entry in bytes. The throughput values correspond to one client writing 400K records (we call them entries) asynchronously to a ledger. The table shows write throughput in thousands of operations per second.

            128     1024    8192
    3-2-V   32.80   26.45   5.89
    4-2-V   41.72   31.53   6.55
    5-2-V   46.89   32.45   6.61
    4-3-G   28.02   21.61   4.37
    5-3-G   34.91   28.22   4.60
    6-3-G   41.22   31.70   4.55

Let me know if you have more questions; I appreciate your interest. Thanks, -Flavio

On Feb 14, 2009, at 2:56 PM, Bill de hOra wrote:
Patrick Hunt wrote:
A bit about BookKeeper: a system to reliably log streams of records. In BookKeeper, servers are bookies, log streams are ledgers, and each unit of a log (aka record) is a ledger entry. BookKeeper is designed to be reliable; bookies, the servers that store ledgers, can be byzantine, which means that some subset of the bookies can fail, corrupt data, or discard data, but as long as there are enough correctly behaving servers the service as a whole behaves correctly. The metadata for BookKeeper is stored in ZooKeeper.

Hi Patrick, this sounds cool. Are there any figures on throughput, i.e. how many records BookKeeper can process per second? Bill
Super-long reduce task timeouts in hadoop-0.19.0
(Repost from the dev list)

I noticed some really odd behavior today while reviewing the job history of some of our jobs. Our Ganglia graphs showed really long periods of inactivity across the entire cluster, which should definitely not be the case: we have a really long string of jobs in our workflow that should execute one after another. I figured out which jobs were running during those periods of inactivity, and discovered that almost all of them had 4-5 failed reduce tasks, with the reason for failure being something like:

    Task attempt_200902061117_3382_r_38_0 failed to report status for 1282 seconds. Killing!

The actual timeout reported varies from 700-5000 seconds. Virtually all of our longer-running jobs were affected by this problem. The period of inactivity on the cluster seems to correspond to the amount of time the job waited for these reduce tasks to fail.

I checked out the tasktracker log for the machines with timed-out reduce tasks, looking for something that might explain the problem, but the only thing I came up with that actually referenced the failed task was this log message, which was repeated many times:

    2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories

I'm not sure what this means; can anyone shed some light on this message? Further confusing the issue, on the affected machines I looked in logs/userlogs/<task id>, and to my surprise, the directory and log files existed, and the syslog file seemed to contain logs of a perfectly good reduce task!

Overall, this seems like a pretty critical bug. It's consuming up to 50% of the runtime of our jobs in some instances, killing our throughput. At the very least, it seems like the reduce task timeout period should be MUCH shorter than the current 10-20 minutes.

-Bryan
Re: Super-long reduce task timeouts in hadoop-0.19.0
How often do your reduce tasks report status?

On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com wrote:
(Repost from the dev list)

I noticed some really odd behavior today while reviewing the job history of some of our jobs. Our Ganglia graphs showed really long periods of inactivity across the entire cluster, which should definitely not be the case: we have a really long string of jobs in our workflow that should execute one after another. I figured out which jobs were running during those periods of inactivity, and discovered that almost all of them had 4-5 failed reduce tasks, with the reason for failure being something like:

    Task attempt_200902061117_3382_r_38_0 failed to report status for 1282 seconds. Killing!

The actual timeout reported varies from 700-5000 seconds. Virtually all of our longer-running jobs were affected by this problem. The period of inactivity on the cluster seems to correspond to the amount of time the job waited for these reduce tasks to fail.

I checked out the tasktracker log for the machines with timed-out reduce tasks, looking for something that might explain the problem, but the only thing I came up with that actually referenced the failed task was this log message, which was repeated many times:

    2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories

I'm not sure what this means; can anyone shed some light on this message? Further confusing the issue, on the affected machines I looked in logs/userlogs/<task id>, and to my surprise, the directory and log files existed, and the syslog file seemed to contain logs of a perfectly good reduce task!

Overall, this seems like a pretty critical bug. It's consuming up to 50% of the runtime of our jobs in some instances, killing our throughput. At the very least, it seems like the reduce task timeout period should be MUCH shorter than the current 10-20 minutes.

-Bryan

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)
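For context on Ted's question, the old-API hook for keeping a long-running reduce alive is the Reporter passed into reduce(). A minimal sketch, with placeholder key/value types and placeholder per-value work:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class ProgressReportingReducer extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
      public void reduce(Text key, Iterator<LongWritable> values,
          OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        long sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();   // stand-in for the real per-value work
          reporter.progress();          // tells the TaskTracker the task is alive
        }
        output.collect(key, new LongWritable(sum));
      }
    }

The timeout itself is governed by the mapred.task.timeout property (in milliseconds), so it can also be raised or lowered per job, but a task that goes a long stretch without calling progress() or setStatus() will always be at the mercy of that setting.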
Re: HADOOP-2536 supports Oracle too?
On Wed, Feb 18, 2009 at 1:06 AM, sandhiya sandhiy...@gmail.com wrote:
Thanks a million!!! It worked. But it's a little weird though: I have to put the library with the JDBC jars in BOTH the executable jar file AND the lib folder in $HADOOP_HOME. Do all of you do the same thing, or is it just my computer acting strange?

It seems that things that are directly referenced by the jar you are running can be included in the lib directory inside the jar, but things that are loaded with reflection, like JDBC drivers, have to be in the Hadoop lib directory. I don't think it's both.
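As an illustration of the reflection point, here is a sketch of how the driver typically gets wired in through the DBConfiguration helper that HADOOP-2536 added; the connection URL and credentials are placeholders.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.db.DBConfiguration;

    public class OracleJobSetup {
      public static void configureDb(JobConf conf) {
        // The driver is only named as a string, so the framework loads it by
        // reflection at runtime; that is why the driver jar has to be visible
        // to Hadoop's own classloader ($HADOOP_HOME/lib), not just bundled in
        // the job jar's lib directory.
        DBConfiguration.configureDB(conf,
            "oracle.jdbc.driver.OracleDriver",
            "jdbc:oracle:thin:@dbhost:1521:orcl",
            "user", "password");
      }
    }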
Connection problem during data import into hbase
I am trying to import data from a flat file into HBase using a MapReduce job. There are close to 2 million rows. Midway into the job, it starts giving me connection problems and eventually kills the job. When the error comes, the HBase shell also stops working. This is what I get:

    2009-02-20 21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass: Retrying connect to server: /171.69.102.52:60020. Already tried 0 time(s).

What could be going wrong?

Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: Connection problem during data import into hbase
Here's what it throws on the console:

    09/02/20 21:45:29 INFO mapred.JobClient: Task Id : attempt_200902201300_0019_m_06_0, Status : FAILED
    java.io.IOException: table is null
        at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
        at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)
    attempt_200902201300_0019_m_06_0: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:423)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:114)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:97)
    attempt_200902201300_0019_m_06_0: at IN_TABLE_IMPORT$MapClass.configure(IN_TABLE_IMPORT.java:120)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.mapred.Child.main(Child.java:155)

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Fri, Feb 20, 2009 at 9:43 PM, Amandeep Khurana ama...@gmail.com wrote:
I am trying to import data from a flat file into HBase using a MapReduce job. There are close to 2 million rows. Midway into the job, it starts giving me connection problems and eventually kills the job. When the error comes, the HBase shell also stops working. This is what I get:

    2009-02-20 21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass: Retrying connect to server: /171.69.102.52:60020. Already tried 0 time(s).

What could be going wrong?

Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: Connection problem during data import into hbase
I don't know if this is related or not, but it seems to be. After this MapReduce job, I tried to count the number of entries in the table in HBase through the shell. It failed with the following error:

    hbase(main):002:0> count 'in_table'
    NativeException: java.lang.NullPointerException: null
        from java.lang.String:-1:in `<init>'
        from org/apache/hadoop/hbase/util/Bytes.java:92:in `toString'
        from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:50:in `getMessage'
        from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:40:in `<init>'
        from org/apache/hadoop/hbase/client/HConnectionManager.java:841:in `getRegionServerWithRetries'
        from org/apache/hadoop/hbase/client/MetaScanner.java:56:in `metaScan'
        from org/apache/hadoop/hbase/client/MetaScanner.java:30:in `metaScan'
        from org/apache/hadoop/hbase/client/HConnectionManager.java:411:in `getHTableDescriptor'
        from org/apache/hadoop/hbase/client/HTable.java:219:in `getTableDescriptor'
        from sun.reflect.NativeMethodAccessorImpl:-2:in `invoke0'
        from sun.reflect.NativeMethodAccessorImpl:-1:in `invoke'
        from sun.reflect.DelegatingMethodAccessorImpl:-1:in `invoke'
        from java.lang.reflect.Method:-1:in `invoke'
        from org/jruby/javasupport/JavaMethod.java:250:in `invokeWithExceptionHandling'
        from org/jruby/javasupport/JavaMethod.java:219:in `invoke'
        from org/jruby/javasupport/JavaClass.java:416:in `execute'
        ... 145 levels...
        from org/jruby/internal/runtime/methods/DynamicMethod.java:74:in `call'
        from org/jruby/internal/runtime/methods/CompiledMethod.java:48:in `call'
        from org/jruby/runtime/CallSite.java:123:in `cacheAndCall'
        from org/jruby/runtime/CallSite.java:298:in `call'
        from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:429:in `__file__'
        from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in `__file__'
        from ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in `load'
        from org/jruby/Ruby.java:512:in `runScript'
        from org/jruby/Ruby.java:432:in `runNormally'
        from org/jruby/Ruby.java:312:in `runFromMain'
        from org/jruby/Main.java:144:in `run'
        from org/jruby/Main.java:89:in `run'
        from org/jruby/Main.java:80:in `main'
        from /hadoop/install/hbase/bin/../bin/HBase.rb:444:in `count'
        from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count'
        from (hbase):3:in `binding'

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana ama...@gmail.com wrote:
Here's what it throws on the console:

    09/02/20 21:45:29 INFO mapred.JobClient: Task Id : attempt_200902201300_0019_m_06_0, Status : FAILED
    java.io.IOException: table is null
        at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
        at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)
    attempt_200902201300_0019_m_06_0: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:423)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:114)
    attempt_200902201300_0019_m_06_0: at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:97)
    attempt_200902201300_0019_m_06_0: at
Re: Hadoop build error
java5.check:

BUILD FAILED
/home/raghu/src-hadoop/trunk/build.xml:890: 'java5.home' is not defined. Forrest requires Java5. Please pass -Djava5.home=<base of Java 5 distribution> to Ant on the command-line.

I think the error is self-explanatory. Forrest needs JDK 1.5, and you can point to it using the -Djava5.home argument, maybe something like the following:

    ant -Djavac.args="-Xlint -Xmaxwarns 1000" -Djava5.home={base of Java 5 distribution} tar
Re: Super-long reduce task timeouts in hadoop-0.19.0
We didn't customize this value, to my knowledge, so I'd suspect it's the default. -Bryan

On Feb 20, 2009, at 5:00 PM, Ted Dunning wrote:
How often do your reduce tasks report status?

On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com wrote:
(Repost from the dev list)

I noticed some really odd behavior today while reviewing the job history of some of our jobs. Our Ganglia graphs showed really long periods of inactivity across the entire cluster, which should definitely not be the case: we have a really long string of jobs in our workflow that should execute one after another. I figured out which jobs were running during those periods of inactivity, and discovered that almost all of them had 4-5 failed reduce tasks, with the reason for failure being something like:

    Task attempt_200902061117_3382_r_38_0 failed to report status for 1282 seconds. Killing!

The actual timeout reported varies from 700-5000 seconds. Virtually all of our longer-running jobs were affected by this problem. The period of inactivity on the cluster seems to correspond to the amount of time the job waited for these reduce tasks to fail.

I checked out the tasktracker log for the machines with timed-out reduce tasks, looking for something that might explain the problem, but the only thing I came up with that actually referenced the failed task was this log message, which was repeated many times:

    2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902061117_3388/attempt_200902061117_3388_r_66_0/output/file.out in any of the configured local directories

I'm not sure what this means; can anyone shed some light on this message? Further confusing the issue, on the affected machines I looked in logs/userlogs/<task id>, and to my surprise, the directory and log files existed, and the syslog file seemed to contain logs of a perfectly good reduce task!

Overall, this seems like a pretty critical bug. It's consuming up to 50% of the runtime of our jobs in some instances, killing our throughput. At the very least, it seems like the reduce task timeout period should be MUCH shorter than the current 10-20 minutes.

-Bryan

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)