Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-20 Thread Rasit OZDAS
Erik, did you place the ports correctly in the properties window?
Port 9000 under Map/Reduce Master on the left, 9001 under DFS Master on
the right.


2009/2/19 Erik Holstad erikhols...@gmail.com

 Thanks guys!
 Running Linux and the remote cluster is also Linux.
 I have the properties set up like that already on my remote cluster, but
 not sure where to input this info into Eclipse.
 And when changing the ports to 9000 and 9001 I get:

 Error: java.io.IOException: Unknown protocol to job tracker:
 org.apache.hadoop.dfs.ClientProtocol

 Regards Erik




-- 
M. Raşit ÖZDAŞ


Re: How to use Hadoop API to submit job?

2009-02-20 Thread Amareshwari Sriramadasu

You should implement the Tool interface and submit jobs.
For example, see org.apache.hadoop.examples.WordCount.
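A minimal sketch of such a Tool-based driver (the class name MyJob and the job settings are placeholders, not the WordCount example itself):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), MyJob.class);
    conf.setJobName("my-job");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // set mapper, reducer, and output key/value classes here as usual
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner runs GenericOptionsParser for you, so -D, -libjars, etc. work
    System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
  }
}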

-Amareshwari
Wu Wei wrote:

Hi,

I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 
0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was 
null, I got a NullPointerException when RunJar.main() called 
GenericOptionsParser to get libJars (0.18 didn't make this call). I also 
tried the class JobShell to submit the job, but it catches all exceptions 
and sends them to stderr, so I can't handle the exceptions myself.


I noticed that if I could call JobClient's setCommandLineConfig method, 
everything would go smoothly. But this method has default (package-private) 
accessibility, so I can't see the method from outside the package 
org.apache.hadoop.mapred.


Any advice on using the Java APIs to submit a job?

Wei




Re: empty log file...

2009-02-20 Thread Rasit OZDAS
Zander,
I've looked at my datanode logs on the slaves, and they are all quite
small, although we've run many jobs on them.
Running 2 new jobs didn't add anything to them either.
(As I understand from the contents of the logs, Hadoop mainly logs
operations related to DFS performance tests there.)

Cheers,
Rasit

2009/2/20 zander1013 zander1...@gmail.com


 Hi,

 I am setting up Hadoop for the first time on a multi-node cluster. Right now
 I have two nodes. The two-node cluster consists of two laptops connected via
 an ad-hoc wifi network; they do not have access to the internet. I
 formatted the datanodes on both machines prior to startup...

 The output from the commands /usr/local/hadoop/bin/start-all.sh, jps (on both
 machines), and /usr/local/hadoop/bin/stop-all.sh all appears normal. However,
 the file /usr/local/hadoop/logs/hadoop-hadoop-datanode-node1.log (the slave
 node) is empty.

 The same file for the master node shows the startup and shutdown events as
 normal and without error.

 Is it okay that the log file on the slave is empty?

 zander




-- 
M. Raşit ÖZDAŞ


Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-20 Thread Iman
This thread helped me fix a similar problem: 
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3cc001e847c1fd4248a7d6537643690e2101c83...@mse16be2.mse16.exchange.ms%3e 



In my case, the ports specified in hadoop-site.xml for the name node and 
the job tracker were switched in the Map/Reduce location's configuration.
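For anyone hitting the same thing: the two fields in the Eclipse location dialog have to line up with the hadoop-site.xml entries on the cluster, roughly like this (host names and port numbers are only illustrative; the point is that the Map/Reduce Master entry matches mapred.job.tracker and the DFS Master entry matches fs.default.name):

<configuration>
  <property>
    <name>mapred.job.tracker</name>   <!-- JobTracker: the Map/Reduce Master entry -->
    <value>master:9001</value>
  </property>
  <property>
    <name>fs.default.name</name>      <!-- NameNode: the DFS Master entry -->
    <value>hdfs://master:9000</value>
  </property>
</configuration>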


Iman.
P.S. I sent this reply to the wrong thread before.
Erik Holstad wrote:

Thanks guys!
Running Linux and the remote cluster is also Linux.
I have the properties set up like that already on my remote cluster, but
not sure where to input this info into Eclipse.
And when changing the ports to 9000 and 9001 I get:

Error: java.io.IOException: Unknown protocol to job tracker:
org.apache.hadoop.dfs.ClientProtocol

Regards Erik

  




Re: Map/Reduce Job done locally?

2009-02-20 Thread Rasit OZDAS
Philipp, I have no problem running jobs locally with eclipse (via hadoop
plugin) and observing it from browser.
(Please note that jobtracker page doesn't refresh automatically, you need to
refresh it manually.)

Cheers,
Rasit

2009/2/19 Philipp Dobrigkeit pdobrigk...@gmx.de

 When I start my job from Eclipse it gets processed and the output is
 generated, but it never shows up in my JobTracker, which is open in my
 browser. Why is this happening?




-- 
M. Raşit ÖZDAŞ


Re: GenericOptionsParser warning

2009-02-20 Thread Steve Loughran

Rasit OZDAS wrote:

Hi,
There is a JIRA issue about this problem, if I understand it correctly:
https://issues.apache.org/jira/browse/HADOOP-3743

Strangely, I searched all the source code, but this check exists in only
2 places:

if (!(job.getBoolean("mapred.used.genericoptionsparser", false))) {
  LOG.warn("Use GenericOptionsParser for parsing the arguments. " +
           "Applications should implement Tool for the same.");
}

Just an if block for logging, no extra checks.
Am I missing something?

If your class implements Tool, then there shouldn't be a warning.


OK, for my automated submission code I'll just set that switch and I 
won't get told off.
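Concretely, that just means something like the following before submitting (jobConf here stands for whatever JobConf the submission code builds; the property name is the one checked in the snippet above):

jobConf.setBoolean("mapred.used.genericoptionsparser", true);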


Re: How to use Hadoop API to submit job?

2009-02-20 Thread Wu Wei
My application implements the Tool interface and I can submit the job using 
the shell script "hadoop jar xxx.jar app". But here I don't want to use this 
script. Instead, I want to catch errors in my Java code and do some 
further processing.


-Wei

Amareshwari Sriramadasu wrote:

You should implement the Tool interface and submit jobs.
For example, see org.apache.hadoop.examples.WordCount.

-Amareshwari
Wu Wei wrote:

Hi,

I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 
0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was 
null, I got a NullPointerException when RunJar.main() called 
GenericOptionsParser to get libJars (0.18 didn't make this call). I 
also tried the class JobShell to submit the job, but it catches all 
exceptions and sends them to stderr, so I can't handle the exceptions 
myself.


I noticed that if I could call JobClient's setCommandLineConfig method, 
everything would go smoothly. But this method has default (package-private) 
accessibility, so I can't see the method from outside the package 
org.apache.hadoop.mapred.


Any advice on using the Java APIs to submit a job?

Wei






Re: the question about the common pc?

2009-02-20 Thread Steve Loughran

?? wrote:

Actually, there's a widespread misunderstanding of this "Common PC". "Common PC"
doesn't mean PCs which are in daily use; it means that the performance of each node can be
measured by a common PC's computing power.

As a matter of fact, we don't use Gb Ethernet for our daily PCs' communication, we don't use
Linux for our document processing, and most importantly, Hadoop cannot run effectively on
those daily PCs.


Hadoop is designed for high-performance computing equipment, but is claimed to be fit for daily PCs.


Hadoop for PCs? What a joke.


Hadoop is designed to build a high-throughput data-processing 
infrastructure from commodity PC parts: SATA not RAID or SAN, x86+Linux, 
not supercomputer hardware and OS. You can bring it up on lighter-weight 
systems, but it has a minimum overhead that is quite steep for small 
datasets. I've been doing MapReduce work over small in-memory datasets 
using Erlang, which works very well in such a context.


-you need a good network, with DNS working (fast), good backbone and 
switches

-the faster your disks, the better your throughput
-ECC memory makes a lot of sense
-you need a good cluster management setup unless you like SSH-ing to 20 
boxes to find out which one is playing up


Re: How to use Hadoop API to submit job?

2009-02-20 Thread Steve Loughran

Wu Wei wrote:

Hi,

I used to submit Hadoop jobs with the utility RunJar.main() on Hadoop 
0.18. On Hadoop 0.19, because the commandLineConfig of JobClient was 
null, I got a NullPointerException when RunJar.main() called 
GenericOptionsParser to get libJars (0.18 didn't make this call). I also 
tried the class JobShell to submit the job, but it catches all exceptions 
and sends them to stderr, so I can't handle the exceptions myself.


I noticed that if I could call JobClient's setCommandLineConfig method, 
everything would go smoothly. But this method has default (package-private) 
accessibility, so I can't see the method from outside the package 
org.apache.hadoop.mapred.


Any advice on using the Java APIs to submit a job?

Wei


Looking at my code, the lines that do the work are

JobClient jc = new JobClient(jobConf);
runningJob = jc.submitJob(jobConf);

My full (LGPL) code is here: http://tinyurl.com/djk6vj

There's more work with validating input and output directories, pulling 
back the results, handling timeouts if the job doesn't complete, etc., etc., 
but that's feature creep.
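For completeness, a rough self-contained sketch of that pattern, with the error handling Wu Wei is after (the class name, job name, and the handling itself are only illustrative, not Steve's actual code):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitFromJava {
  public static void main(String[] args) {
    JobConf jobConf = new JobConf(SubmitFromJava.class);
    jobConf.setJobName("submitted-from-java");
    // set input/output paths, mapper and reducer classes here as usual
    try {
      JobClient jc = new JobClient(jobConf);
      RunningJob runningJob = jc.submitJob(jobConf);  // returns immediately
      runningJob.waitForCompletion();                 // block until the job finishes
      if (!runningJob.isSuccessful()) {
        System.err.println("Job " + runningJob.getJobID() + " failed");
      }
    } catch (IOException e) {
      // submission or polling failed; this is the point where the caller gets to
      // handle the error itself instead of JobShell printing it to stderr
      e.printStackTrace();
    }
  }
}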


Re: the question about the common pc?

2009-02-20 Thread Tim Wintle
On Fri, 2009-02-20 at 13:07 +, Steve Loughran wrote:
 I've been doing MapReduce work over small in-memory datasets 
 using Erlang,  which works very well in such a context.

I've got some (mainly Python) scripts (that will probably be run with
Hadoop Streaming eventually) that I run over multiple CPUs/cores on a
single machine by opening the appropriate number of named pipes and
using tee and awk to split the workload.

something like

 mkfifo mypipe1
 mkfifo mypipe2
 awk '0 == NR % 2' < mypipe1 | ./mapper | sort > map_out_1 &
 awk '0 == (NR+1) % 2' < mypipe2 | ./mapper | sort > map_out_2 &
 ./get_lots_of_data | tee mypipe1 > mypipe2

(wait until it's done... or send a signal from the get_lots_of_data
process on completion if it's a cronjob)

 sort -m map_out* | ./reducer > reduce_out

This works around the global interpreter lock in Python quite nicely, and
doesn't require the people who write the scripts (who may not be programmers)
to understand multiple processes etc., just stdin and stdout.

Tim Wintle



Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-20 Thread Erik Holstad
Hi guys!
Thanks for your help, but still no luck. I did try to set it up on a
different machine with Eclipse 3.2.2 and the IBM plugin instead of the
Hadoop one; for that one I only needed to fill out the install directory
and the host, and it worked just fine.
I have filled out the ports correctly, and the cluster is up and running
and works just fine.

Regards Erik


Hadoop build error

2009-02-20 Thread raghu kishor
Hi ,

While trying to compile the Hadoop source (ant -Djavac.args="-Xlint -Xmaxwarns 1000" tar)
I get the error below. Kindly let me know how to fix this issue.
I have Java 6 installed.

[javadoc] Standard Doclet version 1.6.0_07
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...

java5.check:

BUILD FAILED
/home/raghu/src-hadoop/trunk/build.xml:890: 'java5.home' is not defined.
Forrest requires Java 5.  Please pass -Djava5.home=<base of Java 5 distribution>
to Ant on the command-line.


Thanks,
Raghu


  

Hadoop JMX

2009-02-20 Thread Edward Capriolo
I am working to graph the hadoop JMX variables.
http://hadoop.apache.org/core/docs/r0.17.0/api/org/apache/hadoop/dfs/namenode/metrics/NameNodeStatistics.html
I have two nodes, one running 0.17 and the other running 0.19.

The NameNode JMX objects and attributes seem to be working well. I am
graphing Capacity, NumberOfBlocks, and NumberOfFiles, as well as the
operations numLiveDataNodes() and numDeadDataNodes().

It seems like the DataNode JMX objects are mostly 0 or -1. I do not
have heavy load on these systems, so it is tricky to tell whether a
counter is implemented.
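For context, this is roughly the kind of JMX read involved (the host, port, and the assumption that the Hadoop beans sit in a domain starting with "hadoop" are guesses/placeholders, not values from this cluster):

import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DumpHadoopJmx {
  public static void main(String[] args) throws Exception {
    // assumes the daemon was started with com.sun.management.jmxremote.port=8004 (placeholder port)
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://namenode-host:8004/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbs = connector.getMBeanServerConnection();
      for (ObjectName name : mbs.queryNames(null, null)) {
        if (!name.getDomain().toLowerCase().startsWith("hadoop")) {
          continue;  // skip the JVM's own beans; adjust the filter to taste
        }
        System.out.println(name);
        for (MBeanAttributeInfo attr : mbs.getMBeanInfo(name).getAttributes()) {
          Object value;
          try {
            value = mbs.getAttribute(name, attr.getName());
          } catch (Exception e) {
            value = "(unreadable)";
          }
          System.out.println("  " + attr.getName() + " = " + value);
        }
      }
    } finally {
      connector.close();
    }
  }
}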

My questions:
1) If a JMX attribute is added, is it generally added as a placeholder
to be implemented later, or is it added already implemented?
2) Is there a target version for having all these attributes implemented,
or are these all being handled via separate JIRAs?
3) Can I set up TaskTrackers to be monitored, as I can for DataNodes and NameNodes?
4) Any tips, tricks, or gotchas?

Thank you


Re: Limit number of records or total size in combiner input using jobconf?

2009-02-20 Thread Chris Douglas

So here are my questions:
(1) Is there a jobconf hint to limit the number of records in kviter?
I can (and have) made a fix to my code that processes the values in a
combiner step in batches (i.e. takes N at a go, processes that, and
repeats), but I was wondering if I could just set an option.


Approximately and indirectly, yes. You can limit the amount of memory  
allocated to storing serialized records in memory (io.sort.mb) and the  
percentage of that space reserved for storing record metadata  
(io.sort.record.percent, IIRC). That can be used to limit the number  
of records in each spill, though you may also need to disable the  
combiner during the merge, where you may run into the same problem.
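As a concrete sketch, in the driver that would look something like this (the numbers are purely illustrative, and io.sort.record.percent is the name as recalled above):

JobConf conf = new JobConf(MyDriver.class);    // MyDriver is a placeholder driver class
conf.setInt("io.sort.mb", 100);                // buffer for serialized map output, in MB
conf.set("io.sort.record.percent", "0.05");    // fraction of that buffer reserved for record metadata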


You're almost certainly better off designing your combiner to scale  
well (as you have), since you'll hit this in the reduce, too.



Since this occurred in the MapContext, changing the number of reducers
won't help.
(2) How does changing the number of reducers help at all? I have 7
machines, so I feel 11 (a prime close to 7, why a prime?) is good
enough (some machines are 16GB, others 32GB).


Your combiner will look at all the records for a partition and only  
those records in a partition. If your partitioner distributes your  
records evenly in a particular spill, then increasing the total number  
of partitions will decrease the number of records your combiner  
considers in each call. For most partitioners, whether the number of  
reducers is prime should be irrelevant. -C


Re: [ANNOUNCE] Apache ZooKeeper 3.1.0

2009-02-20 Thread Flavio Junqueira
Hi Bill, I'm sorry, I missed this message initially. I'm sending below  
a table that gives you throughput figures for BookKeeper. The rows  
correspond to distinct BookKeeper configurations (ensemble size, quorum  
size, entry type), and the columns to different values for the length  
of an entry in bytes. The throughput values correspond to one client  
writing 400K records (we call them entries) asynchronously to a  
ledger.  Finally, the table shows write throughput in thousands of  
operations per second.



        128     1024    8192
3-2-V   32.80   26.45   5.89
4-2-V   41.72   31.53   6.55
5-2-V   46.89   32.45   6.61

4-3-G   28.02   21.61   4.37
5-3-G   34.91   28.22   4.60
6-3-G   41.22   31.70   4.55


Let me know if you have more questions, I appreciate your interest.

Thanks,
-Flavio





On Feb 14, 2009, at 2:56 PM, Bill de hOra wrote:


Patrick Hunt wrote:

A bit about BookKeeper: a system to reliably log streams of  
records. In BookKeeper, servers are bookies, log streams are  
ledgers, and each unit of a log (aka record) is a ledger entry.  
BookKeeper is designed to be reliable; bookies, the servers that  
store ledgers can be byzantine, which means that some subset of the  
bookies can fail, corrupt data, discard data, but as long as there  
are enough correctly behaving servers the service as a whole  
behaves correctly; the meta data for BookKeeper is stored in  
ZooKeeper.


Hi Patrick,

this sounds cool. Are there any figures on throughput, ie how many  
records BookKeeper can process per second?


Bill




Re: [ANNOUNCE] Apache ZooKeeper 3.1.0

2009-02-20 Thread Flavio Junqueira
Also, you may want to check a graph that we posted comparing the
performance of BookKeeper with that of HDFS using a local file
system and local+NFS, in JIRA issue HADOOP-5189
(https://issues.apache.org/jira/browse/HADOOP-5189).


-Flavio


On Feb 20, 2009, at 10:05 AM, Flavio Junqueira wrote:

Hi Bill, I'm sorry, I missed this message initially. I'm sending  
below a table that gives you throughput figures for BookKeeper. The  
rows correspond to distinct BookKeeper configuration (ensemble size,  
quorum size, entry type), and the columns to different values for  
the length of an entry in bytes. The throughput values correspond to  
one client writing 400K records (we call them entries)  
asynchronously to a ledger.  Finally, the table shows write  
throughput in thousands of operations per second.



        128     1024    8192
3-2-V   32.80   26.45   5.89
4-2-V   41.72   31.53   6.55
5-2-V   46.89   32.45   6.61

4-3-G   28.02   21.61   4.37
5-3-G   34.91   28.22   4.60
6-3-G   41.22   31.70   4.55


Let me know if you have more questions, I appreciate your interest.

Thanks,
-Flavio





On Feb 14, 2009, at 2:56 PM, Bill de hOra wrote:


Patrick Hunt wrote:

A bit about BookKeeper: a system to reliably log streams of  
records. In BookKeeper, servers are bookies, log streams are  
ledgers, and each unit of a log (aka record) is a ledger  
entry. BookKeeper is designed to be reliable; bookies, the  
servers that store ledgers can be byzantine, which means that some  
subset of the bookies can fail, corrupt data, discard data, but as  
long as there are enough correctly behaving servers the service as  
a whole behaves correctly; the meta data for BookKeeper is stored  
in ZooKeeper.


Hi Patrick,

this sounds cool. Are there any figures on throughput, ie how many  
records BookKeeper can process per second?


Bill






Super-long reduce task timeouts in hadoop-0.19.0

2009-02-20 Thread Bryan Duxbury

(Repost from the dev list)

I noticed some really odd behavior today while reviewing the job  
history of some of our jobs. Our Ganglia graphs showed really long  
periods of inactivity across the entire cluster, which should  
definitely not be the case - we have a really long string of jobs in  
our workflow that should execute one after another. I figured out  
which jobs were running during those periods of inactivity, and  
discovered that almost all of them had 4-5 failed reduce tasks, with  
the reason for failure being something like:


Task attempt_200902061117_3382_r_38_0 failed to report status for  
1282 seconds. Killing!


The actual timeout reported varies from 700-5000 seconds. Virtually  
all of our longer-running jobs were affected by this problem. The  
period of inactivity on the cluster seems to correspond to the amount  
of time the job waited for these reduce tasks to fail.


I checked out the tasktracker log for the machines with timed-out  
reduce tasks looking for something that might explain the problem,  
but the only thing I came up with that actually referenced the failed  
task was this log message, which was repeated many times:


2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker:  
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find  
taskTracker/jobcache/job_200902061117_3388/ 
attempt_200902061117_3388_r_66_0/output/file.out in any of the  
configured local directories


I'm not sure what this means; can anyone shed some light on this  
message?


Further confusing the issue, on the affected machines, I looked in  
logs/userlogs/<task id>, and to my surprise, the directory and log  
files existed, and the syslog file seemed to contain logs of a  
perfectly good reduce task!


Overall, this seems like a pretty critical bug. It's consuming up to  
50% of the runtime of our jobs in some instances, killing our  
throughput. At the very least, it seems like the reduce task timeout  
period should be MUCH shorter than the current 10-20 minutes.


-Bryan


Re: Super-long reduce task timeouts in hadoop-0.19.0

2009-02-20 Thread Ted Dunning
How often do your reduce tasks report status?

On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com wrote:

 (Repost from the dev list)


 I noticed some really odd behavior today while reviewing the job history of
 some of our jobs. Our Ganglia graphs showed really long periods of
 inactivity across the entire cluster, which should definitely not be the
 case - we have a really long string of jobs in our workflow that should
 execute one after another. I figured out which jobs were running during
 those periods of inactivity, and discovered that almost all of them had 4-5
 failed reduce tasks, with the reason for failure being something like:

 Task attempt_200902061117_3382_r_38_0 failed to report status for 1282
 seconds. Killing!

 The actual timeout reported varies from 700-5000 seconds. Virtually all of
 our longer-running jobs were affected by this problem. The period of
 inactivity on the cluster seems to correspond to the amount of time the job
 waited for these reduce tasks to fail.

 I checked out the tasktracker log for the machines with timed-out reduce
 tasks looking for something that might explain the problem, but the only
 thing I came up with that actually referenced the failed task was this log
 message, which was repeated many times:

 2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker:
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
 taskTracker/jobcache/job_200902061117_3388/attempt_200902061117_3388_r_66_0/output/file.out
 in any of the configured local directories

 I'm not sure what this means; can anyone shed some light on this message?

 Further confusing the issue, on the affected machines, I looked in
 logs/userlogs/<task id>, and to my surprise, the directory and log files
 existed, and the syslog file seemed to contain logs of a perfectly good
 reduce task!

 Overall, this seems like a pretty critical bug. It's consuming up to 50% of
 the runtime of our jobs in some instances, killing our throughput. At the
 very least, it seems like the reduce task timeout period should be MUCH
 shorter than the current 10-20 minutes.

 -Bryan




-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)


Re: HADOOP-2536 supports Oracle too?

2009-02-20 Thread Kevin Peterson
On Wed, Feb 18, 2009 at 1:06 AM, sandhiya sandhiy...@gmail.com wrote:

 Thanks a million!!! It worked, but it's a little weird though. I have to put
 the library with the JDBC jars in BOTH the executable jar file AND the lib
 folder in $HADOOP_HOME. Do all of you do the same thing, or is it just my
 computer acting strange?


It seems that things that are directly referenced by the jar you are running
can be included in the lib directory in the jar, but things that are loaded
with reflection like JDBC drivers have to be in the Hadoop lib directory. I
don't think it's both.
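For what it's worth, the reflection happens on the task side when the HADOOP-2536 code loads the driver class by name, so the job-setup side looks roughly like this (the connection details are placeholders, not from this thread):

// org.apache.hadoop.mapred.lib.db.DBConfiguration, from HADOOP-2536
// (jobConf is the job's JobConf)
DBConfiguration.configureDB(jobConf,
    "oracle.jdbc.driver.OracleDriver",       // loaded via Class.forName() in the task JVM,
    "jdbc:oracle:thin:@//dbhost:1521/orcl",  // which is why the driver jar must be on Hadoop's own classpath
    "scott", "tiger");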


Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
I am trying to import data from a flat file into HBase using a MapReduce
job. There are close to 2 million rows. Midway into the job, it starts
giving me connection problems and eventually kills the job. When the error
comes, the HBase shell also stops working.

This is what I get:

2009-02-20 21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass:
Retrying connect to server: /171.69.102.52:60020. Already tried 0
time(s).

What could be going wrong?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


Re: Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
Here's what it throws on the console:

09/02/20 21:45:29 INFO mapred.JobClient: Task Id :
attempt_200902201300_0019_m_06_0, Status : FAILED
java.io.IOException: table is null
at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

attempt_200902201300_0019_m_06_0:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
to locate root region
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:423)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:114)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:97)
attempt_200902201300_0019_m_06_0:   at
IN_TABLE_IMPORT$MapClass.configure(IN_TABLE_IMPORT.java:120)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
attempt_200902201300_0019_m_06_0:   at
org.apache.hadoop.mapred.Child.main(Child.java:155)





Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Fri, Feb 20, 2009 at 9:43 PM, Amandeep Khurana ama...@gmail.com wrote:

 I am trying to import data from a flat file into HBase using a MapReduce
 job. There are close to 2 million rows. Midway into the job, it starts
 giving me connection problems and eventually kills the job. When the error
 comes, the HBase shell also stops working.

 This is what I get:

 2009-02-20 21:37:14,407 INFO org.apache.hadoop.ipc.HBaseClass: Retrying 
 connect to server: /171.69.102.52:60020. Already tried 0 time(s).

 What could be going wrong?

 Amandeep


 Amandeep Khurana
 Computer Science Graduate Student
 University of California, Santa Cruz



Re: Connection problem during data import into hbase

2009-02-20 Thread Amandeep Khurana
I don't know if this is related or not, but it seems to be. After this
MapReduce job, I tried to count the number of entries in the table in HBase
through the shell. It failed with the following error:

hbase(main):002:0> count 'in_table'
NativeException: java.lang.NullPointerException: null
from java.lang.String:-1:in `<init>'
from org/apache/hadoop/hbase/util/Bytes.java:92:in `toString'
from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:50:in
`getMessage'
from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:40:in
`<init>'
from org/apache/hadoop/hbase/client/HConnectionManager.java:841:in
`getRegionServerWithRetries'
from org/apache/hadoop/hbase/client/MetaScanner.java:56:in `metaScan'
from org/apache/hadoop/hbase/client/MetaScanner.java:30:in `metaScan'
from org/apache/hadoop/hbase/client/HConnectionManager.java:411:in
`getHTableDescriptor'
from org/apache/hadoop/hbase/client/HTable.java:219:in
`getTableDescriptor'
from sun.reflect.NativeMethodAccessorImpl:-2:in `invoke0'
from sun.reflect.NativeMethodAccessorImpl:-1:in `invoke'
from sun.reflect.DelegatingMethodAccessorImpl:-1:in `invoke'
from java.lang.reflect.Method:-1:in `invoke'
from org/jruby/javasupport/JavaMethod.java:250:in
`invokeWithExceptionHandling'
from org/jruby/javasupport/JavaMethod.java:219:in `invoke'
from org/jruby/javasupport/JavaClass.java:416:in `execute'
... 145 levels...
from org/jruby/internal/runtime/methods/DynamicMethod.java:74:in `call'
from org/jruby/internal/runtime/methods/CompiledMethod.java:48:in `call'
from org/jruby/runtime/CallSite.java:123:in `cacheAndCall'
from org/jruby/runtime/CallSite.java:298:in `call'
from
ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:429:in
`__file__'
from
ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
`__file__'
from
ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
`load'
from org/jruby/Ruby.java:512:in `runScript'
from org/jruby/Ruby.java:432:in `runNormally'
from org/jruby/Ruby.java:312:in `runFromMain'
from org/jruby/Main.java:144:in `run'
from org/jruby/Main.java:89:in `run'
from org/jruby/Main.java:80:in `main'
from /hadoop/install/hbase/bin/../bin/HBase.rb:444:in `count'
from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count'
from (hbase):3:in `binding'


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana ama...@gmail.com wrote:

 Here's what it throws on the console:

 09/02/20 21:45:29 INFO mapred.JobClient: Task Id :
 attempt_200902201300_0019_m_06_0, Status : FAILED
 java.io.IOException: table is null
 at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
 at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child.main(Child.java:155)

 attempt_200902201300_0019_m_06_0:
 org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
 to locate root region
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:423)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:114)
 attempt_200902201300_0019_m_06_0:   at
 org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:97)
 attempt_200902201300_0019_m_06_0:   at
 

Re: Hadoop build error

2009-02-20 Thread Abdul Qadeer



 java5.check:

 BUILD FAILED
 /home/raghu/src-hadoop/trunk/build.xml:890: 'java5.home' is not defined.
  Forrest requires Java 5.  Please pass -Djava5.home=<base of Java 5 distribution>
 to Ant on the command-line.



I think the error is self-explanatory. Forrest needs JDK 1.5, and you can pass
it using the -Djava5.home argument.
Maybe something like the following:

ant -Djavac.args="-Xlint -Xmaxwarns 1000" -Djava5.home={base of Java 5 distribution} tar


Re: Super-long reduce task timeouts in hadoop-0.19.0

2009-02-20 Thread Bryan Duxbury
We didn't customize this value, to my knowledge, so I'd suspect it's  
the default.

-Bryan
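For anyone following along, "reporting status" here just means calling into the Reporter from inside the reduce; a minimal sketch with the 0.19-era API (the key/value types and the work in the loop are placeholders):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SlowReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {
  public void reduce(Text key, Iterator<LongWritable> values,
                     OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();   // any expensive per-record work would go here
      // tell the framework the task is alive; this resets the mapred.task.timeout
      // clock (600000 ms, i.e. 10 minutes, by default)
      reporter.progress();
    }
    reporter.setStatus("finished key " + key);
    output.collect(key, new LongWritable(sum));
  }
}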

On Feb 20, 2009, at 5:00 PM, Ted Dunning wrote:


How often do your reduce tasks report status?

On Fri, Feb 20, 2009 at 3:58 PM, Bryan Duxbury br...@rapleaf.com  
wrote:



(Repost from the dev list)


I noticed some really odd behavior today while reviewing the job  
history of

some of our jobs. Our Ganglia graphs showed really long periods of
inactivity across the entire cluster, which should definitely not  
be the
case - we have a really long string of jobs in our workflow that  
should
execute one after another. I figured out which jobs were running  
during
those periods of inactivity, and discovered that almost all of  
them had 4-5
failed reduce tasks, with the reason for failure being something  
like:


Task attempt_200902061117_3382_r_38_0 failed to report status  
for 1282

seconds. Killing!

The actual timeout reported varies from 700-5000 seconds.  
Virtually all of

our longer-running jobs were affected by this problem. The period of
inactivity on the cluster seems to correspond to the amount of  
time the job

waited for these reduce tasks to fail.

I checked out the tasktracker log for the machines with timed-out  
reduce
tasks looking for something that might explain the problem, but  
the only
thing I came up with that actually referenced the failed task was  
this log

message, which was repeated many times:

2009-02-19 22:48:19,380 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902061117_3388/ 
attempt_200902061117_3388_r_66_0/output/file.out

in any of the configured local directories

I'm not sure what this means; can anyone shed some light on this  
message?


Further confusing the issue, on the affected machines, I looked in
logs/userlogs/<task id>, and to my surprise, the directory and log  
files
existed, and the syslog file seemed to contain logs of a perfectly  
good

reduce task!

Overall, this seems like a pretty critical bug. It's consuming up  
to 50% of
the runtime of our jobs in some instances, killing our throughput.  
At the
very least, it seems like the reduce task timeout period should be  
MUCH

shorter than the current 10-20 minutes.

-Bryan





--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)