Re: using StreamInputFormat, StreamXmlRecordReader with your custom Jobs

2010-03-11 Thread Reik Schatz
Uh, do I have to copy the jar file manually into HDFS before I invoke
the hadoop jar command that starts my own job?




Utkarsh Agarwal wrote:


I think you can use DistributedCache to specify the location of the jar
once you have it in HDFS.
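
For example, a quick sketch against the 0.20.x API. The /lib target path and
the class name are hypothetical, and yes, the jar has to be uploaded first,
e.g. with: hadoop fs -put hadoop-0.20.2-streaming.jar /lib/

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class AddStreamingJar {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // Ship the HDFS-resident jar to every task's classpath.
    DistributedCache.addFileToClassPath(
        new Path("/lib/hadoop-0.20.2-streaming.jar"), conf);
    // ... continue with the usual job setup and submission ...
  }
}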

On Wed, Mar 10, 2010 at 6:11 AM, Reik Schatz reik.sch...@bwin.org wrote:

  

Hi, I am playing around with version 0.20.2 of Hadoop. I have written and
packaged a job using a custom Mapper and Reducer. The input format in my job
is set to StreamInputFormat, and I am also setting the property
stream.recordreader.class to org.apache.hadoop.streaming.StreamXmlRecordReader.

This is how I want to start my job:
hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output

The problem is that in this case all classes from
hadoop-0.20.2-streaming.jar are missing (ClassNotFoundException). I tried
using -libjars without luck.
hadoop jar -libjars PATH/hadoop-0.20.2-streaming.jar
custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output

Is there any way to use the streaming classes with your own jobs without
copying these classes into your project and packaging them into your own jar?
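
For what it's worth, one likely culprit: in "hadoop jar -libjars ..." the word
right after "jar" is taken as the jar file itself, so -libjars is never parsed.
The generic options are only honored when the driver goes through
ToolRunner/GenericOptionsParser, and they go after the class name:

hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob -libjars PATH/hadoop-0.20.2-streaming.jar /input /output

A sketch of such a driver follows (mapper/reducer setup omitted; treat this as
an approximation of what EmailCountingJob might look like, not its actual code):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class EmailCountingJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() carries whatever GenericOptionsParser set up (-libjars etc.)
    JobConf conf = new JobConf(getConf(), EmailCountingJob.class);
    conf.setInputFormat(StreamInputFormat.class);
    conf.set("stream.recordreader.class",
        "org.apache.hadoop.streaming.StreamXmlRecordReader");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // ... set mapper, reducer and key/value classes here ...
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new EmailCountingJob(), args));
  }
}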


/Reik




Re: Passing whole text file to a single map

2010-03-11 Thread HypOo

I have the same problem: I need to assign a whole file per map, but I don't
know how to do that.
I've tried to create a new WholeFileFormat class and override the method
isSplitable(), but it doesn't seem to work.
Have you managed to do this?
I'm using hadoop 0.20.2
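
For reference, the NonSplitableTextInputFormat from the streaming FAQ is a
class you have to write yourself; a minimal sketch (hypothetical name, old
mapred API) is below. Note that overriding isSplitable() only stops the file
from being split across map tasks; TextInputFormat's record reader still hands
the map one line at a time, so getting the whole file as a single record also
needs a custom record reader (see the sketch later in this thread).

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class NonSplitableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // one split (and hence one map task) per file
  }
}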



stolikp wrote:
 
 I've got some text files in my input directory and I want to pass each
 single text file (the whole file, not just a line) to a map (one file per
 map). How can I do this? TextInputFormat splits the text into lines, and I
 do not want this to happen.
 I tried:
 http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
 but it doesn't work for me; the compiler doesn't know what
 NonSplitableTextInputFormat.class is.
 I'm using hadoop 0.20.1
 

-- 
View this message in context: 
http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27287649p27860526.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Passing whole text file to a single map

2010-03-11 Thread Thomas Thevis
Do you guys have a copy of Tom White's Hadoop book available? There is
an excellent example of a WholeFileInputFormat which definitely works
with Hadoop-0.20.0.
What do you mean by 'it doesn't seem to work'? Exceptions, unexpected
output, ...?
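
From memory, the idea in that example looks roughly like this (old mapred API;
a sketch along the same lines, not the book's exact code): each file becomes
one unsplit record whose value is the file's entire contents.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // never split: one map task per file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  static class WholeFileRecordReader
      implements RecordReader<NullWritable, BytesWritable> {

    private final FileSplit split;
    private final Configuration conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, Configuration conf) {
      this.split = split;
      this.conf = conf;
    }

    public boolean next(NullWritable key, BytesWritable value)
        throws IOException {
      if (processed) {
        return false; // exactly one record per file
      }
      // Caveat: the whole file is buffered in memory, so this only suits
      // files that comfortably fit in the task's heap.
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FSDataInputStream in = file.getFileSystem(conf).open(file);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
    public void close() { }
  }
}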

Regards,
Thomas

On 11.03.2010 10:24, HypOo wrote:

 I have the same problem: I need to assign a whole file per map, but I don't
 know how to do that.
 I've tried to create a new WholeFileFormat class and override the method
 isSplitable(), but it doesn't seem to work.
 Have you managed to do this?
 I'm using hadoop 0.20.2



 stolikp wrote:
  
   I've got some text files in my input directory and I want to pass each
   single text file (the whole file, not just a line) to a map (one file per
   map). How can I do this? TextInputFormat splits the text into lines, and I
   do not want this to happen.
   I tried:
 http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
   but it doesn't work for me; the compiler doesn't know what
   NonSplitableTextInputFormat.class is.
   I'm using hadoop 0.20.1
  

 --
 View this message in context:
 http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27287649p27860526.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Call for presentations - Berlin Buzzwords - Summer 2010

2010-03-11 Thread Isabel Drost
Call for Presentations Berlin Buzzwords
 http://buzzwordsberlin.de
  Berlin Buzzwords 2010 - Search, Store, Scale
   7/8 June 2010


This is to announce Berlin Buzzwords 2010, the first conference on scalable
and open search, data processing and data storage in Germany, taking place in
Berlin.

The event will comprise presentations on scalable data processing. We invite
you to submit talks on the following topics:

Information retrieval / Search - Lucene, Solr, katta or comparable solutions
NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Closely related topics not explicitly listed above are welcome. We are looking 
for presentations on the implementation of the systems themselves, real world 
applications and case studies. 

Important Dates (all dates in GMT +2):

Submission deadline: April 17th 2010, 23:59
Notification of accepted speakers: May 1st, 2010. 
Publication of final schedule: May 9th, 2010. 
Conference: June 7/8, 2010.

High quality, technical submissions are called for, ranging from principles to 
practice. We are looking for real world use cases, background on the 
architecture of specific projects and a deep dive into architectures built on 
top of e.g. Hadoop clusters. 

Proposals should be submitted at http://berlinbuzzwords.de/content/cfp no later
than April 17th, 2010. Acceptance notifications will be sent out on May 1st.
Please include your name, bio and email, the title of the talk, and a brief
abstract in English. Please indicate whether you want to give a short (30min)
or long (45min) presentation, and indicate the level of experience your
audience should have with the topic (e.g. whether your talk is suitable for
newbies or targeted at experienced users).

The presentation format is short: either 30 or 45 minutes including questions. 
We will be enforcing the schedule rigorously. 

If you are interested in sponsoring the event (e.g. we would be happy to
provide videos after the event, free drinks for attendees as well as an
after-show party), please contact us.

Follow @hadoopberlin on Twitter for updates. News on the conference will be
published on our website at http://berlinbuzzwords.de.

Program Chairs: Isabel Drost, Jan Lehnardt, and Simon Willnauer.

Schedule and further updates on the event will be published on
http://berlinbuzzwords.de. Please re-distribute this CfP to people who might
be interested.

Contact us at: 
newthinking communications GmbH
Schönhauser Allee 6/7
10119 Berlin, Germany
Andreas Gebhard a...@newthinking.de
Isabel Drost i...@newthinking.de
+49(0)30-9210 596




Re: Call for presentations - Berlin Buzzwords - Summer 2010

2010-03-11 Thread Isabel Drost
On 11.03.2010 Isabel Drost wrote:
 Call for Presentations Berlin Buzzwords
  http://buzzwordsberlin.de

http://berlinbuzzwords.de of course...

Isabel




How do I upgrade my hadoop cluster using hadoop?

2010-03-11 Thread Raymond Jennings III
I thought there was a utility to do the upgrade for you, one that you run from
one node and that would copy to every other node?
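
(As far as I know, stock Hadoop ships no tool that copies the new release to
every node; that part is usually scripted with rsync or similar. What it does
provide is the HDFS metadata upgrade itself, triggered from the master once
the new binaries are in place, e.g.:

  bin/start-dfs.sh -upgrade

and finalized later with: hadoop dfsadmin -finalizeUpgrade)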


  


TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
Dear All,

I am running a hadoop job processing data. The output of the map is really
large, and it spills 15 times. So I was trying to set io.sort.mb = 256
instead of 100, leaving everything else at the default. I am using version
0.20.2. When I run the job, I get the following errors:

2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 256
2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


I can't figure out why; could anyone please give me a hint? Any help will be
appreciated! Thanks a lot!

Sincerely,

Boyu


Re: TaskTracker: Java heap space error

2010-03-11 Thread Arun C Murthy

Moving to mapreduce-user@, bcc: common-user

Have you tried bumping up the heap for the map task?

Since you are setting io.sort.mb to 256M, please set the heap size to at
least 512M, if not more.


mapred.child.java.opts: -Xmx512m or -Xmx1024m
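
For example, in the job driver (a sketch with 0.20.x property names; the class
is just a placeholder). The sort buffer is allocated inside the child task's
JVM heap, which defaults to -Xmx200m, so io.sort.mb = 256 cannot fit without
raising the heap as well:

import org.apache.hadoop.mapred.JobConf;

public class SortBufferTuning {
  public static void apply(JobConf conf) {
    conf.setInt("io.sort.mb", 256);                 // map-side sort buffer, MB
    conf.set("mapred.child.java.opts", "-Xmx512m"); // child task heap
  }
}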

Arun

On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote:


Dear All,

I am running a hadoop job processing data. The output of the map is really
large, and it spills 15 times. So I was trying to set io.sort.mb = 256
instead of 100, leaving everything else at the default. I am using version
0.20.2. When I run the job, I get the following errors:

2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 256

2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


I can't figure out why; could anyone please give me a hint? Any help will be
appreciated! Thanks a lot!

Sincerely,

Boyu




Re: region server appearing twice on HBase Master page

2010-03-11 Thread Jean-Daniel Cryans
Bringing the discussion over to hbase-user.

That usually happens after a DNS hiccup. There's a fix for that in
https://issues.apache.org/jira/browse/HBASE-2174

J-D

On Wed, Mar 10, 2010 at 1:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 I noticed two lines for the same region server on HBase Master page:
 X.com:60030    1268160765854    requests=0, regions=16, usedHeap=1068,
 maxHeap=6127
 X.com:60030    1268250726442    requests=21, regions=9, usedHeap=1258,
 maxHeap=6127

 I checked; there is only one
 org.apache.hadoop.hbase.regionserver.HRegionServer instance running on that
 machine.

 This is from region server log:

 2010-03-10 13:25:38,157 ERROR [IPC Server handler 43 on 60020]
 regionserver.HRegionServer(844):
 org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2010-03-10 13:25:38,189 ERROR [IPC Server handler 0 on 60020]
 regionserver.HRegionServer(844):
 org.apache.hadoop.hbase.NotServingRegionException: ruletable,,1268083966723
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
        at
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

 If you know how to troubleshoot, please share.



Re: TaskTracker: Java heap space error

2010-03-11 Thread Boyu Zhang
Hi Arun,

I did what you said, and it seems to work. Thanks a lot! I guess I do not
completely understand how the tuning parameters affect each other. Thanks!

Boyu

On Thu, Mar 11, 2010 at 12:27 PM, Arun C Murthy a...@yahoo-inc.com wrote:

 Moving to mapreduce-user@, bcc: common-user

 Have you tried bumping up the heap for the map task?

 Since you are setting io.sort.mb to 256M, please set the heap size to at
 least 512M, if not more.

 mapred.child.java.opts: -Xmx512m or -Xmx1024m

 Arun


 On Mar 11, 2010, at 8:24 AM, Boyu Zhang wrote:

  Dear All,

 I am running a hadoop job processing data. The output of the map is really
 large, and it spills 15 times. So I was trying to set io.sort.mb = 256
 instead of 100, leaving everything else at the default. I am using version
 0.20.2. When I run the job, I get the following errors:

 2010-03-11 11:09:37,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=MAP, sessionId=
 2010-03-11 11:09:38,073 INFO org.apache.hadoop.mapred.MapTask:
 numReduceTasks: 1
 2010-03-11 11:09:38,086 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 256
 2010-03-11 11:09:38,326 FATAL org.apache.hadoop.mapred.TaskTracker:
 Error running child : java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


 I can't figure out why; could anyone please give me a hint? Any help will be
 appreciated! Thanks a lot!

 Sincerely,

 Boyu