Namenode UI - Browse File System not working in pseudo-dist cluster

2010-05-11 Thread Gokulakannan M
 

Hi,

The Browse File System link in the NameNode UI (http://namenode:50070) does not
work when I run the NameNode and one DataNode on the same machine
(pseudo-distributed mode).

I thought it might be a Jetty issue, but if I run the NameNode and one DataNode
on one machine and another DataNode on a second machine (one NN and two DNs in
total), the Browse File System link works fine and I can see the files in HDFS.

Any idea why this happens only in pseudo-distributed mode?

 Thanks,
  Gokul



Re: Questions about SequenceFiles

2010-05-11 Thread Ananth Sarathy
Yeah, no, I get that. But when you use the sequence file reader example from
the Hadoop: The Definitive Guide book, page 106:

reader = new SequenceFile.Reader(fs, path, conf);
System.out.println(reader.getKeyClass());
System.out.println(reader.getValueClass());

Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

LuceneDocumentWrapper ldw = null;

long position = reader.getPosition();
while (reader.next(key, val)) {
    ldw = (LuceneDocumentWrapper) val;
    System.out.println(ldw.get());
}

But when using a LuceneDocumentWrapper, which implements the interface, I get
this error:

java.lang.RuntimeException: java.lang.NoSuchMethodException:
org.apache.hadoop.hbase.mapreduce.LuceneDocumentWrapper.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at com.iswcorp.mapreduce.test.SequenceFileReaderTest.main(SequenceFileReaderTest.java:39)
Caused by: java.lang.NoSuchMethodException:
org.apache.hadoop.hbase.mapreduce.LuceneDocumentWrapper.<init>()
    at java.lang.Class.getConstructor0(Class.java:2706)

It is caused by this line:

Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

which comes down to the class not having a default constructor, which is why I
asked the original question. Is there some other way to get the values out?
Ananth T Sarathy


On Mon, May 10, 2010 at 11:46 PM, Ted Yu yuzhih...@gmail.com wrote:

 Writable is the recommended interface to work with.
 Writable implementations reuse instances, which serves large-scale data
 processing better than JavaSerialization.

 Cheers

 On Mon, May 10, 2010 at 6:29 PM, Ananth Sarathy
 ananth.t.sara...@gmail.com wrote:

  My team and I were working with sequence files and were using the
  LuceneDocumentWrapper. But when I try to get the val, I get a no such
  method exception from ReflectionUtils, which is caused because it's
  trying to call a default constructor which doesn't exist for that class.

  So my question is whether there is documentation on, or limitations to, the
  type of objects that can be used with a SequenceFile other than the Writable
  interface? I want to know if maybe I am trying to read from the file in the
  wrong way.
  Ananth T Sarathy
 



Re: Questions about SequenceFiles

2010-05-11 Thread Ted Yu
The class implementing Writable should provide a public default constructor.
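
For illustration, a minimal Writable value class with such a constructor might
look like the sketch below (the class and field names are made up for the
example, not taken from LuceneDocumentWrapper):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class MyValueWrapper implements Writable {
    private Text payload = new Text();

    // Public no-arg constructor: this is what ReflectionUtils.newInstance needs
    // when SequenceFile.Reader hands it only the class name from the file header.
    public MyValueWrapper() {
    }

    public MyValueWrapper(Text payload) {
        this.payload = payload;
    }

    public Text get() {
        return payload;
    }

    public void write(DataOutput out) throws IOException {
        payload.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        payload.readFields(in);
    }
}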



Re: Questions about SequenceFiles

2010-05-11 Thread Jeff Zhang
I think this is a bug; a Writable object should have a default no-argument
constructor.


-- 
Best Regards

Jeff Zhang


Re: Hadoop performance - xfs and ext4

2010-05-11 Thread stephen mulcahy

On 23/04/10 15:43, Todd Lipcon wrote:

Hi Stephen,

Can you try mounting ext4 with the nodelalloc option? I've seen the same
improvement due to delayed allocation but been a little nervous about that
option (especially in the NN where we currently follow what the kernel
people call an antipattern for image rotation).


Hi Todd,

Sorry for the delayed response - I had to wait for another test window 
before trying this out.


To clarify, my namenode and secondary namenode have been using ext4 in 
all tests - reconfiguring the datanodes is a fast operation, the nn and 
2nn less so. I figure any big performance benefit would appear on the 
data nodes anyway, and I can then apply it back to the nn and 2nn if 
testing shows any benefit in changing.


So I tried running our datanodes with their ext4 filesystems mounted 
using noatime,nodelalloc, and after 6 runs of the TeraSort it seems it 
runs SLOWER with those options, by 5-8%. The TeraGen itself seemed to 
run about 5% faster, but it was only a single run so I'm not sure how 
reliable that is.


hth,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com


Re: Hadoop performance - xfs and ext4

2010-05-11 Thread Todd Lipcon

Yep, that's what I'd expect. noatime should be a small improvement,
nodelalloc should be a small detriment. The thing is that delayed allocation
has some strange cases that could theoretically cause data loss after a
power outage, so I was interested to see if it nullified all of your
performance gains or if it were just a small hit.

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Questions about SequenceFiles

2010-05-11 Thread Owen O'Malley
On Tue, May 11, 2010 at 7:48 AM, Ananth Sarathy
ananth.t.sara...@gmail.com wrote:
 Ok,  how can I report that?

File a jira on the project that manages the type. I assume it is
Lucene in this case.

  Also, it seems that requiring a no argument constructor but using an
 interface is kind of a broken paradigm. Shouldn't there be some other
 mechanism for this?

The problem is that given a class name from the SequenceFile, we need
to build an empty object. The most natural way to provide that
capability is with a 0 argument constructor.
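
A workaround sketch for the reading side, since the reader lets you supply the
instances it fills in: SequenceFile.Reader.next(key, val) only calls
readFields() on the objects you hand it, so if you can construct the value
instance yourself through whatever public constructor the class does provide,
you do not need ReflectionUtils for the value at all. Here makeEmptyWrapper()
is a hypothetical stand-in for however you obtain that instance; it is not
part of Hadoop or HBase.

Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
// makeEmptyWrapper() is hypothetical: build the LuceneDocumentWrapper with any
// public constructor it actually has.
LuceneDocumentWrapper val = makeEmptyWrapper();
while (reader.next(key, val)) {
    System.out.println(val.get());
}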

-- Owen


Re: Hadoop performance - xfs and ext4

2010-05-11 Thread Edward Capriolo

For most people, tuning the disk configuration for the NameNode is wasted
time. Why? The current capacity of our hadoop cluster is

Present Capacity: 48799678056 (101.09 TB)

Yet the NameNode data itself is tiny:

du -hs /usr/local/hadoop_root/hdfs_master
684M    /usr/local/hadoop_root/hdfs_master

The entire node table likely fits inside the VFS cache, so performance is not
usually an issue; reliability is. The more exotic you get with this mount
(EXT5, rarely used mount options), the less reliable it is going to be (IMHO),
because your configuration space is not shared by that many people.

DataNodes are a different story. These are worth tuning. I suggest configuring
a single datanode a particular way (say, EXT4 with fancy options x, y, z),
waiting a while to get real production load on it, then looking at some
performance data to see if that node shows any tangible difference. Do not
look for low-level things like bonnie saying the delete rate is +5% but the
create rate is -5%. Look at the big picture: if you can't see a tangible
big-picture difference like 'map jobs seem to finish 5% faster on this node',
what are you doing the tuning for? :)

I know this seems like a rather unscientific approach, but disk
tuning/performance measuring is very complex because the application, the VFS
cache, and available memory are the critical factors in performance.


Namenode warnings

2010-05-11 Thread Runping Qi
Hi,

I saw a lot of warnings like the following in the namenode log:

2010-05-11 06:45:07,186 WARN /: /listPaths/s:
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.ListPathsServlet.doGet(ListPathsServlet.java:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

I am using Hadoop 0.19.

Anybody knows what might be the problem?

Thanks,

Runping



Re: Hadoop performance - xfs and ext4

2010-05-11 Thread Scott Carey
Did you try the XFS 'allocsize' mount parameter (for example, allocsize=8m)?
This will reduce fragmentation during concurrent writes. It's more complicated,
but using separate partitions for temp space versus HDFS also has an effect.
XFS isn't as good with the temp space.

In short, a single test with default configurations is useful, but doesn't 
complete the picture.  Both file systems have several important tuning knobs.


On Apr 22, 2010, at 1:02 AM, stephen mulcahy wrote:

 Hi,
 
 I've been tweaking our cluster roll-out process to refine it. While 
 doing so, I decided to check if XFS gives any performance benefit over EXT4.
 
 As per a comment I read somewhere on the hbase wiki - XFS makes for 
 faster formatting of filesystems (it takes us 5.5 minutes to rebuild a 
 datanode from bare metal to a full Hadoop config on top of Debian 
 Squeeze using XFS) versus EXT4 (same bare metal restore takes 9 minutes).
 
 However, TeraSort performance on a cluster of 45 of these data-nodes 
 shows XFS is slower (same configuration settings on both installs other 
 than changed filesystem), specifically,
 
 mkfs.xfs -f -l size=64m DEV
 (mounted with noatime,nodiratime,logbufs=8)
 gives me a cluster which runs TeraSort in about 23 minutes
 
 mkfs.ext4 -T largefile4 DEV
 (mounted with noatime)
 gives me a cluster which runs TeraSort in about 18.5 minutes
 
 So I'll be rolling our cluster back to EXT4, but thought the information 
 might be useful/interesting to others.
 
 -stephen
 
 
 XFS config chosen from notes at 
 http://everything2.com/index.pl?node_id=1479435
 
 -- 
 Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
 NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
 http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com



Re: Hadoop performance - xfs and ext4

2010-05-11 Thread Scott Carey
Ah, one more thing.  With XFS there is an online defragmenter -- it runs every 
night on my cluster.  Performance on a fresh, empty system will not match a 
used one that has become fragmented.





Re: Namenode warnings

2010-05-11 Thread Allen Wittenauer

On May 11, 2010, at 9:53 AM, Runping Qi wrote:

 I am using Hadoop 0.19.
 
 Anybody knows what might be the problem?

I think you answered your own question. :)



Re: Namenode warnings

2010-05-11 Thread Runping Qi
So it's a known problem of Hadoop 0.19?


On Tue, May 11, 2010 at 11:06 AM, Allen Wittenauer awittena...@linkedin.com
 wrote:


 On May 11, 2010, at 9:53 AM, Runping Qi wrote:

  I am using Hadoop 0.19.
 
  Anybody knows what might be the problem?

 I think you answered your own question. :)




Re: job executions fail with NotReplicatedYetException

2010-05-11 Thread Oscar Gothberg
For anyone else out there seeing this problem, this was alleviated for
me by increasing the dfs.namenode.handler.count and
dfs.datanode.handler.count.
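
For reference, a sketch of what that might look like in hdfs-site.xml (the
values below are purely illustrative, not a recommendation; both daemons need
a restart to pick up the change):

<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>8</value>
</property>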

/ Oscar

On Mon, May 10, 2010 at 11:23 AM, Oscar Gothberg
oscar.gothb...@gmail.com wrote:
 Hi,

 I keep having jobs fail at the very end, with 100% complete map,
 100% complete reduce,
 due to NotReplicatedYetException w.r.t the _temporary subdirectory of
 the job output directory.

 It doesn't happen 100% of the time, so it's not trivially
 reproducible, but it happens enough
 (10-20% of runs) to make it a real pain.

 Any ideas, has anyone seen something similar? Part of the stack trace:

 NotReplicatedYetException: Not replicated
 yet:/test/out/dayperiod=14731/_temporary/_attempt_201005052338_0194_r_01_0/part-1
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1253)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
 ...

 Thanks,
 / Oscar



Hadoop Training @ Hadoop Summit - Early Bird Discount Expires Soon!

2010-05-11 Thread Christophe Bisciglia
Hadoop Fans, just a quick note about training options at the Hadoop
Summit. There are discounts expiring soon, so if you planned to attend,
or didn't know about them, we want to make sure you stay in the loop.

We're offering certification courses for developers and admins, as
well as an introduction to Hadoop. We'll also debut courses on Hive
and HBase because you asked for them.

You get the cost of your Summit registration ($100) off any of these
courses just by using the discount code included with your Summit
registration email confirmation, but if you register 45 days in
advance, you save another $100 per day (and Monday's courses are just
47 days out now!).

Intro to Hadoop (Monday): http://www.eventbrite.com/event/621620283/apache0511
Cloudera Desktop SDK (Monday):
http://www.eventbrite.com/event/621677454/apache0511
Hadoop for Developers + Certification (Wednesday-Thursday):
http://www.eventbrite.com/event/621640343/apache0511
Hadoop for Administrators + Certification (Wednesday-Thursday):
http://www.eventbrite.com/event/621643352/apache0511
Hive (Friday): http://www.eventbrite.com/event/621672439/apache0511
HBase (Friday): http://www.eventbrite.com/event/621670433/apache0511

You can see an overview here:
http://www.cloudera.com/hadoop-training/hadoop-summit-2010/

Cheers,
Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera


Re: Import the results into SimpleDB

2010-05-11 Thread Jones, Nick
Hi Mark,
It would be better to create an OutputFormat instead of connecting directly 
from the mapper. The OutputFormat will be called regardless of the existence 
of the reducers.

Make sure and set the job setNumReduceTasks(0). (I'm not sure setting the class 
to null would work.)
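
For reference, a minimal sketch of that map-only setup with the old JobConf
API (the class names and the input path are placeholders; the output format
would be whichever custom one you end up writing):

JobConf job = new JobConf(SimpleDbImportDriver.class);
job.setJobName("simpledb-import");

FileInputFormat.setInputPaths(job, new Path("/input/lines")); // placeholder path
job.setMapperClass(SimpleDbImportMapper.class);               // placeholder mapper doing the parsing/writing
job.setNumReduceTasks(0);                                     // map-only: no reducers run at all
job.setOutputFormat(NullOutputFormat.class);                  // or a custom DB-writing OutputFormat

JobClient.runJob(job);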

Nick
Sent by radiation.

- Original Message -
From: Mark Kerzner markkerz...@gmail.com
To: core-u...@hadoop.apache.org core-u...@hadoop.apache.org
Sent: Tue May 11 21:02:05 2010
Subject: Import the results into SimpleDB

Hi,

I want a Hadoop job that will simply take each line of the input text file
and store it (after parsing) in a database, like SimpleDB.

Can I put this code into Mapper, make no call to collect in it, and have
no reducers at all? Do I set the reduce class to
null, conf.setReducerClass(null)? or not set it at all?

Thank you,
Mark



Re: Namenode warnings

2010-05-11 Thread Tsz Wo (Nicholas), Sze
Hi Runping,
This is a known issue.  See https://issues.apache.org/jira/browse/HDFS-625.
Nicholas Sze







Re: Import the results into SimpleDB

2010-05-11 Thread Darren Govoni
Might as well not use Hadoop then...





Re: Import the results into SimpleDB

2010-05-11 Thread Mark Kerzner
Hi, Nick,

should I then provide the RecordWriter implementation in the OutputFormat,
which will connect to the database and write a record to it, instead of to
HDFS?

Thank you,
Mark




Re: Import the results into SimpleDB

2010-05-11 Thread Mark Kerzner
:)

I create this text file in Hadoop. Only I want to make the DB import a
separate Hadoop job, run it in Amazon EMR, and make it fast by running a
sufficient number of nodes.

Mark





Re: Import the results into SimpleDB

2010-05-11 Thread Amandeep Khurana
Mark,

You can do it either way. Create the connection object for the database in
the configure() or setup() method of the mapper (depending on which API you
are using) and insert the record from the map function. You don't have to
have a reducer. If you create an output format, the mapper can write directly
to it; in essence you'll be doing the same thing. It's easier to create
an output format if you'll be writing more of such code.
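
A rough sketch of the first option with the old mapred API (SimpleDbClient and
its put()/close() calls are hypothetical placeholders for whatever client
library you actually use; they are not a real API):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SimpleDbImportMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private SimpleDbClient client;      // hypothetical client handle

    public void configure(JobConf job) {
        client = new SimpleDbClient();  // open the connection once per map task
    }

    public void map(LongWritable offset, Text line,
                    OutputCollector<NullWritable, NullWritable> output,
                    Reporter reporter) throws IOException {
        // parse the line and store it; nothing is emitted to the collector
        client.put(line.toString());
    }

    public void close() throws IOException {
        client.close();                 // release the connection when the task finishes
    }
}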

-Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz




Re: Import the results into SimpleDB

2010-05-11 Thread Amandeep Khurana

 Might as well not use Hadoop then...


Hadoop makes it easy to parallelize the work... Makes perfect sense to use
it!


Re: Import the results into SimpleDB

2010-05-11 Thread Jones, Nick
Hi Mark,
I haven't actually written one myself but take a look at DBOutputFormat as an 
example.  If SimpleDB has a JDBC connector, it might work as is. 
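
If such a connector existed, the wiring would look roughly like this with the
old mapred API (the driver class, URL, credentials, and table/field names are
entirely made up, and DBOutputFormat also expects the output value class to
implement DBWritable):

DBConfiguration.configureDB(job,
        "com.example.simpledb.jdbc.Driver",   // hypothetical driver class
        "jdbc:simpledb://example",            // hypothetical JDBC URL
        "accessKey", "secretKey");            // hypothetical credentials
DBOutputFormat.setOutput(job, "documents", "line_no", "parsed_text");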

Nick
Sent by radiation.




Context needed by mapper

2010-05-11 Thread DNMILNE

Hi,

I am very new to the MapReduce paradigm so this could be a dumb question. 

What do you do if your mapper functions need to know more than just the data
being processed in order to do their job? The simplest example I can think
of is implementing a selective, phrase-based version of wordcount. 

Imagine you want to count the occurrences of all notable names (from the
notable names database) in a large collection of news stories. You can't
just count phrases - the number of potential word combinations is
ridiculously large, and the vast majority are irrelevant. 

You have a limited (large, but bounded) vocabulary of phrases you are
interested in--this list of names. You want each mapper to be aware of it,
and only count the relevant phrases. You basically want to give each mapper
read-only access to a HashSet of phrases as well as the documents they
should be counting over. How would you do that?

Cheers, 
Dave


-- 
View this message in context: 
http://old.nabble.com/Context-needed-by-mapper-tp28532164p28532164.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Context needed by mapper

2010-05-11 Thread prashant ullegaddi
Hi,

To count phrases, you can choose not to split the file by writing your own
InputFormat that extends org.apache.hadoop.mapred.TextInputFormat and
overrides isSplitable to return false. Also, you have to provide your own
RecordReader that can read phrases from the given text file. Take a look at
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/RecordReader.html.

Thanks,
Prashant.
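
A minimal sketch of the non-splitting part of that idea with the old mapred
API (the class name is made up, and the custom RecordReader that returns
phrases is left out):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFilePhraseInputFormat extends TextInputFormat {
    // Returning false keeps each file in a single split, so one mapper sees
    // the whole document instead of an arbitrary byte range.
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}

getRecordReader() would still have to be overridden to hand back phrases
instead of lines.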



-- 
Thanks and Regards,
Prashant Ullegaddi,
Search and Information Extraction Lab,
IIIT-Hyderabad, India.