only one reducer running in a hadoop cluster

2009-02-07 Thread Nick Cen
Hi,

I have a Hadoop cluster with 4 PCs, and I want to integrate Hadoop and
Lucene together, so I copied some of the source code from Nutch's Indexer
class. But when I run my job, I find that there is only 1 reducer running
on 1 PC, so the performance is not as good as I expected.

-- 
http://daily.appspot.com/food/
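
A likely cause, though it is not confirmed in this thread, is that the job
runs with the default of a single reduce task (mapred.reduce.tasks defaults
to 1 in hadoop-default.xml). Below is a minimal sketch of raising it,
assuming the old org.apache.hadoop.mapred API; IndexDriver and the reducer
count of 4 are placeholders, not code from the thread:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class IndexDriver {                      // hypothetical driver class
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(IndexDriver.class);
        // The default is a single reduce task; raising it lets the reduce
        // work spread across the cluster (here, roughly one reducer per node).
        conf.setNumReduceTasks(4);
        // ... configure the mapper, reducer, input and output paths here ...
        JobClient.runJob(conf);
      }
    }

The number of map tasks is driven largely by the input splits, but the
number of reduce tasks comes directly from this setting.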


Re: Problem with Counters

2009-02-07 Thread some speed
Thank you all so much.

That works. I made a stupid mistake with the naming of a local variable,
hence the error. :(



On Thu, Feb 5, 2009 at 9:49 AM, Tom White  wrote:

> Try moving the enum to inside the top level class (as you already did)
> and then use getCounter() passing the enum value:
>
> public class MyJob {
>
>  static enum MyCounter{ct_key1};
>
>  // Mapper and Reducer defined here
>
>   public static void main(String[] args) throws IOException {
>     // ...
>     RunningJob running = JobClient.runJob(conf);
>     Counters ct = running.getCounters();
>     long res = ct.getCounter(MyCounter.ct_key1);
>     // ...
>   }
>
> }
>
> BTW org.apache.hadoop.mapred.Task$Counter is a built-in MapReduce
> counter, so that won't help you retrieve your custom counter.
>
> Cheers,
>
> Tom
>
> On Thu, Feb 5, 2009 at 2:22 PM, Rasit OZDAS  wrote:
> > Sharath,
> >
> > You're using reporter.incrCounter(enumVal, intVal); to increment the
> > counter; I think the method to get it should also be similar.
> >
> > Try to use findCounter(enumVal).getCounter() or  getCounter(enumVal).
> >
> > Hope this helps,
> > Rasit
> >
> > 2009/2/5 some speed :
> >> In fact I put the enum in my Reduce method as the following link (from
> >> Yahoo) says so:
> >>
> >>
> http://public.yahoo.com/gogate/hadoop-tutorial/html/module5.html#metrics
> >> --->Look at the section under Reporting Custom Metrics.
> >>
> >> 2009/2/5 some speed 
> >>
> >>> Thanks Rasit.
> >>>
> >>> I did as you said.
> >>>
> >>> 1) Put the static enum MyCounter{ct_key1} just above main()
> >>>
> >>> 2) Changed  result =
> >>> ct.findCounter("org.apache.hadoop.mapred.Task$Counter", 1,
> >>> "Reduce.MyCounter").getCounter();
> >>>
> >>> Still it doesn't seem to help. It throws a null pointer exception. It's
> >>> not able to find the Counter.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Sharath
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Feb 5, 2009 at 8:04 AM, Rasit OZDAS 
> wrote:
> >>>
>  Forgot to say, value "0" means that the requested counter does not
> exist.
> 
>  2009/2/5 Rasit OZDAS :
>  > Sharath,
>  > I think the static enum definition should be outside the Reduce class.
>  > Hadoop probably tries to find it elsewhere with "MyCounter", but
> it's
>  > actually "Reduce.MyCounter" in your example.
>  >
>  > Hope this helps,
>  > Rasit
>  >
>  > 2009/2/5 some speed :
>  >> I tried the following... It compiles, but the value of result
>  >> always seems to be 0.
>  >>
>  >>RunningJob running = JobClient.runJob(conf);
>  >>
>  >> Counters ct = new Counters();
>  >> ct = running.getCounters();
>  >>
>  >>long result =
>  >> ct.findCounter("org.apache.hadoop.mapred.Task$Counter", 0,
>  >> "*MyCounter*").getCounter();
>  >> //even tried MyCounter.Key1
>  >>
>  >>
>  >>
>  >> Does anyone know why that is happening?
>  >>
>  >> Thanks,
>  >>
>  >> Sharath
>  >>
>  >>
>  >>
>  >> On Thu, Feb 5, 2009 at 5:59 AM, some speed 
>  wrote:
>  >>
>  >>> Hi Tom,
>  >>>
>  >>> I get the error :
>  >>>
>  >>> Cannot find symbol "MyCounter.ct_key1"
>  >>>
>  >>>
>  >>>
>  >>>
>  >>>
>  >>>
>  >>> On Thu, Feb 5, 2009 at 5:51 AM, Tom White 
> wrote:
>  >>>
>   Hi Sharath,
>  
>   The code you posted looks right to me. Counters#getCounter() will
>   return the counter's value. What error are you getting?
>  
>   Tom
>  
>   On Thu, Feb 5, 2009 at 10:09 AM, some speed <
> speed.s...@gmail.com>
>  wrote:
>   > Hi,
>   >
>   > Can someone help me with the usage of counters please? I am
>  incrementing
>   a
>   > counter in Reduce method but I am unable to collect the counter
>  value
>   after
>   > the job is completed.
>   >
>   > It's something like this:
>   >
>   > public static class Reduce extends MapReduceBase implements
>   >     Reducer<Text, FloatWritable, Text, FloatWritable>
>   > {
>   >static enum MyCounter{ct_key1};
>   >
>   > public void reduce(..) throws IOException
>   >{
>   >
>   >reporter.incrCounter(MyCounter.ct_key1, 1);
>   >
>   >output.collect(..);
>   >
>   >}
>   > }
>   >
>   > -main method
>   > {
>   >RunningJob running = null;
>   >running=JobClient.runJob(conf);
>   >
>   >Counters ct = running.getCounters();
>   > /*  How do I Collect the ct_key1 value ***/
>   >long res = ct.getCounter(MyCounter.ct_key1);
>   >
>  

Re: Re: Re: Re: Regarding "Hadoop multi cluster" set-up

2009-02-07 Thread Amandeep Khurana
I ran into this trouble again. This time, formatting the namenode didn't
help, so I changed the directories where the metadata and the data were
being stored. That made it work.

You might want to check this at your end too.

Amandeep

PS: I don't have an explanation for how or why this made it work.
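
One way to see which directories and namenode address a node is actually
picking up is to print the relevant properties from the loaded
configuration. This is only a sketch, assuming the 0.19-era property names
and that hadoop-default.xml/hadoop-site.xml are on the classpath;
PrintDfsConfig is a hypothetical helper class:

    import org.apache.hadoop.conf.Configuration;

    public class PrintDfsConfig {                   // hypothetical helper class
      public static void main(String[] args) {
        // Loads hadoop-default.xml and hadoop-site.xml from the classpath.
        Configuration conf = new Configuration();
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("dfs.name.dir    = " + conf.get("dfs.name.dir"));
        System.out.println("dfs.data.dir    = " + conf.get("dfs.data.dir"));
      }
    }

If dfs.name.dir or dfs.data.dir still point at directories left over from an
earlier format, pointing them at fresh locations (as described above) or
clearing them before reformatting would be consistent with why this change
helped.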


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Sat, Feb 7, 2009 at 9:06 AM, jason hadoop  wrote:

> On your master machine, use the netstat command to determine what ports and
> addresses the namenode process is listening on.
>
> On the datanode machines, examine the log files to verify that the
> datanode has attempted to connect to the namenode IP address on one of
> those ports, and was successful.
>
> The common ports used for the datanode -> namenode rendezvous are 50010, 54320
> and 8020, depending on your Hadoop version.
>
> If the datanodes have been started, and the connection to the namenode
> failed, there will be a log message with a socket error, indicating what
> host and port the datanode used to attempt to communicate with the
> namenode.
> Verify that that IP address is correct for your namenode, and reachable from
> the datanode host (for multi-homed machines this can be an issue), and that
> the port listed is one of the TCP ports that the namenode process is
> listening on.
>
> For Linux, you can use the command
> *netstat -a -t -n -p | grep java | grep LISTEN*
> to determine the IP addresses, ports, and PIDs of the Java processes that
> are listening for TCP socket connections,
>
> and the jps command from the bin directory of your Java installation to
> determine the PID of the namenode.
>
> On Sat, Feb 7, 2009 at 6:27 AM, shefali pawar  >wrote:
>
> > Hi,
> >
> > No, not yet. We are still struggling! If you find the solution please let
> > me know.
> >
> > Shefali
> >
> > On Sat, 07 Feb 2009 02:56:15 +0530  wrote
> > >I had to change the master on my running cluster and ended up with the
> > same
> > >problem. Were you able to fix it at your end?
> > >
> > >Amandeep
> > >
> > >
> > >Amandeep Khurana
> > >Computer Science Graduate Student
> > >University of California, Santa Cruz
> > >
> > >
> > >On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar wrote:
> > >
> > >> Hi,
> > >>
> > >> I do not think that the firewall is blocking the port because it has
> > been
> > >> turned off on both the computers! And also since it is a random port
> > number
> > >> I do not think it should create a problem.
> > >>
> > >> I do not understand what is going wrong!
> > >>
> > >> Shefali
> > >>
> > >> On Wed, 04 Feb 2009 23:23:04 +0530  wrote
> > >> >I'm not certain that the firewall is your problem but if that port is
> > >> >blocked on your master you should open it to let communication
> through.
> > >> Here
> > >> >is one website that might be relevant:
> > >> >
> > >> >
> > >>
> >
> http://stackoverflow.com/questions/255077/open-ports-under-fedora-core-8-for-vmware-server
> > >> >
> > >> >but again, this may not be your problem.
> > >> >
> > >> >John
> > >> >
> > >> >On Wed, Feb 4, 2009 at 12:46 PM, shefali pawar wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I will have to check. I can do that tomorrow in college. But if
> that
> > is
> > >> the
> > >> >> case what should I do?
> > >> >>
> > >> >> Should I change the port number and try again?
> > >> >>
> > >> >> Shefali
> > >> >>
> > >> >> On Wed, 04 Feb 2009 S D wrote :
> > >> >>
> > >> >> >Shefali,
> > >> >> >
> > >> >> >Is your firewall blocking port 54310 on the master?
> > >> >> >
> > >> >> >John
> > >> >> >
> > >> >> >On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar > >wrote:
> > >> >> >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > I am trying to set up a two-node cluster using Hadoop 0.19.0,
> with
> > 1
> > >> >> > > master (which should also work as a slave) and 1 slave node.
> > >> >> > >
> > >> >> > > But while running bin/start-dfs.sh the datanode is not starting
> > on
> > >> the
> > >> >> > > slave. I had read the previous mails on the list, but nothing
> > seems
> > >> to
> > >> >> be
> > >> >> > > working in this case. I am getting the following error in the
> > >> >> > > hadoop-root-datanode-slave log file while running the command
> > >> >> > > bin/start-dfs.sh =>
> > >> >> > >
> > >> >> > > 2009-02-03 13:00:27,516 INFO
> > >> >> > > org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> > >> >> > > /
> > >> >> > > STARTUP_MSG: Starting DataNode
> > >> >> > > STARTUP_MSG:  host = slave/172.16.0.32
> > >> >> > > STARTUP_MSG:  args = []
> > >> >> > > STARTUP_MSG:  version = 0.19.0
> > >> >> > > STARTUP_MSG:  build =
> > >> >> > >
> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19-r
> > >> >> > > 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
> > >> >> > > /
> > >> >> > > 2009-02-03 13:00:28,725 INFO org.apache.hadoop.ipc.Client:
> > Re

Re: Re: Re: Re: Regarding "Hadoop multi cluster" set-up

2009-02-07 Thread jason hadoop
On your master machine, use the netstat command to determine what ports and
addresses the namenode process is listening on.

On the datanode machines, examine the log files to verify that the
datanode has attempted to connect to the namenode IP address on one of those
ports, and was successful.

The common ports used for the datanode -> namenode rendezvous are 50010, 54320
and 8020, depending on your Hadoop version.

If the datanodes have been started, and the connection to the namenode
failed, there will be a log message with a socket error, indicating what
host and port the datanode used to attempt to communicate with the namenode.
Verify that that IP address is correct for your namenode, and reachable from
the datanode host (for multi-homed machines this can be an issue), and that
the port listed is one of the TCP ports that the namenode process is
listening on.

For Linux, you can use the command
*netstat -a -t -n -p | grep java | grep LISTEN*
to determine the IP addresses, ports, and PIDs of the Java processes that
are listening for TCP socket connections,

and the jps command from the bin directory of your Java installation to
determine the PID of the namenode.

On Sat, Feb 7, 2009 at 6:27 AM, shefali pawar wrote:

> Hi,
>
> No, not yet. We are still struggling! If you find the solution please let
> me know.
>
> Shefali
>
> On Sat, 07 Feb 2009 02:56:15 +0530  wrote
> >I had to change the master on my running cluster and ended up with the
> same
> >problem. Were you able to fix it at your end?
> >
> >Amandeep
> >
> >
> >Amandeep Khurana
> >Computer Science Graduate Student
> >University of California, Santa Cruz
> >
> >
> >On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar wrote:
> >
> >> Hi,
> >>
> >> I do not think that the firewall is blocking the port because it has
> been
> >> turned off on both the computers! And also since it is a random port
> number
> >> I do not think it should create a problem.
> >>
> >> I do not understand what is going wrong!
> >>
> >> Shefali
> >>
> >> On Wed, 04 Feb 2009 23:23:04 +0530  wrote
> >> >I'm not certain that the firewall is your problem but if that port is
> >> >blocked on your master you should open it to let communication through.
> >> Here
> >> >is one website that might be relevant:
> >> >
> >> >
> >>
> http://stackoverflow.com/questions/255077/open-ports-under-fedora-core-8-for-vmware-server
> >> >
> >> >but again, this may not be your problem.
> >> >
> >> >John
> >> >
> >> >On Wed, Feb 4, 2009 at 12:46 PM, shefali pawar wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I will have to check. I can do that tomorrow in college. But if that
> is
> >> the
> >> >> case what should I do?
> >> >>
> >> >> Should I change the port number and try again?
> >> >>
> >> >> Shefali
> >> >>
> >> >> On Wed, 04 Feb 2009 S D wrote :
> >> >>
> >> >> >Shefali,
> >> >> >
> >> >> >Is your firewall blocking port 54310 on the master?
> >> >> >
> >> >> >John
> >> >> >
> >> >> >On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar > >wrote:
> >> >> >
> >> >> > > Hi,
> >> >> > >
> >> >> > > I am trying to set up a two-node cluster using Hadoop 0.19.0, with
> 1
> >> >> > > master (which should also work as a slave) and 1 slave node.
> >> >> > >
> >> >> > > But while running bin/start-dfs.sh the datanode is not starting
> on
> >> the
> >> >> > > slave. I had read the previous mails on the list, but nothing
> seems
> >> to
> >> >> be
> >> >> > > working in this case. I am getting the following error in the
> >> >> > > hadoop-root-datanode-slave log file while running the command
> >> >> > > bin/start-dfs.sh =>
> >> >> > >
> >> >> > > 2009-02-03 13:00:27,516 INFO
> >> >> > > org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> >> >> > > /
> >> >> > > STARTUP_MSG: Starting DataNode
> >> >> > > STARTUP_MSG:  host = slave/172.16.0.32
> >> >> > > STARTUP_MSG:  args = []
> >> >> > > STARTUP_MSG:  version = 0.19.0
> >> >> > > STARTUP_MSG:  build =
> >> >> > >
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19-r
> >> >> > > 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
> >> >> > > /
> >> >> > > 2009-02-03 13:00:28,725 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> >> connect
> >> >> > > to server: master/172.16.0.46:54310. Already tried 0 time(s).
> >> >> > > 2009-02-03 13:00:29,726 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> >> connect
> >> >> > > to server: master/172.16.0.46:54310. Already tried 1 time(s).
> >> >> > > 2009-02-03 13:00:30,727 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> >> connect
> >> >> > > to server: master/172.16.0.46:54310. Already tried 2 time(s).
> >> >> > > 2009-02-03 13:00:31,728 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> >> connect
> >> >> > > to server: master/172.16.0.46:54310. Already tried 3 time(s).
> >> >> > > 2009-02-03 13:00:32,729 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> >> connect
> >> >> > 

Re: Cannot copy from local file system to DFS

2009-02-07 Thread jason hadoop
Please examine the web console for the namenode.

The url for this should be http://*namenodehost*:50070/

This will tell you what datanodes are successfully connected to the
namenode.

If the number is 0, then either no datanodes are running, they were unable to
connect to the namenode at startup, or they could not be started.
The common reasons for this are configuration errors, installation errors,
network connectivity issues due to firewalls blocking ports, or DNS lookup
errors (either failure or an incorrect address returned) for the namenode
hostname on the datanodes.

At this point you will need to investigate the log files for the datanodes
to make an assessment of what has happened.
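
As a complement to the web console, the datanode count can also be read
programmatically. This is only a rough sketch, assuming the 0.17-era
org.apache.hadoop.dfs package layout seen in the stack trace below and that
DistributedFileSystem exposes getDataNodeStats(); LiveDatanodes is a
hypothetical helper class:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.dfs.DatanodeInfo;
    import org.apache.hadoop.dfs.DistributedFileSystem;
    import org.apache.hadoop.fs.FileSystem;

    public class LiveDatanodes {                    // hypothetical helper class
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        if (fs instanceof DistributedFileSystem) {
          // Ask the namenode which datanodes it currently considers live.
          DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
          System.out.println("Datanodes reporting to the namenode: " + nodes.length);
          for (DatanodeInfo node : nodes) {
            System.out.println("  " + node.getName());
          }
        }
      }
    }

The dfsadmin -report command gives similar information from the command line.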


On Sat, Feb 7, 2009 at 6:17 AM, Rasit OZDAS  wrote:

> Hi, Mithila,
>
> "File /user/mithila/test/20417.txt could only be replicated to 0
> nodes, instead of 1"
>
> I think your datanode isn't working properly.
> Please take a look at the log file of your datanode (logs/*datanode*.log).
>
> If there is no error in that log file, I've heard that Hadoop can sometimes
> mark a datanode as "BAD" and refuse to send the block to that node; this
> can be the cause.
> (List, please correct me if I'm wrong!)
>
> Hope this helps,
> Rasit
>
> 2009/2/6 Mithila Nagendra :
> > Hey all
> > I was trying to run the word count example on one of the hadoop systems I
> > installed, but when I try to copy the text files from the local file
> system
> > to the DFS, it throws up the following exception:
> >
> > [mith...@node02 hadoop]$ jps
> > 8711 JobTracker
> > 8805 TaskTracker
> > 8901 Jps
> > 8419 NameNode
> > 8642 SecondaryNameNode
> > [mith...@node02 hadoop]$ cd ..
> > [mith...@node02 mithila]$ ls
> > hadoop  hadoop-0.17.2.1.tar  hadoop-datastore  test
> > [mith...@node02 mithila]$ hadoop/bin/hadoop dfs -copyFromLocal test test
> > 09/02/06 11:26:26 INFO dfs.DFSClient:
> org.apache.hadoop.ipc.RemoteException:
> > java.io.IOException: File /user/mithila/test/20417.txt could only be
> > replicated to 0 nodes, instead of 1
> >at
> >
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
> >at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >at java.lang.reflect.Method.invoke(Method.java:597)
> >at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> >at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> >
> >at org.apache.hadoop.ipc.Client.call(Client.java:557)
> >at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> >at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >at java.lang.reflect.Method.invoke(Method.java:597)
> >at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
> >at
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
> >at
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
> >at
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
> >at
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
> >
> > 09/02/06 11:26:26 WARN dfs.DFSClient: NotReplicatedYetException sleeping
> > /user/mithila/test/20417.txt retries left 4
> > 09/02/06 11:26:27 INFO dfs.DFSClient:
> org.apache.hadoop.ipc.RemoteException:
> > java.io.IOException: File /user/mithila/test/20417.txt could only be
> > replicated to 0 nodes, instead of 1
> >at
> >
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
> >at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
> >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >at java.lang.reflect.Method.invoke(Method.java:597)
> >at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> >at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> >
> >at org.apache.hadoop.ipc.Client.call(Client.java:557)
> >at 

Re: Heap size error

2009-02-07 Thread jason hadoop
The default task memory allocation size is set in the hadoop-default.xml
file for your configuration. The parameter is mapred.child.java.opts, and
the value is generally -Xmx200m.

You may alter this value in your JobConf object before you submit the job,
and the individual tasks will use the altered value.


If the variable that contains your JobConf object is named conf,
*conf.set( "mapred.child.java.opts", "-Xmx512m");*

will override any existing value from your configuration with the value
"-Xmx512m" for the job you are about to launch.


A way to do this that, in general, will preserve any existing values (with the
Sun JDK) would be:
*conf.set( "mapred.child.java.opts", conf.get("mapred.child.java.opts","") +
" -Xmx512m");*

The above line will append -Xmx512m to the current value of the
mapred.child.java.opts parameter, and use the value of "" if there is no
value set, or the value is null.
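
Putting those lines together, a minimal sketch of a driver that applies the
override before submitting the job, assuming the old
org.apache.hadoop.mapred API; HeapSizeDriver is a hypothetical class name:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HeapSizeDriver {                   // hypothetical driver class
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(HeapSizeDriver.class);
        // Append -Xmx512m to any child JVM options already configured,
        // falling back to the empty string if the property is unset.
        conf.set("mapred.child.java.opts",
                 conf.get("mapred.child.java.opts", "") + " -Xmx512m");
        // ... configure the mapper, reducer, input and output paths here ...
        JobClient.runJob(conf);
      }
    }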

It may of course be that your application is using more memory than you
expect due to an incorrect assumption or programming error, in which case
the above will not be effective.


The hadoop script, in the bin directory of your installation, provides a way
to pass arguments to the
On Sat, Feb 7, 2009 at 5:54 AM, Rasit OZDAS  wrote:

> Hi, Amandeep,
> I've copied following lines from a site:
> --
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>
> This can have two reasons:
>
>* Your Java application has a memory leak. There are tools like
> YourKit Java Profiler that help you to identify such leaks.
>* Your Java application really needs a lot of memory (more than
> 128 MB by default!). In this case the Java heap size can be increased
> using the following runtime parameters:
>
> java -Xms<initial heap size> -Xmx<maximum heap size>
>
> Defaults are:
>
> java -Xms32m -Xmx128m
>
> You can set this either in the Java Control Panel or on the command
> line, depending on the environment you run your application.
> -
>
> Hope this helps,
> Rasit
>
> 2009/2/7 Amandeep Khurana :
> > I'm getting the following error while running my hadoop job:
> >
> > 09/02/06 15:33:03 INFO mapred.JobClient: Task Id :
> > attempt_200902061333_0004_r_00_1, Status : FAILED
> > java.lang.OutOfMemoryError: Java heap space
> >at java.util.Arrays.copyOf(Unknown Source)
> >at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
> >at java.lang.AbstractStringBuilder.append(Unknown Source)
> >at java.lang.StringBuffer.append(Unknown Source)
> >at TableJoin$Reduce.reduce(TableJoin.java:61)
> >at TableJoin$Reduce.reduce(TableJoin.java:1)
> >at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
> >at org.apache.hadoop.mapred.Child.main(Child.java:155)
> >
> > Any inputs?
> >
> > Amandeep
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>
>
>
> --
> M. Raşit ÖZDAŞ
>


Re: Re: Re: Re: Regarding "Hadoop multi cluster" set-up

2009-02-07 Thread shefali pawar
Hi,

No, not yet. We are still struggling! If you find the solution please let me 
know.

Shefali

On Sat, 07 Feb 2009 02:56:15 +0530  wrote
>I had to change the master on my running cluster and ended up with the same
>problem. Were you able to fix it at your end?
>
>Amandeep
>
>
>Amandeep Khurana
>Computer Science Graduate Student
>University of California, Santa Cruz
>
>
>On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar wrote:
>
>> Hi,
>>
>> I do not think that the firewall is blocking the port because it has been
>> turned off on both the computers! And also since it is a random port number
>> I do not think it should create a problem.
>>
>> I do not understand what is going wrong!
>>
>> Shefali
>>
>> On Wed, 04 Feb 2009 23:23:04 +0530  wrote
>> >I'm not certain that the firewall is your problem but if that port is
>> >blocked on your master you should open it to let communication through.
>> Here
>> >is one website that might be relevant:
>> >
>> >
>> http://stackoverflow.com/questions/255077/open-ports-under-fedora-core-8-for-vmware-server
>> >
>> >but again, this may not be your problem.
>> >
>> >John
>> >
>> >On Wed, Feb 4, 2009 at 12:46 PM, shefali pawar wrote:
>> >
>> >> Hi,
>> >>
>> >> I will have to check. I can do that tomorrow in college. But if that is
>> the
>> >> case what should I do?
>> >>
>> >> Should I change the port number and try again?
>> >>
>> >> Shefali
>> >>
>> >> On Wed, 04 Feb 2009 S D wrote :
>> >>
>> >> >Shefali,
>> >> >
>> >> >Is your firewall blocking port 54310 on the master?
>> >> >
>> >> >John
>> >> >
>> >> >On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar > >wrote:
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > I am trying to set up a two-node cluster using Hadoop 0.19.0, with 1
>> >> > > master (which should also work as a slave) and 1 slave node.
>> >> > >
>> >> > > But while running bin/start-dfs.sh the datanode is not starting on
>> the
>> >> > > slave. I had read the previous mails on the list, but nothing seems
>> to
>> >> be
>> >> > > working in this case. I am getting the following error in the
>> >> > > hadoop-root-datanode-slave log file while running the command
>> >> > > bin/start-dfs.sh =>
>> >> > >
>> >> > > 2009-02-03 13:00:27,516 INFO
>> >> > > org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
>> >> > > /
>> >> > > STARTUP_MSG: Starting DataNode
>> >> > > STARTUP_MSG:  host = slave/172.16.0.32
>> >> > > STARTUP_MSG:  args = []
>> >> > > STARTUP_MSG:  version = 0.19.0
>> >> > > STARTUP_MSG:  build =
>> >> > > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19-r
>> >> > > 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
>> >> > > /
>> >> > > 2009-02-03 13:00:28,725 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 0 time(s).
>> >> > > 2009-02-03 13:00:29,726 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 1 time(s).
>> >> > > 2009-02-03 13:00:30,727 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 2 time(s).
>> >> > > 2009-02-03 13:00:31,728 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 3 time(s).
>> >> > > 2009-02-03 13:00:32,729 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 4 time(s).
>> >> > > 2009-02-03 13:00:33,730 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 5 time(s).
>> >> > > 2009-02-03 13:00:34,731 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 6 time(s).
>> >> > > 2009-02-03 13:00:35,732 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 7 time(s).
>> >> > > 2009-02-03 13:00:36,733 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 8 time(s).
>> >> > > 2009-02-03 13:00:37,734 INFO org.apache.hadoop.ipc.Client: Retrying
>> >> connect
>> >> > > to server: master/172.16.0.46:54310. Already tried 9 time(s).
>> >> > > 2009-02-03 13:00:37,738 ERROR
>> >> > > org.apache.hadoop.hdfs.server.datanode.DataNode:
>> java.io.IOException:
>> >> Call
>> >> > > to master/172.16.0.46:54310 failed on local exception: No route to
>> >> host
>> >> > >        at org.apache.hadoop.ipc.Client.call(Client.java:699)
>> >> > >        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>> >> > >        at $Proxy4.getProtocolVersion(Unknown Source)
>> >> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
>> >> > >        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
>> >> > >     

Re: Cannot copy from local file system to DFS

2009-02-07 Thread Rasit OZDAS
Hi, Mithila,

"File /user/mithila/test/20417.txt could only be replicated to 0
nodes, instead of 1"

I think your datanode isn't working properly.
Please take a look at the log file of your datanode (logs/*datanode*.log).

If there is no error in that log file, I've heard that Hadoop can sometimes mark
a datanode as "BAD" and refuse to send the block to that node; this
can be the cause.
(List, please correct me if I'm wrong!)

Hope this helps,
Rasit

2009/2/6 Mithila Nagendra :
> Hey all
> I was trying to run the word count example on one of the hadoop systems I
> installed, but when I try to copy the text files from the local file system
> to the DFS, it throws up the following exception:
>
> [mith...@node02 hadoop]$ jps
> 8711 JobTracker
> 8805 TaskTracker
> 8901 Jps
> 8419 NameNode
> 8642 SecondaryNameNode
> [mith...@node02 hadoop]$ cd ..
> [mith...@node02 mithila]$ ls
> hadoop  hadoop-0.17.2.1.tar  hadoop-datastore  test
> [mith...@node02 mithila]$ hadoop/bin/hadoop dfs -copyFromLocal test test
> 09/02/06 11:26:26 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /user/mithila/test/20417.txt could only be
> replicated to 0 nodes, instead of 1
>at
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
>at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>
>at org.apache.hadoop.ipc.Client.call(Client.java:557)
>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
>
> 09/02/06 11:26:26 WARN dfs.DFSClient: NotReplicatedYetException sleeping
> /user/mithila/test/20417.txt retries left 4
> 09/02/06 11:26:27 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /user/mithila/test/20417.txt could only be
> replicated to 0 nodes, instead of 1
>at
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
>at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>
>at org.apache.hadoop.ipc.Client.call(Client.java:557)
>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
>at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
>at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
>at
> org.apache.h

Re: Heap size error

2009-02-07 Thread Rasit OZDAS
Hi, Amandeep,
I've copied following lines from a site:
--
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This can have two reasons:

* Your Java application has a memory leak. There are tools like
YourKit Java Profiler that help you to identify such leaks.
* Your Java application really needs a lot of memory (more than
128 MB by default!). In this case the Java heap size can be increased
using the following runtime parameters:

java -Xms<initial heap size> -Xmx<maximum heap size>

Defaults are:

java -Xms32m -Xmx128m

You can set this either in the Java Control Panel or on the command
line, depending on the environment you run your application.
-

Hope this helps,
Rasit

2009/2/7 Amandeep Khurana :
> I'm getting the following error while running my hadoop job:
>
> 09/02/06 15:33:03 INFO mapred.JobClient: Task Id :
> attempt_200902061333_0004_r_00_1, Status : FAILED
> java.lang.OutOfMemoryError: Java heap space
>at java.util.Arrays.copyOf(Unknown Source)
>at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
>at java.lang.AbstractStringBuilder.append(Unknown Source)
>at java.lang.StringBuffer.append(Unknown Source)
>at TableJoin$Reduce.reduce(TableJoin.java:61)
>at TableJoin$Reduce.reduce(TableJoin.java:1)
>at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
>at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> Any inputs?
>
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>



-- 
M. Raşit ÖZDAŞ


Re: Completed jobs not finishing, errors in jobtracker logs

2009-02-07 Thread Arun C Murthy


On Feb 6, 2009, at 12:39 PM, Bryan Duxbury wrote:

I'm seeing some strange behavior on my cluster. Jobs will be done  
(that is, all tasks completed), but the job will still be "running".  
This state seems to persist for minutes, and is really killing my  
throughput.


I'm seeing errors (warnings) in the jobtracker log that look like  
this:




Looks like a bug, can you please file a jira?

thanks,
Arun