Re: Call to namenode fails with java.io.EOFException

2011-05-13 Thread Sidney Simmons
All nodes are in sync configuration-wise. We have a few cluster scripts that
ensure this is the case.
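For anyone else chasing suspected config drift: a minimal sketch of such a sync check, comparing checksums of each node's config against a reference copy. In a real cluster you would ssh to each slave and hash $HADOOP_HOME/conf; here two local files stand in for two nodes, purely for illustration:

```shell
# Two local files stand in for two nodes' hdfs-site.xml copies.
printf 'dfs.replication=3\n' > conf_node_a
printf 'dfs.replication=3\n' > conf_node_b
# Compare checksums; any mismatch means the node has drifted.
ref=$(md5sum conf_node_a | cut -d' ' -f1)
other=$(md5sum conf_node_b | cut -d' ' -f1)
if [ "$ref" = "$other" ]; then echo "in sync"; else echo "MISMATCH"; fi
```

The same loop, run over ssh, also catches Harsh's version-mismatch scenario if you hash the hadoop jar itself rather than the config.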


On 13 May 2011 06:55, Harsh J  wrote:

> One of the reasons I can think of could be a version mismatch. You may
> want to ensure that the job in question was not carrying a separate
> version of Hadoop along with it inside, perhaps?
>
> On Fri, May 13, 2011 at 12:42 AM, Sidney Simmons
>  wrote:
> > Hi there,
> >
> > I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster.
> > Randomly (periodically), we're getting "Call to namenode" failures on
> > tasktrackers causing tasks to fail:
> >
> > 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
> > attempt_201105090819_059_m_0038_0Child Error
> > java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
> > exception: java.io.EOFException
> >   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >   at $Proxy5.getFileInfo(Unknown Source)
> >   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >   at java.lang.reflect.Method.invoke(Unknown Source)
> >   at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >   at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >   at $Proxy5.getFileInfo(Unknown Source)
> >   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
> >   at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
> >   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
> > Caused by: java.io.EOFException
> >   at java.io.DataInputStream.readInt(Unknown Source)
> >   at
> > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > The namenode log (logging level = INFO) shows the following a few seconds
> > either side of the above timestamps. It could be relevant, or it could be a
> > coincidence:
> >
> > 2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
> >   at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
> >   at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> >   at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
> >   at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
> >   at
> > org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
> >   at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)
> >
> > The jobtracker does, however, have an entry that correlates with the
> > tasktracker:
> >
> > 2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error
> > from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
> > namenode/10.10.10.10:9000 failed on local exception:
> java.io.EOFException
> >   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >   at $Proxy1.getProtocolVersion(Unknown Source)
> >   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >   at
> > org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
> >   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:208)
> >   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:169)
> >   at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >   at
> > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:157)
> > Caused by: java.io.EOFException
> >   at java.io.DataInputStream.readInt(Unknown Source)
> >   at
> > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > Can anyone give me any pointers on how to start troubleshooting this
> issue?
> > It's very sporadic and we haven't been able to reproduce the issue yet in
> > our lab. After looking through the mailing list archives, some of the
> > suggestions revolve around the following settings:
> >
> > dfs.namenode.handler.count 128 (existing 64)
> > dfs.datanode.
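For reference, the first suggestion above would normally be raised in hdfs-site.xml on the namenode (a restart is needed for it to take effect). A hedged sketch; the value shown is simply the number floated above, not a recommendation:

```xml
<!-- hdfs-site.xml on the namenode -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>128</value>
</property>
```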

Re: Call to namenode fails (java.io.EOFException)

2011-05-13 Thread Sidney Simmons
It's not a single node. It occurs on multiple nodes at (seemingly) random
points throughout the day. Should we be performing periodic restarts of the
processes / datanode servers?



On 13 May 2011 07:02, highpointe  wrote:

> Bounce mapred and TT on the node
>
>
>
> Sent from my iPhone
>
> On May 12, 2011, at 3:56 PM, Sidney Simmons 
> wrote:
>
> > Hi there,
> >
> > Apologies if this comes through twice, but I sent the mail a few hours
> > ago and haven't seen it on the mailing list.
> >
> > I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster.
> > Randomly (periodically), we're getting "Call to namenode" failures on
> > tasktrackers causing tasks to fail:
> >
> > 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
> > attempt_201105090819_059_m_0038_0Child Error
> > java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
> > exception: java.io.EOFException
> >   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >   at $Proxy5.getFileInfo(Unknown Source)
> >   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >   at java.lang.reflect.Method.invoke(Unknown Source)
> >   at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >   at
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >   at $Proxy5.getFileInfo(Unknown Source)
> >   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
> >   at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
> >   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
> > Caused by: java.io.EOFException
> >   at java.io.DataInputStream.readInt(Unknown Source)
> >   at
> > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > The namenode log (logging level = INFO) shows the following a few seconds
> > either side of the above timestamps. It could be relevant, or it could be a
> > coincidence:
> >
> > 2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
> >   at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
> >   at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> >   at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
> >   at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
> >   at
> > org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
> >   at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
> >   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)
> >
> > The jobtracker does, however, have an entry that correlates with the
> > tasktracker:
> >
> > 2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error
> > from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
> > namenode/10.10.10.10:9000 failed on local exception:
> java.io.EOFException
> >   at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >   at $Proxy1.getProtocolVersion(Unknown Source)
> >   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >   at
> > org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
> >   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:208)
> >   at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:169)
> >   at
> >
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >   at
> > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:157)
> > Caused by: java.io.EOFException
> >   at java.io.DataInputStream.readInt(Unknown Source)
> >   at
> > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > Can anyone give me any pointers on how to start troubleshooting this
> issue?
> > It's very sporadic and we haven't been able to reproduce the issue yet in
> > our lab. After looking through the mailing list archives, some of the
> > suggestions revolve around the following settin

Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-13 Thread Dieter Plaetinck
On Thu, 12 May 2011 09:49:23 -0700 (PDT)
Aman  wrote:

> The creation of files part-nnnnn is atomic. When you run an MR job,
> these files are created in the directory output_dir/_temporary and
> moved to output_dir after the file is closed for writing. This
> move is atomic, hence as long as you don't try to read these files
> from the temporary directory (which I see you are not) you will be fine.

Perfect!
thanks.

Dieter
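The commit protocol Aman describes can be mimicked on a local filesystem; a rough sketch (the paths are invented, and rename atomicity holds only within a single filesystem):

```shell
# Writer side: output goes to a _temporary subdirectory first.
mkdir -p out/_temporary
echo "key1 val1" > out/_temporary/part-00000
# Commit: a rename within the same filesystem is atomic, so a reader
# listing 'out' sees either no part file or the complete one.
mv out/_temporary/part-00000 out/part-00000
cat out/part-00000   # prints: key1 val1
```

This is why readers that skip the _temporary directory never observe a half-written part file.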


RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

There is no shutdown message until I shut down the DataNode.

I used the hostname of the machine that will run the DataNode, and I have now
tried the IP, but there is no difference.
Again the DataNode seems to freeze, and the output in the log is the one I
mentioned before.



> Subject: Re: Datanode doesn't start but there is no exception in the log
> From: highpoint...@gmail.com
> Date: Thu, 12 May 2011 23:59:02 -0600
> To: common-user@hadoop.apache.org
> 
> Have you defined the IP
> of the DN in the slaves file?
> 
> 
> 
> Sent from my iPhone
> 
> On May 12, 2011, at 7:27 PM, Bharath Mundlapudi  wrote:
> 
> > Is that all the messages in the datanode log? Do you see any SHUTDOWN 
> > message also?
> > 
> > -Bharath
> > 
> > 
> > 
> > 
> > From: Panayotis Antonopoulos 
> > To: common-user@hadoop.apache.org
> > Sent: Thursday, May 12, 2011 6:07 PM
> > Subject: Datanode doesn't start but there is no exception in the log
> > 
> > 
> > Hello,
> > I am trying to set up Hadoop HDFS in a cluster for the first time. So far I 
> > was using pseudo-distributed mode on my PC at home and everything was 
> > working perfectly.
> > The NameNode starts but the DataNode doesn't start and the log contains the 
> > following:
> > 
> > 2011-05-13 04:01:13,663 INFO 
> > org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
> > /
> > STARTUP_MSG: Starting DataNode
> > STARTUP_MSG:   host = clone1/147.102.4.129
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.20.2-cdh3u0
> > STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; 
> > compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
> > /
> > 2011-05-13 04:01:14,019 INFO 
> > org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already 
> > set up for Hadoop, not re-installing.
> > 2011-05-13 04:01:14,143 INFO 
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Registered 
> > FSDatasetStatusMBean
> > 2011-05-13 04:01:14,152 INFO 
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
> > 2011-05-13 04:01:14,154 INFO 
> > org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 
> > 1048576 bytes/s
> > 2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> > org.mortbay.log.Slf4jLog
> > 2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added 
> > global filtersafety 
> > (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port 
> > returned by webServer.getConnectors()[0].getLocalPort() before open() is 
> > -1. Opening the listener on 50075
> > 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
> > listener.getLocalPort() returned 50075 
> > webServer.getConnectors()[0].getLocalPort() returned 50075
> > 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
> > to port 50075
> > 2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
> > 2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
> > SelectChannelConnector@0.0.0.0:50075
> > 2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> > Initializing JVM Metrics with processName=DataNode, sessionId=null
> > 2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> > Reader #1 for port 50020
> > 2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> > Initializing RPC Metrics with hostName=DataNode, port=50020
> > 2011-05-13 04:01:14,981 INFO 
> > org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics 
> > with hostName=DataNode, port=50020
> > 2011-05-13 04:01:14,984 INFO 
> > org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = 
> > DatanodeRegistration(clone1:50010, storageID=, infoPort=50075, 
> > ipcPort=50020)
> > 
> > Does anyone know what might be wrong??
> > 
> > Thank you in advance!
> > Panagiotis
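One low-level thing worth ruling out when a DataNode stalls right at the dnRegistration line is name resolution between the DN and NN hosts. A hedged sketch, run on the DataNode host; `localhost` stands in here for the NameNode's hostname:

```shell
# The DN must resolve the NN's hostname (and the NN the DN's) to the
# addresses each side expects; getent follows roughly the same NSS
# lookup path the JVM's resolver ultimately uses on Linux.
getent hosts localhost | head -1
```

If the printed address is not what the other side is listening on, registration can hang with no exception in either log.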
  

Re: mapper java process not exiting

2011-05-13 Thread Adi
>>Is there a reason for using OpenJDK and not Sun's JDK?

The cluster we are seeing the problem in uses Sun's JDK: java version
"1.6.0_21", Java(TM) SE Runtime Environment (build 1.6.0_21-b06), Java
HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode).

The standalone node where I tried to reproduce the issue uses OpenJDK and
this one does not see this issue as it is able to reuse JVMs.

-Adi

Also...  I believe there were noted issues with the .17 JDK. I will look for
> a link and post if I can find.
>
>

> Otherwise, the behaviour I have seen before. Hadoop is detaching from the
> JVM and stops seeing it.
>
> I think your problem lies in the JDK and not Hadoop.
>
>
> On May 12, 2011 at 8:12 PM, Adi  wrote:
>
> >>> 2011-05-12 13:52:04,147 WARN
> >> org.apache.hadoop.mapreduce.util.ProcessTree:
> >>> Error executing shell command
> >>> org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such
> >> process
> >>
> >> Your logs showed that Hadoop tried to kill processes but the kill
> >> command claimed they didn't exist. The next time you see this problem,
> >> can you check the logs and see if any of the PIDs that appear in the
> >> logs are in fact still running?
> >>
> >> A more likely scenario is that Hadoop's tracking of child VMs is
> >> getting out of sync, but I'm not sure what would cause that.
> >>
> >>
> > Yes those java processes are in fact running. And those error messages do
> > not always show up. Just sometimes. But the processes never get cleaned
> up.
> >
> > -Adi
>
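A quick way to act on the "check whether those PIDs are still running" suggestion: `kill -0` probes for process existence without delivering any signal. A sketch, using the current shell's own PID as a stand-in for a task JVM's pid taken from the TaskTracker log:

```shell
PID=$$   # in practice: a pid that appears in the TaskTracker log
if kill -0 "$PID" 2>/dev/null; then
  echo "process $PID is alive"
else
  echo "process $PID is gone"
fi
```

Running this for each pid Hadoop claimed to kill shows whether the tracking really got out of sync.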


Re: mapper java process not exiting

2011-05-13 Thread highpointe
You posted system specifics earlier; would you mind posting again? can't find 
them in the thread. 

Sent from my iPhone

On May 13, 2011, at 8:05 AM, Adi  wrote:

>>> Is there a reason for using OpenJDK and not Sun's JDK?
> 
> The cluster we are seeing the problem in uses Sun's JDK  java version
> "1.6.0_21",Java(TM) SE Runtime Environment (build 1.6.0_21-b06),Java
> HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
> 
> The standalone node where I tried to reproduce the issue uses OpenJDK and
> this one does not see this issue as it is able to reuse JVMs.
> 
> -Adi
> 
> Also...  I believe there were noted issues with the .17 JDK. I will look for
>> a link and post if I can find.
>> 
>> 
> 
>> Otherwise, the behaviour I have seen before. Hadoop is detaching from the
>> JVM and stops seeing it.
>> 
>> I think your problem lies in the JDK and not Hadoop.
>> 
>> 
>> On May 12, 2011 at 8:12 PM, Adi  wrote:
>> 
> 2011-05-12 13:52:04,147 WARN
 org.apache.hadoop.mapreduce.util.ProcessTree:
> Error executing shell command
> org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such
 process
 
 Your logs showed that Hadoop tried to kill processes but the kill
 command claimed they didn't exist. The next time you see this problem,
 can you check the logs and see if any of the PIDs that appear in the
 logs are in fact still running?
 
 A more likely scenario is that Hadoop's tracking of child VMs is
 getting out of sync, but I'm not sure what would cause that.
 
 
>>> Yes those java processes are in fact running. And those error messages do
>>> not always show up. Just sometimes. But the processes never get cleaned
>> up.
>>> 
>>> -Adi
>> 


Re: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread highpointe
When you say "freeze" you mean there is nothing rolling in the log?

Sent from my iPhone

On May 13, 2011, at 2:28 AM, Panayotis Antonopoulos 
 wrote:

> 
> There is no shutdown message until I shutdown the DataNode.
> 
> I used hostname of the machine that will run the DataNode and I now used the 
> IP but there is no difference.
> Again the DataNode seems to freeze and the output at the log is the one I 
> mentioned before.
> 
> 
> 
>> Subject: Re: Datanode doesn't start but there is no exception in the log
>> From: highpoint...@gmail.com
>> Date: Thu, 12 May 2011 23:59:02 -0600
>> To: common-user@hadoop.apache.org
>> 
>> Have you defined the IP
>> of the DN in the slaves file?
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>> On May 12, 2011, at 7:27 PM, Bharath Mundlapudi  
>> wrote:
>> 
>>> Is that all the messages in the datanode log? Do you see any SHUTDOWN 
>>> message also?
>>> 
>>> -Bharath
>>> 
>>> 
>>> 
>>> 
>>> From: Panayotis Antonopoulos 
>>> To: common-user@hadoop.apache.org
>>> Sent: Thursday, May 12, 2011 6:07 PM
>>> Subject: Datanode doesn't start but there is no exception in the log
>>> 
>>> 
>>> Hello,
>>> I am trying to set up Hadoop HDFS in a cluster for the first time. So far I 
>>> was using pseudo-distributed mode on my PC at home and everything was 
>>> working perfectly.
>>> The NameNode starts but the DataNode doesn't start and the log contains the 
>>> following:
>>> 
>>> 2011-05-13 04:01:13,663 INFO 
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
>>> /
>>> STARTUP_MSG: Starting DataNode
>>> STARTUP_MSG:   host = clone1/147.102.4.129
>>> STARTUP_MSG:   args = []
>>> STARTUP_MSG:   version = 0.20.2-cdh3u0
>>> STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; 
>>> compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
>>> /
>>> 2011-05-13 04:01:14,019 INFO 
>>> org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already 
>>> set up for Hadoop, not re-installing.
>>> 2011-05-13 04:01:14,143 INFO 
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered 
>>> FSDatasetStatusMBean
>>> 2011-05-13 04:01:14,152 INFO 
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010
>>> 2011-05-13 04:01:14,154 INFO 
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 
>>> 1048576 bytes/s
>>> 2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
>>> org.mortbay.log.Slf4jLog
>>> 2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added 
>>> global filtersafety 
>>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port 
>>> returned by webServer.getConnectors()[0].getLocalPort() before open() is 
>>> -1. Opening the listener on 50075
>>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
>>> listener.getLocalPort() returned 50075 
>>> webServer.getConnectors()[0].getLocalPort() returned 50075
>>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
>>> to port 50075
>>> 2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
>>> 2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
>>> SelectChannelConnector@0.0.0.0:50075
>>> 2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
>>> Initializing JVM Metrics with processName=DataNode, sessionId=null
>>> 2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting Socket 
>>> Reader #1 for port 50020
>>> 2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
>>> Initializing RPC Metrics with hostName=DataNode, port=50020
>>> 2011-05-13 04:01:14,981 INFO 
>>> org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics 
>>> with hostName=DataNode, port=50020
>>> 2011-05-13 04:01:14,984 INFO 
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = 
>>> DatanodeRegistration(clone1:50010, storageID=, infoPort=50075, 
>>> ipcPort=50020)
>>> 
>>> Does anyone know what might be wrong??
>>> 
>>> Thank you in advance!
>>> Panagiotis
> 


RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

There is no other information in the log (although when I run it on my PC,
where it works, there is more information in the log), and the web page of the
namenode doesn't show any live datanodes as it should.

That's why I said it freezes... I have no idea what is going on...

If anyone can help, please do; it is really important to get this working as
soon as possible (for my diploma thesis), and I really have no idea what might
be going wrong.


> Subject: Re: Datanode doesn't start but there is no exception in the log
> From: highpoint...@gmail.com
> Date: Fri, 13 May 2011 08:22:53 -0600
> To: common-user@hadoop.apache.org
> 
> When you say "freeze" you mean there is nothing rolling in the log?
> 
> Sent from my iPhone
> 
> On May 13, 2011, at 2:28 AM, Panayotis Antonopoulos 
>  wrote:
> 
> > 
> > There is no shutdown message until I shutdown the DataNode.
> > 
> > I used hostname of the machine that will run the DataNode and I now used 
> > the IP but there is no difference.
> > Again the DataNode seems to freeze and the output at the log is the one I 
> > mentioned before.
> > 
> > 
> > 
> >> Subject: Re: Datanode doesn't start but there is no exception in the log
> >> From: highpoint...@gmail.com
> >> Date: Thu, 12 May 2011 23:59:02 -0600
> >> To: common-user@hadoop.apache.org
> >> 
> >> Have you defined the IP
> >> of the DN in the slaves file?
> >> 
> >> 
> >> 
> >> Sent from my iPhone
> >> 
> >> On May 12, 2011, at 7:27 PM, Bharath Mundlapudi  
> >> wrote:
> >> 
> >>> Is that all the messages in the datanode log? Do you see any SHUTDOWN 
> >>> message also?
> >>> 
> >>> -Bharath
> >>> 
> >>> 
> >>> 
> >>> 
> >>> From: Panayotis Antonopoulos 
> >>> To: common-user@hadoop.apache.org
> >>> Sent: Thursday, May 12, 2011 6:07 PM
> >>> Subject: Datanode doesn't start but there is no exception in the log
> >>> 
> >>> 
> >>> Hello,
> >>> I am trying to set up Hadoop HDFS in a cluster for the first time. So far 
> >>> I was using pseudo-distributed mode on my PC at home and everything was 
> >>> working perfectly.
> >>> The NameNode starts but the DataNode doesn't start and the log contains 
> >>> the following:
> >>> 
> >>> 2011-05-13 04:01:13,663 INFO 
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
> >>> /
> >>> STARTUP_MSG: Starting DataNode
> >>> STARTUP_MSG:   host = clone1/147.102.4.129
> >>> STARTUP_MSG:   args = []
> >>> STARTUP_MSG:   version = 0.20.2-cdh3u0
> >>> STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; 
> >>> compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
> >>> /
> >>> 2011-05-13 04:01:14,019 INFO 
> >>> org.apache.hadoop.security.UserGroupInformation: JAAS Configuration 
> >>> already set up for Hadoop, not re-installing.
> >>> 2011-05-13 04:01:14,143 INFO 
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered 
> >>> FSDatasetStatusMBean
> >>> 2011-05-13 04:01:14,152 INFO 
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 
> >>> 50010
> >>> 2011-05-13 04:01:14,154 INFO 
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 
> >>> 1048576 bytes/s
> >>> 2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
> >>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> >>> org.mortbay.log.Slf4jLog
> >>> 2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added 
> >>> global filtersafety 
> >>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port 
> >>> returned by webServer.getConnectors()[0].getLocalPort() before open() is 
> >>> -1. Opening the listener on 50075
> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
> >>> listener.getLocalPort() returned 50075 
> >>> webServer.getConnectors()[0].getLocalPort() returned 50075
> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty 
> >>> bound to port 50075
> >>> 2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
> >>> 2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
> >>> SelectChannelConnector@0.0.0.0:50075
> >>> 2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> >>> Initializing JVM Metrics with processName=DataNode, sessionId=null
> >>> 2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting 
> >>> Socket Reader #1 for port 50020
> >>> 2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> >>> Initializing RPC Metrics with hostName=DataNode, port=50020
> >>> 2011-05-13 04:01:14,981 INFO 
> >>> org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC 
> >>> Metrics with hostName=DataNode, port=50020
> >>> 2011-05-13 04:01:14,984 INFO 
> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = 
> >>> DatanodeRegistration(clone1:50010, storageID=, infoPort=

Re: Can Mapper get paths of inputSplits ?

2011-05-13 Thread Owen O'Malley
On Thu, May 12, 2011 at 10:16 PM, Mark question  wrote:

>   Who's filling in map.input.file and map.input.offset (i.e. which class),
> so I can extend it to have a function that returns these strings?


MapTask.updateJobWithSplit is the method doing the work.

-- Owen
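For the streaming case discussed earlier in this digest, there is a shortcut: streaming exports the per-task job configuration into the mapper's environment with dots replaced by underscores, so map.input.file is readable without touching any Java. A sketch; the HDFS path is invented, and in a real task the framework (not you) sets the variable:

```shell
# Simulate what the framework would export for a streaming map task.
export map_input_file="hdfs://namenode:9000/input/part-00000"
# A streaming mapper script can then branch on the split it was given:
echo "mapper handling split: ${map_input_file}"
# prints: mapper handling split: hdfs://namenode:9000/input/part-00000
```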


Re: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Harsh J
Hello Panayotis,

Could you please post a jstack output of your hung process to look into?

$ jstack <pid>  # will do.

2011/5/13 Panayotis Antonopoulos :
>
> There is no other information in the log (although when I run it on my pc and 
> it works, there is more information in the log) and also the web page of the 
> namenode doesn't contain any live datanodes as it should.
>
> That's why I said it freezes... I have no idea what is going on...
>
> Please if anyone can help because it is really important to make it work as 
> soon as possible (for my diploma thesis) and I really have no idea what might 
> go wrong.
>
>
>> Subject: Re: Datanode doesn't start but there is no exception in the log
>> From: highpoint...@gmail.com
>> Date: Fri, 13 May 2011 08:22:53 -0600
>> To: common-user@hadoop.apache.org
>>
>> When you say "freeze" you mean there is nothing rolling in the log?
>>
>> Sent from my iPhone
>>
>> On May 13, 2011, at 2:28 AM, Panayotis Antonopoulos 
>>  wrote:
>>
>> >
>> > There is no shutdown message until I shutdown the DataNode.
>> >
>> > I used hostname of the machine that will run the DataNode and I now used 
>> > the IP but there is no difference.
>> > Again the DataNode seems to freeze and the output at the log is the one I 
>> > mentioned before.
>> >
>> >
>> >
>> >> Subject: Re: Datanode doesn't start but there is no exception in the log
>> >> From: highpoint...@gmail.com
>> >> Date: Thu, 12 May 2011 23:59:02 -0600
>> >> To: common-user@hadoop.apache.org
>> >>
>> >> Have you defined the IP
>> >> of the DN in the slaves file?
>> >>
>> >>
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On May 12, 2011, at 7:27 PM, Bharath Mundlapudi  
>> >> wrote:
>> >>
>> >>> Is that all the messages in the datanode log? Do you see any SHUTDOWN 
>> >>> message also?
>> >>>
>> >>> -Bharath
>> >>>
>> >>>
>> >>>
>> >>> 
>> >>> From: Panayotis Antonopoulos 
>> >>> To: common-user@hadoop.apache.org
>> >>> Sent: Thursday, May 12, 2011 6:07 PM
>> >>> Subject: Datanode doesn't start but there is no exception in the log
>> >>>
>> >>>
>> >>> Hello,
>> >>> I am trying to set up Hadoop HDFS in a cluster for the first time. So 
>> >>> far I was using pseudo-distributed mode on my PC at home and everything 
>> >>> was working perfectly.
>> >>> The NameNode starts but the DataNode doesn't start and the log contains 
>> >>> the following:
>> >>>
>> >>> 2011-05-13 04:01:13,663 INFO 
>> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
>> >>> /
>> >>> STARTUP_MSG: Starting DataNode
>> >>> STARTUP_MSG:   host = clone1/147.102.4.129
>> >>> STARTUP_MSG:   args = []
>> >>> STARTUP_MSG:   version = 0.20.2-cdh3u0
>> >>> STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; 
>> >>> compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
>> >>> /
>> >>> 2011-05-13 04:01:14,019 INFO 
>> >>> org.apache.hadoop.security.UserGroupInformation: JAAS Configuration 
>> >>> already set up for Hadoop, not re-installing.
>> >>> 2011-05-13 04:01:14,143 INFO 
>> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered 
>> >>> FSDatasetStatusMBean
>> >>> 2011-05-13 04:01:14,152 INFO 
>> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 
>> >>> 50010
>> >>> 2011-05-13 04:01:14,154 INFO 
>> >>> org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 
>> >>> 1048576 bytes/s
>> >>> 2011-05-13 04:01:14,206 INFO org.mortbay.log: Logging to 
>> >>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
>> >>> org.mortbay.log.Slf4jLog
>> >>> 2011-05-13 04:01:14,272 INFO org.apache.hadoop.http.HttpServer: Added 
>> >>> global filtersafety 
>> >>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Port 
>> >>> returned by webServer.getConnectors()[0].getLocalPort() before open() is 
>> >>> -1. Opening the listener on 50075
>> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: 
>> >>> listener.getLocalPort() returned 50075 
>> >>> webServer.getConnectors()[0].getLocalPort() returned 50075
>> >>> 2011-05-13 04:01:14,278 INFO org.apache.hadoop.http.HttpServer: Jetty 
>> >>> bound to port 50075
>> >>> 2011-05-13 04:01:14,278 INFO org.mortbay.log: jetty-6.1.26
>> >>> 2011-05-13 04:01:14,567 INFO org.mortbay.log: Started 
>> >>> SelectChannelConnector@0.0.0.0:50075
>> >>> 2011-05-13 04:01:14,570 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
>> >>> Initializing JVM Metrics with processName=DataNode, sessionId=null
>> >>> 2011-05-13 04:01:14,976 INFO org.apache.hadoop.ipc.Server: Starting 
>> >>> Socket Reader #1 for port 50020
>> >>> 2011-05-13 04:01:14,978 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
>> >>> Initializing RPC Metrics with hostName=DataNode, port=50020
>> >>> 2011-05-13 04:01:14,981 INFO 
>> >>> org.apache.hadoop.ipc.metrics.RpcDetail

RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

Thank you for your help!
Here is the output of the command you suggested:

panton@clone1:~/hadoop-0.20.203.0$ jstack 6320
2011-05-13 20:31:59
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode):

"Attach Listener" daemon prio=10 tid=0x409c9800 nid=0x1999 waiting on 
condition [0x]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" prio=10 tid=0x7f50f035f800 nid=0x1973 runnable 
[0x7f50f6caf000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0xeb403378> (a sun.nio.ch.Util$2)
- locked <0xeb403368> (a java.util.Collections$UnmodifiableSet)
- locked <0xeb403160> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:332)
- locked <0xeb403548> (a 
org.apache.hadoop.ipc.Server$Listener$Reader)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

"Timer-0" daemon prio=10 tid=0x7f50f84e6800 nid=0x195b in Object.wait() 
[0x7f50f6db]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xec47a0a8> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:509)
- locked <0xec47a0a8> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:462)

"899599744@qtp-1416044437-1 - Acceptor0 SelectChannelConnector@0.0.0.0:50075" 
prio=10 tid=0x7f50f8414800 nid=0x1926 runnable [0x7f50f6eb1000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0xec473870> (a sun.nio.ch.Util$2)
- locked <0xec473860> (a java.util.Collections$UnmodifiableSet)
- locked <0xec4733c8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at 
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:498)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
at 
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at 
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"1620640756@qtp-1416044437-0" prio=10 tid=0x7f50f83f0800 nid=0x1925 in 
Object.wait() [0x7f50f75eb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xec473a88> (a 
org.mortbay.thread.QueuedThreadPool$PoolThread)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
- locked <0xec473a88> (a 
org.mortbay.thread.QueuedThreadPool$PoolThread)

"refreshUsed-/home/users/panton/hadoop-0.20.203.0/dfs/data" daemon prio=10 
tid=0x7f50f83f8000 nid=0x191f waiting on condition [0x7f50f77fb000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:80)
at java.lang.Thread.run(Thread.java:662)

"Timer for 'DataNode' metrics system" daemon prio=10 tid=0x7f50f83d3000 
nid=0x18f5 in Object.wait() [0x7f50f7b02000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xec4407f8> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:509)
- locked <0xec4407f8> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:462)

"RMI TCP Accept-0" daemon prio=10 tid=0x7f50f835f800 nid=0x18e5 runnable 
[0x7f50f7d04000]
   java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
- locked <0xec3e3930> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at 
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
at 
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
at sun.rmi.transp

Re: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Harsh J
Hey,

2011/5/13 Panayotis Antonopoulos :
> "899599744@qtp-1416044437-1 - Acceptor0 SelectChannelConnector@0.0.0.0:50075" 
> prio=10 tid=0x7f50f8414800 nid=0x1926 runnable [0x7f50f6eb1000]
>   java.lang.Thread.State: RUNNABLE
>    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>    - locked <0xec473870> (a sun.nio.ch.Util$2)
>    - locked <0xec473860> (a java.util.Collections$UnmodifiableSet)
>    - locked <0xec4733c8> (a sun.nio.ch.EPollSelectorImpl)
>    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>    at 
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:498)
>    at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>    at 
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>    at 
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>    at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

and,

> "1620640756@qtp-1416044437-0" prio=10 tid=0x7f50f83f0800 nid=0x1925 in 
> Object.wait() [0x7f50f75eb000]
>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    - waiting on <0xec473a88> (a 
> org.mortbay.thread.QueuedThreadPool$PoolThread)
>    at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
>    - locked <0xec473a88> (a 
> org.mortbay.thread.QueuedThreadPool$PoolThread)

and,

> "main" prio=10 tid=0x40115000 nid=0x18d2 runnable [0x7f5101d2e000]
>   java.lang.Thread.State: RUNNABLE
>    at java.io.FileInputStream.readBytes(Native Method)
>    at java.io.FileInputStream.read(FileInputStream.java:220)
>    at 
> sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:493)
>    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
>    at 
> sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
>    at 
> sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
>    - locked <0xeb430cc0> (a sun.security.provider.SecureRandom)
>    at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
>    - locked <0xeb430f60> (a java.security.SecureRandom)
>    at java.security.SecureRandom.next(SecureRandom.java:455)
>    at java.util.Random.nextInt(Random.java:257)
>    at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.setNewStorageID(DataNode.java:608)
>    at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:629)
>    at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1378)
>    at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1438)
>    at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
>    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)

lead me to believe that you're running into this:
http://search-hadoop.com/m/7Giae6vLWR1/securerandom&subj=Re+Entropy+Pool+and+HDFS+FS+Commands+Hanging+System

Just give it some time and it should start up soon (you may generate
some other activity on the DN to help it get some fresh entropy).
Sometimes it may take up to a minute at start up.

-- 
Harsh J
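One quick way to confirm this diagnosis on a Linux node (a generic check, assuming the standard procfs path — not something from the thread itself) is to look at how much entropy the kernel pool currently holds:

```shell
# Print the kernel's available-entropy estimate; values near zero mean a
# blocking read from /dev/random (which SecureRandom's default seeding
# can perform) may stall the process for a long time.
if [ -r /proc/sys/kernel/random/entropy_avail ]; then
    cat /proc/sys/kernel/random/entropy_avail
else
    echo "entropy_avail not readable on this system"
fi
```

Generating disk or network activity on the box raises this number, which is what "give it some fresh entropy" amounts to.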


Re: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Harsh J
Actually, only the last mentioned stack matters. Also see:
https://issues.apache.org/jira/browse/HDFS-1835

On Fri, May 13, 2011 at 11:15 PM, Harsh J  wrote:
> Hey,
>
> 2011/5/13 Panayotis Antonopoulos :
>> "899599744@qtp-1416044437-1 - Acceptor0 
>> SelectChannelConnector@0.0.0.0:50075" prio=10 tid=0x7f50f8414800 
>> nid=0x1926 runnable [0x7f50f6eb1000]
>>   java.lang.Thread.State: RUNNABLE
>>    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
>>    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>    - locked <0xec473870> (a sun.nio.ch.Util$2)
>>    - locked <0xec473860> (a java.util.Collections$UnmodifiableSet)
>>    - locked <0xec4733c8> (a sun.nio.ch.EPollSelectorImpl)
>>    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>    at 
>> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:498)
>>    at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
>>    at 
>> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>    at 
>> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
>>    at 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> and,
>
>> "1620640756@qtp-1416044437-0" prio=10 tid=0x7f50f83f0800 nid=0x1925 in 
>> Object.wait() [0x7f50f75eb000]
>>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>    at java.lang.Object.wait(Native Method)
>>    - waiting on <0xec473a88> (a 
>> org.mortbay.thread.QueuedThreadPool$PoolThread)
>>    at 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
>>    - locked <0xec473a88> (a 
>> org.mortbay.thread.QueuedThreadPool$PoolThread)
>
> and,
>
>> "main" prio=10 tid=0x40115000 nid=0x18d2 runnable 
>> [0x7f5101d2e000]
>>   java.lang.Thread.State: RUNNABLE
>>    at java.io.FileInputStream.readBytes(Native Method)
>>    at java.io.FileInputStream.read(FileInputStream.java:220)
>>    at 
>> sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:493)
>>    at 
>> sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
>>    at 
>> sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
>>    at 
>> sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
>>    - locked <0xeb430cc0> (a sun.security.provider.SecureRandom)
>>    at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
>>    - locked <0xeb430f60> (a java.security.SecureRandom)
>>    at java.security.SecureRandom.next(SecureRandom.java:455)
>>    at java.util.Random.nextInt(Random.java:257)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.setNewStorageID(DataNode.java:608)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:629)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1378)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1438)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
>>    at 
>> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
>
> lead me to believe that you're running into this:
> http://search-hadoop.com/m/7Giae6vLWR1/securerandom&subj=Re+Entropy+Pool+and+HDFS+FS+Commands+Hanging+System
>
> Just give it some time and it should start up soon (you may generate
> some other activity on the DN to help it get some fresh entropy).
> Sometimes it may take up to a minute at start up.
>
> --
> Harsh J
>



-- 
Harsh J


RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

I have been waiting for hours to see if it will ever start, but it doesn't.
I will check the links you sent me.

Thanks again for your help!!!

> From: ha...@cloudera.com
> Date: Fri, 13 May 2011 23:18:40 +0530
> Subject: Re: Datanode doesn't start but there is no exception in the log
> To: common-user@hadoop.apache.org
> 
> Actually, only the last mentioned stack matters. Also see:
> https://issues.apache.org/jira/browse/HDFS-1835
> 
> On Fri, May 13, 2011 at 11:15 PM, Harsh J  wrote:
> > Hey,
> >
> > 2011/5/13 Panayotis Antonopoulos :
> >> "899599744@qtp-1416044437-1 - Acceptor0 
> >> SelectChannelConnector@0.0.0.0:50075" prio=10 tid=0x7f50f8414800 
> >> nid=0x1926 runnable [0x7f50f6eb1000]
> >>   java.lang.Thread.State: RUNNABLE
> >>at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >>at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> >>at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> >>at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> >>- locked <0xec473870> (a sun.nio.ch.Util$2)
> >>- locked <0xec473860> (a java.util.Collections$UnmodifiableSet)
> >>- locked <0xec4733c8> (a sun.nio.ch.EPollSelectorImpl)
> >>at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> >>at 
> >> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:498)
> >>at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
> >>at 
> >> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
> >>at 
> >> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
> >>at 
> >> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >
> > and,
> >
> >> "1620640756@qtp-1416044437-0" prio=10 tid=0x7f50f83f0800 nid=0x1925 in 
> >> Object.wait() [0x7f50f75eb000]
> >>   java.lang.Thread.State: TIMED_WAITING (on object monitor)
> >>at java.lang.Object.wait(Native Method)
> >>- waiting on <0xec473a88> (a 
> >> org.mortbay.thread.QueuedThreadPool$PoolThread)
> >>at 
> >> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
> >>- locked <0xec473a88> (a 
> >> org.mortbay.thread.QueuedThreadPool$PoolThread)
> >
> > and,
> >
> >> "main" prio=10 tid=0x40115000 nid=0x18d2 runnable 
> >> [0x7f5101d2e000]
> >>   java.lang.Thread.State: RUNNABLE
> >>at java.io.FileInputStream.readBytes(Native Method)
> >>at java.io.FileInputStream.read(FileInputStream.java:220)
> >>at 
> >> sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:493)
> >>at 
> >> sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
> >>at 
> >> sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
> >>at 
> >> sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
> >>- locked <0xeb430cc0> (a sun.security.provider.SecureRandom)
> >>at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
> >>- locked <0xeb430f60> (a java.security.SecureRandom)
> >>at java.security.SecureRandom.next(SecureRandom.java:455)
> >>at java.util.Random.nextInt(Random.java:257)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.setNewStorageID(DataNode.java:608)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:629)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1378)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1438)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
> >>at 
> >> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
> >
> > lead me to believe that you're running into this:
> > http://search-hadoop.com/m/7Giae6vLWR1/securerandom&subj=Re+Entropy+Pool+and+HDFS+FS+Commands+Hanging+System
> >
> > Just give it some time and it should start up soon (you may generate
> > some other activity on the DN to help it get some fresh entropy).
> > Sometimes it may take up to a minute at start up.
> >
> > --
> > Harsh J
> >
> 
> 
> 
> -- 
> Harsh J
  

Re: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread sridhar basam
Sounds like your entropy pool is exhausted blocking the process. What sort
of hardware/os combo are you running this on?

 Sridhar
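For reference, the workaround most often suggested for an exhausted entropy pool is to point the JVM at the non-blocking /dev/urandom. A hedged sketch for conf/hadoop-env.sh (the exact spelling matters — some JDK versions special-case `file:/dev/urandom` and fall back to the blocking device):

```shell
# conf/hadoop-env.sh — commonly cited workaround for SecureRandom
# blocking on /dev/random; the file:/dev/./urandom spelling avoids the
# JDK quirk that maps file:/dev/urandom back to the blocking source.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/./urandom"
```

This trades seed quality for availability, which is usually acceptable for a DataNode's storage-ID generation.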


Is there a way to tell if I'm in a combiner or a reducer?

2011-05-13 Thread W.P. McNeill
I have a MapReduce process that uses the same class for its combiner and
reducer.  I just realized that I want the behavior in the combiner and
reducer to be slightly different in this one place.  I could write separate
combiner and reducer classes derived from a common source, but in my
situation this is overkill.  Is there some way I can tell at runtime whether
I'm running as a combiner or a reducer?  Like some configuration setting?


FileSystem API - Moving files in HDFS

2011-05-13 Thread Jim Twensky
Hi,

I'd like to move and copy files from one directory in HDFS to another
one. I know there are methods in the Filesystem API that enable
copying files between the local disk and HDFS, but I couldn't figure
out how to do this between two paths both in HDFS. I think rename(Path
src, Path dest) can be used to move files, but copying still remains a
challenge to me. Any ideas?

Thanks,
Jim


Re: FileSystem API - Moving files in HDFS

2011-05-13 Thread lohit
There is no FileSystem API to copy. 
You could try 
hadoop dfs -cp  

which basically reads the file and writes to new file. 
The code for this is in FsShell.java


- Original Message 
From: Jim Twensky 
To: core-u...@hadoop.apache.org
Sent: Fri, May 13, 2011 1:21:09 PM
Subject: FileSystem API - Moving files in HDFS

Hi,

I'd like to move and copy files from one directory in HDFS to another
one. I know there are methods in the Filesystem API that enable
copying files between the local disk and HDFS, but I couldn't figure
out how to do this between two paths both in HDFS. I think rename(Path
src, Path dest) can be used to move files, but copying still remains a
challenge to me. Any ideas?

Thanks,
Jim



Re: Is there a way to tell if I'm in a combiner or a reducer?

2011-05-13 Thread Harsh Chouraria
Hello,

Short answer: No. Use separate classes (or derive your combiner from the 
reducer, with modified behavior).

I answered a similar question not too long ago from now: 
http://search-hadoop.com/m/Wh7vuKJEtL1/reducer+combiner&subj=Differentiate+Reducer+or+Combiner

HTH.

On 14-May-2011, at 1:17 AM, W.P. McNeill wrote:

> I have a MapReduce process that uses the same class for its combiner and
> reducer.  I just realized that I want the behavior in the combiner and
> reducer to be slightly different in this one place.  I could write separate
> combiner and reducer classes derived from a common source, but in my
> situation this is overkill.  Is there some way I can tell at runtime whether
> I'm running as a combiner or a reducer?  Like some configuration setting?

--
Harsh J
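To make the "derive your combiner from the reducer" suggestion concrete, here is a framework-free sketch of the pattern. The class names and the clamping step are invented for illustration; in a real job both classes would extend org.apache.hadoop.mapreduce.Reducer and be registered via job.setReducerClass(...) and job.setCombinerClass(...).

```java
import java.util.List;

class SumReducer {
    // Hook that the subclass overrides. The MapReduce API itself offers
    // no runtime "am I a combiner?" flag, which is why separate classes
    // are needed in the first place.
    protected boolean isCombiner() { return false; }

    public int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        // Hypothetical divergence point: only the final reduce clamps
        // the total; the combiner must emit the raw partial sum so that
        // combining remains safe.
        return isCombiner() ? sum : Math.min(sum, 100);
    }
}

class SumCombiner extends SumReducer {
    @Override protected boolean isCombiner() { return true; }
}
```

The single overridden method keeps the shared logic in one place while letting the two roles diverge at exactly the point that differs.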

RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

I am using my university lab's cluster, so I have limited access to its 
settings.
It consists of 10 nodes with Intel Xeon CPUs running GNU/Linux 2.6.38.

Is there a way to solve the problem without changing the settings of the 
cluster?

I am trying to apply the patch that Harsh J sent me, but I haven't managed to 
apply it to Hadoop properly yet.
I have never applied a patch before...

> Date: Fri, 13 May 2011 15:09:38 -0400
> Subject: Re: Datanode doesn't start but there is no exception in the log
> From: s...@basam.org
> To: common-user@hadoop.apache.org
> 
> Sounds like your entropy pool is exhausted blocking the process. What sort
> of hardware/os combo are you running this on?
> 
>  Sridhar
  

RE: Datanode doesn't start but there is no exception in the log

2011-05-13 Thread Panayotis Antonopoulos

I applied the patch
https://issues.apache.org/jira/browse/HDFS-1835
that Harsh J pointed me to, and now everything works great!!!

I hope that this change won't create other problems.

Thanks to everyone, and especially to Harsh J!!
I would never have found the problem without your help!!

> From: antonopoulos...@hotmail.com
> To: common-user@hadoop.apache.org
> Subject: RE: Datanode doesn't start but there is no exception in the log
> Date: Sat, 14 May 2011 03:49:21 +0300
> 
> 
> I am using my university lab's cluster, so I have limited access to its 
> settings.
> It consists of 10 nodes with Intel Xeon CPUs running GNU/Linux 2.6.38.
> 
> Is there a way to solve the problem without changing the settings of the 
> cluster?
> 
> I am trying to apply the patch that Harsh J sent me, but I haven't managed to 
> apply it to Hadoop properly yet.
> I have never applied a patch before...
> 
> > Date: Fri, 13 May 2011 15:09:38 -0400
> > Subject: Re: Datanode doesn't start but there is no exception in the log
> > From: s...@basam.org
> > To: common-user@hadoop.apache.org
> > 
> > Sounds like your entropy pool is exhausted blocking the process. What sort
> > of hardware/os combo are you running this on?
> > 
> >  Sridhar
> 
  

Re: FileSystem API - Moving files in HDFS

2011-05-13 Thread Mahadev Konar
Jim,
 you can use FileUtil.copy() methods to copy files.

Hope that helps.


-- 
thanks
mahadev
@mahadevkonar



On Fri, May 13, 2011 at 2:00 PM, lohit  wrote:
> There is no FileSystem API to copy.
> You could try
> hadoop dfs -cp  
>
> which basically reads the file and writes to new file.
> The code for this is in FsShell.java
>
>
> - Original Message 
> From: Jim Twensky 
> To: core-u...@hadoop.apache.org
> Sent: Fri, May 13, 2011 1:21:09 PM
> Subject: FileSystem API - Moving files in HDFS
>
> Hi,
>
> I'd like to move and copy files from one directory in HDFS to another
> one. I know there are methods in the Filesystem API that enable
> copying files between the local disk and HDFS, but I couldn't figure
> out how to do this between two paths both in HDFS. I think rename(Path
> src, Path dest) can be used to move files, but copying still remains a
> challenge to me. Any ideas?
>
> Thanks,
> Jim
>
>
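Putting the two answers together: a rename is a cheap metadata move, while a copy — as FsShell does it — simply reads the source and writes the destination. The sketch below mirrors that behaviour with plain java.nio.file so it is self-contained; with the real Hadoop API the equivalents would be FileSystem.rename(src, dst) and FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, conf).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Self-contained analogy for the two HDFS operations discussed above.
class HdfsCopyMoveSketch {
    // A move only rewrites metadata; analogous to FileSystem.rename.
    static void move(Path src, Path dst) throws IOException {
        Files.move(src, dst);
    }

    // A copy reads the whole source and writes a new file, which is
    // what lohit describes FsShell's -cp doing under the hood.
    static void copy(Path src, Path dst) throws IOException {
        byte[] data = Files.readAllBytes(src); // read the source
        Files.write(dst, data);                // write it to the new path
    }
}
```

The same shape carries over to HDFS: rename keeps the blocks where they are, while copy streams every byte through the client.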


Reduce stuck at 0%

2011-05-13 Thread Travis Bolinger

Hello all.

First off, let me apologize for being semi-new to Hadoop.  I appreciate 
your patience.


I am trying to set up a simple cluster with Hadoop 0.20.2 (1 
namenode/jobtracker, 3 datanodes/tasktrackers).  I believe it is almost 
set up correctly.  The web pages for HDFS and the JobTracker show all 3 
clients active, and everything appears to be working.  However, when I start a 
map-reduce job, the map finishes 100% but the reduce gets stuck at 0% 
and eventually gives I/O errors stating "Too many fetch-failures for 
output".  The jobtracker log file shows this error multiple times.  I 
have seen others suggest fixes for similar errors; here are the 
things I've tried:


1) Disable ipv6
2) Edit /etc/hosts (I'm not 100% sure I've set this file up correctly; 
you can see it in the link that follows)
3) Double-check core-site.xml for misconfiguration (I think it's set up 
correctly?)

4) Double-check the firewall for the HDFS and MapReduce ports (think I got 'em all).

All of my setup files and a portion of all log files can be found at: 
http://pastebin.com/u/AimFirst.  If anyone sees any obvious 
configuration errors or has any other suggestions, I would greatly 
appreciate it.


Thanks for any help you can provide.
Travis
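"Too many fetch-failures" very often traces back to hostname resolution: each tasktracker advertises a hostname that every other node must resolve to a real, non-loopback address so reducers can fetch map output. A hedged example /etc/hosts layout (all names and addresses hypothetical, not taken from Travis's setup):

```
127.0.0.1     localhost
# Do NOT also map a machine's own hostname to 127.0.0.1/127.0.1.1 —
# that makes the tasktracker advertise a loopback address.
192.168.1.10  master.example.com   master   # namenode/jobtracker
192.168.1.11  slave1.example.com   slave1   # datanode/tasktracker
192.168.1.12  slave2.example.com   slave2
192.168.1.13  slave3.example.com   slave3
```

The same file should be identical on every node, and `hostname` on each box should return the name listed against its real address.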