Hi Jason,

Thanks for the info - it's good to hear from somebody else who's run into this :)

I tried again with a bigger box for the master, and wound up with the same results.

I guess the framework could be killing it - but no idea why. This is during a very simple "write out the results" phase, so very high I/O but not much computation, and nothing should be hung.

Any particular configuration values you had to tweak? I'm running this in Elastic MapReduce (EMR) so most settings are whatever they provide by default. I override a few things in my JobConf, but (for example) anything related to HDFS/MR framework will be locked & loaded by the time my job is executing.
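
For reference, the kind of overrides I mean are just programmatic JobConf settings in the job driver, something along these lines (the keys and values below are purely illustrative, not the exact ones I use):

    // Illustrative only - shows where job-level overrides go; the specific
    // keys/values here are examples, not the ones from my actual job.
    import org.apache.hadoop.mapred.JobConf;

    public class JobSetupExample {
        public static JobConf configure() {
            JobConf conf = new JobConf(JobSetupExample.class);
            conf.set("mapred.task.timeout", "1200000"); // example: 20 minute task timeout
            conf.setNumReduceTasks(20);                 // example value
            // Cluster-level HDFS/MR settings (hadoop-site.xml) are fixed by EMR
            // before the job runs, so they can't be changed here.
            return conf;
        }
    }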

Thanks!

-- Ken

On Dec 8, 2009, at 9:34am, Jason Venner wrote:

Is it possible that this is occurring in a task that is being killed by the framework?
Sometimes there is a little lag between the time the tracker 'kills a task' and the time the task fully dies. You could be getting into a situation like that, where the task is in the process of dying but the last write is still in progress.
I see this situation happen when the task tracker machine is heavily loaded. In one case there was a 15 minute lag between the timestamp in the tracker for killing task XYZ and the task actually going away.

It took me a while to work this out as I had to merge the tracker and task
logs by time to actually see the pattern.
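
(In case it's useful: the merge itself is nothing fancy - something like the sketch below, assuming both logs start every line with the default log4j "yyyy-MM-dd HH:mm:ss,SSS" timestamp, which sorts correctly as plain text. Continuation lines such as stack traces have no timestamp and will get shuffled, but the timestamped tracker/task lines are what matter.)

    // Minimal sketch: merge tracker + task logs by their timestamp prefixes.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class MergeLogs {
        public static void main(String[] args) throws IOException {
            List<String> merged = new ArrayList<String>();
            for (String logFile : args) {        // e.g. tasktracker.log task.log
                merged.addAll(Files.readAllLines(Paths.get(logFile)));
            }
            Collections.sort(merged);            // timestamp prefix => chronological order
            for (String line : merged) {
                System.out.println(line);
            }
        }
    }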
The host machines were under very heavy I/O pressure, and may have been paging also. The code and configuration issues that triggered this have been resolved, so I don't see it anymore.

On Tue, Dec 8, 2009 at 8:32 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

Hi all,

In searching the mail/web archives, I occasionally see questions from people (like me) who run into the LeaseExpiredException (in my case, on 0.18.3 while running a 50-server cluster in EMR).

Unfortunately I don't see any responses, other than Dennis Kubes saying that he thought some work had been done in this area of Hadoop "a while back". And this was in 2007, so it hopefully doesn't apply to my situation.

I see these LeaseExpiredException errors showing up in the logs around the same time as IOException errors, e.g.:

java.io.IOException: Stream closed.
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
      at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
      at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
      at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
      at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
      at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1260)
      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1277)
      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1295)
      at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:73)
      at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:276)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:238)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)

This issue seemed related, but would have been fixed in the 0.18.3 release.

http://issues.apache.org/jira/browse/HADOOP-3760

I saw a similar HBase issue -
https://issues.apache.org/jira/browse/HBASE-529 - but they "fixed" it by
retrying a failure case.
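
For reference, the HBase "fix" amounts to retrying the failed operation. In rough outline it looks like the sketch below (this is only an illustration of the retry approach, not the actual HBASE-529 patch; all names in it are made up for the example):

    // Sketch only - generic retry-on-IOException wrapper with simple backoff.
    import java.io.IOException;

    public class RetryingWrite {
        interface WriteOp { void run() throws IOException; } // hypothetical callback

        static void withRetries(WriteOp op, int maxAttempts) throws IOException {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    op.run();
                    return;
                } catch (IOException e) {                   // e.g. a wrapped LeaseExpiredException
                    last = e;
                    try { Thread.sleep(1000L * attempt); }  // simple backoff between attempts
                    catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
                }
            }
            throw (last != null) ? last : new IOException("no attempts made");
        }
    }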

These exceptions occur during "write storms", where lots of files are being
written out. Though "lots" is relative, e.g. 10-20M.

It's repeatable, in that it fails on the same step of a series of chained
MR jobs.

Is it possible I need to be running a bigger box for my namenode server?
Any other ideas?

Thanks,

-- Ken


On May 25, 2009, at 7:37am, Stas Oskin wrote:

Hi.

I have a process that writes to a file on DFS from time to time, using an OutputStream.
After some time of writing, I start getting the exception below, and the write fails. The DFSClient retries several times, and then fails.

Copying the file from local disk to DFS via CopyLocalFile() works fine.
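
In code terms, the failing path and the working path look roughly like the sketch below (simplified for illustration, and assuming CopyLocalFile() corresponds to FileSystem.copyFromLocalFile()):

    // Rough sketch of the two write paths described above (0.18-era FileSystem API);
    // paths and buffer handling are simplified.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Failing path: keep an output stream open and write to it from time to time.
            FSDataOutputStream out = fs.create(new Path("/test/test.bin"));
            out.write(new byte[]{1, 2, 3});  // periodic writes; eventually LeaseExpiredException
            out.close();

            // Working path: write the file locally, then copy the finished file into DFS.
            fs.copyFromLocalFile(new Path("/tmp/test.bin"), new Path("/test/test.bin"));
        }
    }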

Can anyone advise on the matter?

I'm using Hadoop 0.18.3.

Thanks in advance.


09/05/25 15:35:35 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.LeaseExpiredException: No lease on /test/test.bin File does not exist. Holder DFSClient_-951664265 does not have any open files.
         at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1172)
         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1103)
         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

         at org.apache.hadoop.ipc.Client.call(Client.java:716)
         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
         at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
         at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)


--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



