Re: Intermittent BindException during long MR jobs
Thanks for the responses. In our case the port is 0, and so from the link http://wiki.apache.org/hadoop/BindException Ted mentioned it says that a collision is highly unlikely: If the port is 0, then the OS is looking for any free port -so the port-in-use and port-below-1024 problems are highly unlikely to be the cause of the problem. I think load may be the culprit since the nodes will be heavily used during the times that the exception occurs. Is there anyway to set/increase the timeout for the call/connection attempt? In all cases so far it seems to be on a call to delete a file in HDFS. I had a search through the HDFS code base but couldn't see an obvious way to set a timeout, and couldn't see it being set. Krishna On 28 February 2015 at 15:20, Ted Yu yuzhih...@gmail.com wrote: Krishna: Please take a look at: http://wiki.apache.org/hadoop/BindException Cheers On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote: Hello Krishna, Exception seems to be IP specific. It might be occurred due to unavailability of IP address in the system to assign. Double check the IP address availability and run the job. *Thanks,* *S.RagavendraGanesh* ViSolve Hadoop Support Team ViSolve Inc. | San Jose, California Website: www.visolve.com email: servi...@visolve.com | Phone: 408-850-2243 *From:* Krishna Rao [mailto:krishnanj...@gmail.com] *Sent:* Thursday, February 26, 2015 9:48 PM *To:* user@hive.apache.org; u...@hadoop.apache.org *Subject:* Intermittent BindException during long MR jobs Hi, we occasionally run into a BindException causing long running jobs to occasionally fail. The stacktrace is below. Any ideas what this could be caused by? Cheers, Krishna Stacktrace: 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cann ot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)' java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718) at org.apache.hadoop.ipc.Client.call(Client.java:1242) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy10.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193) at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.create(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396
Re: Intermittent BindException during long MR jobs
Krishna: Please take a look at: http://wiki.apache.org/hadoop/BindException Cheers On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote: Hello Krishna, Exception seems to be IP specific. It might be occurred due to unavailability of IP address in the system to assign. Double check the IP address availability and run the job. *Thanks,* *S.RagavendraGanesh* ViSolve Hadoop Support Team ViSolve Inc. | San Jose, California Website: www.visolve.com email: servi...@visolve.com | Phone: 408-850-2243 *From:* Krishna Rao [mailto:krishnanj...@gmail.com] *Sent:* Thursday, February 26, 2015 9:48 PM *To:* user@hive.apache.org; u...@hadoop.apache.org *Subject:* Intermittent BindException during long MR jobs Hi, we occasionally run into a BindException causing long running jobs to occasionally fail. The stacktrace is below. Any ideas what this could be caused by? Cheers, Krishna Stacktrace: 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cann ot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)' java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718) at org.apache.hadoop.ipc.Client.call(Client.java:1242) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy10.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193) at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.create(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448
Intermittent BindException during long MR jobs
Hi, we occasionally run into a BindException causing long running jobs to occasionally fail. The stacktrace is below. Any ideas what this could be caused by? Cheers, Krishna Stacktrace: 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cann ot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)' java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718) at org.apache.hadoop.ipc.Client.call(Client.java:1242) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy10.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193) at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy11.create(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96) at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:56)