Re: Hbase - hadoop handling DNS blips

2014-08-27 Thread Bharath Vissapragada
Hi Arun,

My 2 cents.

I've seen this in the past, and after doing some research the issue seems to
be related to https://issues.apache.org/jira/browse/HADOOP-6356 . HLog
(SequenceFile) internally uses FileContext (unlike other HBase components,
which use FileSystem), and FileContext doesn't cache connections. So every
time a log file writer instance is created, it hits DNS again, and a failed
lookup brings the region server down.
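
To illustrate, here is a rough sketch of the difference (the NameNode URI is a
placeholder and this is not the actual HBase code path; it assumes a reachable
HDFS NameNode and the Hadoop versions discussed in this thread):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FileSystem;

    public class FsVsFcDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode URI, only for illustration.
            URI nn = URI.create("hdfs://namenode.example.com:8020/");

            // FileSystem instances are cached per (scheme, authority, user), so
            // the NameNode proxy, and the DNS lookup behind it, is created once
            // and reused by later calls.
            FileSystem fs1 = FileSystem.get(nn, conf);
            FileSystem fs2 = FileSystem.get(nn, conf);
            System.out.println("same FileSystem? " + (fs1 == fs2));   // true

            // FileContext builds a fresh AbstractFileSystem via reflection on
            // every call (HADOOP-6356), so each new writer resolves the NameNode
            // host again.
            FileContext fc1 = FileContext.getFileContext(nn, conf);
            FileContext fc2 = FileContext.getFileContext(nn, conf);
            System.out.println("same FileContext? " + (fc1 == fc2)); // false
        }
    }

That uncached path is the one in your stack trace: AbstractFileSystem.newInstance
ends up creating a new NameNode proxy, and the hostname resolution inside it is
what throws the UnknownHostException when DNS blips.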

- Bharath


On Wed, Aug 27, 2014 at 12:40 PM, Arun Mishra wrote:

> Hello,
>
> This is the first time I am sending a query to the HBase mailing list.
> Hopefully this is the correct group for HBase/Hadoop related questions.
>
> I am running HBase 0.92, Hadoop 2.0 (CDH 4.1.3). Recently, there was some
> instability in my DNS service, and host lookup requests failed occasionally.
> During such failures, a random region server would shut itself down when it
> encountered a fatal exception during a log roll operation. The DNS issue was
> eventually resolved and the region server fatals stopped.
>
> While I was trying to understand the HBase/Hadoop behavior during network
> events/blips, I found that a default retry policy is used -
> TRY_ONCE_THEN_FAIL. Please correct me if that's not the case.
>
> But then I was thinking that there could be more of these blips during
> network or other infrastructure maintenance operations. These maintenance
> operations should not result in a region server going down. If the client
> simply attempted one more time, the host lookup request should succeed.
>
> If someone has had a similar experience, can they please share? Are there
> options one can try against such failures?
>
> Maybe I am not thinking in the right direction, but this behavior makes
> me feel that HBase (using HDFS) is sensitive to DNS service availability.
> DNS unavailability for even a few seconds can bring down the entire cluster
> (a rare chance, if all region servers attempt to roll HLogs at the same time).

Hbase - hadoop handling DNS blips

2014-08-27 Thread Arun Mishra
Hello,

This is the first time I am sending a query to the HBase mailing list. Hopefully
this is the correct group for HBase/Hadoop related questions.

I am running HBase 0.92, Hadoop 2.0 (CDH 4.1.3). Recently, there was some
instability in my DNS service, and host lookup requests failed occasionally.
During such failures, a random region server would shut itself down when it
encountered a fatal exception during a log roll operation. The DNS issue was
eventually resolved and the region server fatals stopped.

While I was trying to understand the HBase/Hadoop behavior during network
events/blips, I found that a default retry policy is used -
TRY_ONCE_THEN_FAIL. Please correct me if that's not the case.

But then I was thinking that there could be more of these blips during network
or other infrastructure maintenance operations. These maintenance operations
should not result in a region server going down. If the client simply attempted
one more time, the host lookup request should succeed.
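
For reference, here is a minimal sketch of what that policy means in Hadoop's
generic retry machinery (the Lookup interface, the flaky implementation and the
hostname are all made up for illustration; this is not HBase or HDFS code):

    import java.io.IOException;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.io.retry.RetryPolicies;
    import org.apache.hadoop.io.retry.RetryPolicy;
    import org.apache.hadoop.io.retry.RetryProxy;

    public class RetryDemo {

        // Hypothetical client interface, standing in for an IPC-style call.
        public interface Lookup {
            String resolve(String host) throws IOException;
        }

        public static void main(String[] args) throws IOException {
            Lookup flaky = new Lookup() {
                private int calls = 0;
                public String resolve(String host) throws IOException {
                    // Fail once, then succeed - a one-off "DNS blip".
                    if (++calls == 1) {
                        throw new IOException("simulated transient lookup failure");
                    }
                    return host;
                }
            };

            // TRY_ONCE_THEN_FAIL: the first exception is rethrown immediately.
            RetryPolicy failFast = RetryPolicies.TRY_ONCE_THEN_FAIL;

            // A retrying policy: up to 3 attempts, sleeping 1 second in between.
            RetryPolicy retrying =
                RetryPolicies.retryUpToMaximumCountWithFixedSleep(3, 1, TimeUnit.SECONDS);

            Lookup wrapped = (Lookup) RetryProxy.create(Lookup.class, flaky, retrying);
            System.out.println(wrapped.resolve("namenode.example.com"));
        }
    }

Wrapping the same flaky implementation with failFast instead of retrying would
rethrow the simulated IOException on the first call, which is the
fail-on-first-error behavior described above.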

If someone has had a similar experience, can they please share? Are there options
one can try against such failures?

Maybe I am not thinking in the right direction, but this behavior makes me
feel that HBase (using HDFS) is sensitive to DNS service availability. DNS
unavailability for even a few seconds can bring down the entire cluster (a rare
chance, if all region servers attempt to roll HLogs at the same time).

Here is the stack trace:
11:14:48.706 AM 
2014-08-17 11:14:48,706 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
hadoop0104111601,60020,1408273008941: IOE in log roller
java.io.IOException: cannot get log writer
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
... 4 more
Caused by: java.lang.RuntimeException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:122)
at 
org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:148)
at 
org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:233)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:321)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:319)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at 
org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:319)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:432)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
... 5 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:120)
... 19 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: 
hadoop0104111601
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at 
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:125)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:60)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:51)
at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:137)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:389)