[ https://issues.apache.org/jira/browse/HIVE-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
agate updated HIVE-5172: ------------------------ Attachment: HIVE-5172.1.patch.txt Uploading a patch which creates a new TUGIContainingTransport and adds it to the map, and returns the newly created TUGIContainingTransport > TUGIContainingTransport returning null transport, causing intermittent > SocketTimeoutException on hive client and NullPointerException in > TUGIBasedProcessor on the server > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: HIVE-5172 > URL: https://issues.apache.org/jira/browse/HIVE-5172 > Project: Hive > Issue Type: Bug > Components: Metastore > Affects Versions: 0.9.0 > Reporter: agate > Attachments: HIVE-5172.1.patch.txt > > > We are running into frequent problem using HCatalog 0.4.1 (Hive Metastore > Server 0.9) where we get connection reset or connection timeout errors on the > client and NullPointerException in TUGIBasedProcessor on the server. > {code} > hive client logs: > ================= > org.apache.thrift.transport.TTransportException: > java.net.SocketTimeoutException: Read timed out > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) > at > org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) > at > org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:157) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) > at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) > at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) > at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Caused by: java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) > ... 31 more > {code} > {code} > hive metastore server logs: > =============================== > 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer > (TThreadPoolServer.java:run(182)) - Error occurred during processing of > message. > java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > Adding some extra debug log messages in TUGIBasedProcessor, noticed that the > TUGIContainingTransport is null which results in NullPointerException on the > server. > Further drilling into TUGIContainingTransport noticed that getTransport() > returns a null which causes the above > error. > Further corelating with GC logs observed that that error always hits when the > CMS GC has just kicked in > (but does not happen after every GC) > Put some debugging code in TUGIContainingTransport.getTransport() and I > tracked it down to > {code} > @Override > public TUGIContainingTransport getTransport(TTransport trans) { > // UGI information is not available at connection setup time, it will > be set later > // via set_ugi() rpc. > transMap.putIfAbsent(trans, new TUGIContainingTransport(trans)); > //return transMap.get(trans); //<-change > TUGIContainingTransport retTrans = transMap.get(trans); > if ( retTrans == null ) { > LOGGER.error ("cannot find transport that was in map !!") > return retTrans; > } else { > LOGGER.debug ("found transport in map") > return retTrans; > } > } > {code} > When we run this in our test environment, see that we run into the problem > just after GC runs, > and "cannot find transport that was in the map!!" message gets logged. > Could the GC be collecting entries from transMap, just before the we get it > Tried a minor change which seems to work > {code} > public TUGIContainingTransport getTransport(TTransport trans) { > TUGIContainingTransport tugiTrans = transMap.get(trans); > if ( tugiTrans == null ) { > // UGI information is not available at connection setup time, it will > be set later > // via set_ugi() rpc. > tugiTrans = new TUGIContainingTransport(trans); //get a new > TUGIContainingTransport > transMap.putIfAbsent(trans, tugiTrans); > } > return tugiTrans; > } > {code} > My questions for hive and thrift experts > 1.) Do we need to use a ConcurrentMap > {code} > ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new > MapMaker().weakKeys().weakValues().makeMap(); > {code} > It does use == to compare keys (which might be the problem), also in this > case we cant rely on the trans to be always there in the transMap, even after > a put, so in that case change above probably makes sense > 2.) Is it better idea to use WeakHashMap with WeakReference value instead ? > (was looking at org.apache.thrift.transport.TSaslServerTransport, especially > change made by THRIFT-1468 as an example) > e.g. > {code} > private static Map<TTransport, WeakReference<TUGIContainingTransport>> > transMap = Collections.synchronizedMap(new WeakHashMap<TTransport, > WeakReference<TUGIContainingTransport>>()); > {code} > getTransport() would be something like > {code} > public TUGIContainingTransport getTransport(TTransport trans) { > WeakReference<TUGIContainingTransport> ret = transMap.get(trans); > if (ret == null || ret.get() == null) { > ret = new WeakReference<TUGIContainingTransport>(new > TUGIContainingTransport(trans)); > transMap.put(trans, ret); // No need for putIfAbsent(). > // Concurrent calls to getTransport() will pass in different > TTransports. > } > return ret.get(); > } > {code} > 3.) Is it absolutely essential that successive calls to get getTransport() > need to retrieve the same TUGIContainingTransport object or its ok to create > a new one, if its not available in the map ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira