[ https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371535#comment-17371535 ]

Steve Loughran commented on HIVE-24849:
---------------------------------------

How does tbl.isEmpty() work? Does it do a listStatus() call (which blocks for 
all direct children), or does it use an iterator (listStatusIterator(), 
listFiles()) and return false if the returned iterator's .hasNext() is true? 
Using an iterator is better, as the wait time will be that of a single S3 or 
ABFS LIST call rather than blocking for all the results.
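
For illustration, a minimal sketch of the two approaches against the Hadoop 
FileSystem API; the class and method names here are mine, not Hive's actual 
code:

{code:java}
// Hypothetical sketch: eager vs. lazy emptiness check on a Hadoop FileSystem.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class EmptyDirCheck {

  // Eager: listStatus() materialises every direct child before returning,
  // so against S3/ABFS it pages through the whole listing (~10K objects
  // in this report) before the caller sees anything.
  static boolean isEmptyEager(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children = fs.listStatus(dir);
    return children.length == 0;
  }

  // Lazy: listStatusIterator() fetches listing pages on demand, so
  // hasNext() needs only the first LIST call to decide emptiness.
  static boolean isEmptyLazy(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    return !it.hasNext();
  }
}
{code}

Since an S3 LIST returns at most 1000 keys per call, the eager form needs ~10 
round trips for a 10K-object prefix where the lazy form needs one.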

> Create external table socket timeout when location has large number of files
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-24849
>                 URL: https://issues.apache.org/jira/browse/HIVE-24849
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.3.4
>         Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>            Reporter: Mithun Antony
>            Priority: Major
>
> # The create table API call times out during external table creation on a 
> location where the number of files in the S3 location is large (i.e. ~10K 
> objects).
> The default timeout `hive.metastore.client.socket.timeout` is `600s`; the 
> current workaround is to increase the timeout to a higher value.
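> For illustration only (the 1800s figure is an arbitrary example, not a value 
> from this report):
> {code:java}
> # example: start the CLI with a larger metastore client socket timeout
> hive --hiveconf hive.metastore.client.socket.timeout=1800s
> {code}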
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>  at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>  at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>  at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>  at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
>  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
>  at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy37.createTable(Unknown Source)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330)
>  at com.sun.proxy.$Proxy37.createTable(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:863)
>  ... 25 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>  at java.net.SocketInputStream.socketRead0(Native Method)
>  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>  at java.net.SocketInputStream.read(SocketInputStream.java:171)
>  at java.net.SocketInputStream.read(SocketInputStream.java:141)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>  at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>  at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>  at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>  ... 49 more{code}
>  
> My understanding is that create table should not check the existing 
> files/partitions; that validation is done when the table is read.
> The issue can be reproduced by executing the statement below in the Hive CLI, 
> where the location should contain ~10K objects.
>  
> {code:java}
> SET hive.stats.autogather=false;
> DROP table if exists tTable;
> CREATE EXTERNAL TABLE tTable(
>    id string,
>    mId string,
>    wd bigint,
>    lang string,
>    tcountry string,
>    vId bigint
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
> LOCATION "s3://<your bucket>/"
> TBLPROPERTIES ('serialization.null.format' = ''); {code}
>  



