[jira] [Commented] (HIVE-19848) Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its UIs)

2019-12-25 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-19848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003403#comment-17003403
 ] 

t oo commented on HIVE-19848:
-

[~Rajkumar Singh] Is this on the roadmap?

> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> -
>
> Key: HIVE-19848
> URL: https://issues.apache.org/jira/browse/HIVE-19848
> Project: Hive
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: t oo
>Priority: Major
>
> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> We are using Hive on EC2s without EMR/HDFS/Kerberos



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-15141) metrics reporter using HADOOP2 is not able to re-initialize - and prevents hiveserver2 recovery

2019-11-02 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965504#comment-16965504
 ] 

t oo commented on HIVE-15141:
-

Is having metrics enabled causing the process to crash?

> metrics reporter using HADOOP2 is not able to re-initialize - and prevents 
> hiveserver2 recovery
> ---
>
> Key: HIVE-15141
> URL: https://issues.apache.org/jira/browse/HIVE-15141
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> * hiveserver2 initializes {{MetricsFactory}} => CodahaleMetrics created => 
> registers HADOOP2 source
> * exception from somewhere...possibly recoverable
> * MetricsFactory deinitializes the backend with close()
> * retries failing because the metrics system cant continue 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-17350) metrics errors when retrying HS2 startup

2019-11-02 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965502#comment-16965502
 ] 

t oo edited comment on HIVE-17350 at 11/2/19 9:20 PM:
--

[~sershe] [~khwunchai] Can you give more details? Are you saying that with metrics 
enabled the Hive process won't start at all?


was (Author: toopt4):
[~sershe] [~khwunchai] Can u give more details? Are u saying with metrics 
enabled that give process won't start at all?

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at 

[jira] [Comment Edited] (HIVE-17350) metrics errors when retrying HS2 startup

2019-11-02 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965502#comment-16965502
 ] 

t oo edited comment on HIVE-17350 at 11/2/19 9:19 PM:
--

[~sershe] [~khwunchai] Can you give more details? Are you saying that with metrics 
enabled the Hive process won't start at all?


was (Author: toopt4):
[~sershe] [~khwunchai] Can I give more details? Are u saying with metrics 
enabled that give process won't start at all?

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at 

[jira] [Commented] (HIVE-17350) metrics errors when retrying HS2 startup

2019-11-02 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965502#comment-16965502
 ] 

t oo commented on HIVE-17350:
-

[~sershe] [~khwunchai] Can you give more details? Are you saying that with metrics 
enabled the Hive process won't start at all?

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:712)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Comment Edited] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2019-04-01 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804873#comment-16804873
 ] 

t oo edited comment on HIVE-15546 at 4/1/19 8:01 AM:
-

[~stakiar] The single-threaded listing issue is still being hit - see HIVE-21546.


was (Author: toopt4):
Did this ever make release 2.3? I can't see it in 
[https://github.com/apache/hive/blob/rel/release-2.3.0/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java]
 and issue still faced with single threading 
(https://stackoverflow.com/questions/55416703/hiveserver2-on-spark-mapred-fileinputformat-total-input-files-to-process)

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21546) hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?

2019-03-29 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-21546:

Affects Version/s: 3.1.1
   2.3.4
  Component/s: StorageHandler
   storage-api
   File Formats

> hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why 
> single threaded?
> ---
>
> Key: HIVE-21546
> URL: https://issues.apache.org/jira/browse/HIVE-21546
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, storage-api, StorageHandler
>Affects Versions: 3.1.1, 2.3.4
>Reporter: t oo
>Priority: Major
>
> I have setup Hive (v2.3.4) on Spark (exec engine, but MR gets same issue), 
> hadoop 2.7.6 (or hadoop 2.8.5). My external hive table is Parquet format on 
> s3 across 100s of partitions. Below settings are set to 20:
> {{hive.exec.input.listing.max.threads, mapred.dfsclient.parallelism.max, 
> mapreduce.input.fileinputformat.list-status.num-threads}}
> Run a simple query:
> {{select * from s.t where h_code = 'KGD78' and h_no = '265'}}
> I can see the below in HiveServer2 logs (the logs continue for more than 1000 
> lines listing all the different partitions). Why is the listing of files not 
> being done in parallel? It takes more than 5mins just in the listing.
> {{2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new 
> decompressor [.snappy] 2019-03-29T11:29:27,283 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:28,374 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:29,483 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:30,518 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:31,549 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:32,574 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:33,639 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:34,743 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:35,701 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input 
> files to process : 1 2019-03-29T11:29:36,662 INFO 
> [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
> mapred.FileInputFormat: Total input files to process : 1 
> 2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
> 

[jira] [Updated] (HIVE-21546) hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?

2019-03-29 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-21546:

Description: 
I have setup Hive (v2.3.4) on Spark (exec engine, but MR gets same issue), 
hadoop 2.7.6 (or hadoop 2.8.5). My external hive table is Parquet format on s3 
across 100s of partitions. Below settings are set to 20:

{{hive.exec.input.listing.max.threads, mapred.dfsclient.parallelism.max, 
mapreduce.input.fileinputformat.list-status.num-threads}}

Run a simple query:

{{select * from s.t where h_code = 'KGD78' and h_no = '265'}}

I can see the below in HiveServer2 logs (the logs continue for more than 1000 
lines listing all the different partitions). Why is the listing of files not 
being done in parallel? It takes more than 5mins just in the listing.

{{2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new 
decompressor [.snappy] 2019-03-29T11:29:27,283 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:28,374 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:29,483 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:30,518 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:31,549 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:32,574 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:33,639 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:34,743 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:35,701 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:36,662 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 
2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 
HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files 
to process : 1 2019-03-29T11:29:37,645 INFO 
[3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] 
mapred.FileInputFormat: Total input files to process : 1 }}

I've tried

{{hive.exec.input.listing.max.threads, mapred.dfsclient.parallelism.max, 
mapreduce.input.fileinputformat.list-status.num-threads}}

with the defaults, 1, and 50, and still get the same result.
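
For reference, applying the settings named above at the session level looks roughly 
like the sketch below (a sketch only: whether these knobs are honoured for the 
listing path depends on the Hive build, and the query is the one from this report):

{code:sql}
-- session-level attempt at the listing-parallelism knobs named above
SET hive.exec.input.listing.max.threads=20;
SET mapred.dfsclient.parallelism.max=20;
SET mapreduce.input.fileinputformat.list-status.num-threads=20;

-- the query from this report
select * from s.t where h_code = 'KGD78' and h_no = '265';
{code}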

 

 

 

Hive 3.1.1/hadoop3.1.2 also has the issue:

 

2019-03-29T18:10:15,451 INFO [16b32706-3490-432d-b49e-67279ea88e15 
HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 
0. reading next block
2019-03-29T18:10:15,461 INFO [16b32706-3490-432d-b49e-67279ea88e15 
HiveServer2-Handler-Pool: Thread-30] 

[jira] [Commented] (HIVE-15546) Optimize Utilities.getInputPaths() so each listStatus of a partition is done in parallel

2019-03-29 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804873#comment-16804873
 ] 

t oo commented on HIVE-15546:
-

Did this ever make it into release 2.3? I can't see it in 
[https://github.com/apache/hive/blob/rel/release-2.3.0/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java]
 and the issue is still hit with single-threaded listing 
(https://stackoverflow.com/questions/55416703/hiveserver2-on-spark-mapred-fileinputformat-total-input-files-to-process)

> Optimize Utilities.getInputPaths() so each listStatus of a partition is done 
> in parallel
> 
>
> Key: HIVE-15546
> URL: https://issues.apache.org/jira/browse/HIVE-15546
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: HIVE-15546.1.patch, HIVE-15546.2.patch, 
> HIVE-15546.3.patch, HIVE-15546.4.patch, HIVE-15546.5.patch, HIVE-15546.6.patch
>
>
> When running on blobstores (like S3) where metadata operations (like 
> listStatus) are costly, Utilities.getInputPaths() can add significant 
> overhead when setting up the input paths for an MR / Spark / Tez job.
> The method performs a listStatus on all input paths in order to check if the 
> path is empty. If the path is empty, a dummy file is created for the given 
> partition. This is all done sequentially. This can be really slow when there 
> are a lot of empty partitions. Even when all partitions have input data, this 
> can take a long time.
> We should either:
> (1) Just remove the logic to check if each input path is empty, and handle 
> any edge cases accordingly.
> (2) Multi-thread the listStatus calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16003) Blobstores should use fs.listFiles(path, recursive=true) rather than FileUtils.listStatusRecursively

2019-03-29 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804832#comment-16804832
 ] 

t oo commented on HIVE-16003:
-

gentle ping

> Blobstores should use fs.listFiles(path, recursive=true) rather than 
> FileUtils.listStatusRecursively
> 
>
> Key: HIVE-16003
> URL: https://issues.apache.org/jira/browse/HIVE-16003
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
>Priority: Major
>
> {{FileUtils.listStatusRecursively}} can be slow on blobstores because 
> {{listStatus}} calls are applied recursively to a given directory. This can 
> be especially bad on tables with multiple levels of partitioning.
> The {{FileSystem}} API provides an optimized API called {{listFiles(path, 
> recursive)}} that can be used to invoke an optimized recursive directory 
> listing.
> The problem is that the {{listFiles(path, recursive)}} API doesn't provide an 
> option to pass in a {{PathFilter}}, while {{FileUtils.listStatusRecursively}} 
> uses a custom HIDDEN_FILES_PATH_FILTER.
> To fix this we could either:
> 1: Modify the FileSystem API to provide a {{listFiles(path, recursive, 
> PathFilter)}} method (probably the cleanest solution)
> 2: Add conditional logic so that blobstores invoke {{listFiles(path, 
> recursive)}} and the rest of the code uses the current implementation of 
> {{FileUtils.listStatusRecursively}}
> 3: Replace the implementation of {{FileUtils.listStatusRecursively}} with 
> {{listFiles(path, recursive)}} and apply the {{PathFilter}} on the results 
> (not sure what optimizations can be made if {{PathFilter}} objects are passed 
> into {{FileSystem}} methods - maybe {{PathFilter}} objects are pushed to the 
> NameNode?)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14165) Remove Hive file listing during split computation

2019-03-29 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804833#comment-16804833
 ] 

t oo commented on HIVE-14165:
-

gentle ping

> Remove Hive file listing during split computation
> -
>
> Key: HIVE-14165
> URL: https://issues.apache.org/jira/browse/HIVE-14165
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Abdullah Yousufi
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-14165.02.patch, HIVE-14165.03.patch, 
> HIVE-14165.04.patch, HIVE-14165.05.patch, HIVE-14165.06.patch, 
> HIVE-14165.07.patch, HIVE-14165.patch
>
>
> The Hive side listing in FetchOperator.java is unnecessary, since Hadoop's 
> FileInputFormat.java will list the files during split computation anyway to 
> determine their size. One way to remove this is to catch the 
> InvalidInputFormat exception thrown by FileInputFormat#getSplits() on the 
> Hive side instead of doing the file listing beforehand.
> For S3 select queries on partitioned tables, this results in a 2x speedup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17357) Plugin jars are not properly added for LocalHiveSparkClient

2019-03-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794046#comment-16794046
 ] 

t oo commented on HIVE-17357:
-

Can you clarify this issue? Does it mean that before the fix there was no way for 
Hive on Spark (with spark.master in non-local mode) to use custom SerDes/UDFs? 
[https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started] 
does not mention how to register custom SerDe/UDF jars/classes. For example, if I 
want to query a table that uses {{com.uber.hoodie.hadoop.HoodieInputFormat}} (this 
class relies on Parquet), the docs don't say where to place the jar.
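
For context, a session-level registration would look roughly like the sketch below 
(the jar path and table name are hypothetical); the open question is whether 
anything like this reaches the remote Spark executors when spark.master is not local:

{code:sql}
-- hypothetical path to the jar providing com.uber.hoodie.hadoop.HoodieInputFormat
ADD JAR /opt/hive/aux/hoodie-hadoop-mr-bundle.jar;

-- illustrative query against a table declared with that InputFormat
SELECT COUNT(*) FROM my_hoodie_table;
{code}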

> Plugin jars are not properly added for LocalHiveSparkClient
> ---
>
> Key: HIVE-17357
> URL: https://issues.apache.org/jira/browse/HIVE-17357
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-17357.1.patch
>
>
> I forgot to include the same change for LocalHiveSparkClient.java in 
> HIVE-17336. We need to make the same change as HIVE-17336 in 
> LocalHiveSparkClient class to include plugin jars. Maybe we should have a 
> common base class for both LocalHiveSparkClient and RemoteHiveSparkClient to 
> have some common functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2019-03-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794045#comment-16794045
 ] 

t oo commented on HIVE-17336:
-

Can you clarify this issue? Does it mean that before the fix there was no way for 
Hive on Spark (with spark.master in non-local mode) to use custom SerDes/UDFs? 
[https://cwiki.apache.org//confluence/display/Hive/Hive+on+Spark:+Getting+Started] 
does not mention how to register custom SerDe/UDF jars/classes. For example, if I 
want to query a table that uses {{com.uber.hoodie.hadoop.HoodieInputFormat}} (this 
class relies on Parquet), the docs don't say where to place the jar.

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into a hbase based table from hive on spark, the following 
> exception is thrown 
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  

[jira] [Commented] (HIVE-20828) Upgrade to Spark 2.4.0

2019-02-23 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775867#comment-16775867
 ] 

t oo commented on HIVE-20828:
-

Upgrade to Spark 2.4.1?

> Upgrade to Spark 2.4.0
> --
>
> Key: HIVE-20828
> URL: https://issues.apache.org/jira/browse/HIVE-20828
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-20828.1.patch, HIVE-20828.2.patch
>
>
> The Spark community is in the process of releasing Spark 2.4.0. We should do 
> some testing with the RC candidates and then upgrade once the release is 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20602) hive3 crashes after 1min

2019-02-22 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775656#comment-16775656
 ] 

t oo commented on HIVE-20602:
-

A workaround is to set hive.metastore.event.db.notification.api.auth to false.

> hive3 crashes after 1min
> 
>
> Key: HIVE-20602
> URL: https://issues.apache.org/jira/browse/HIVE-20602
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: t oo
>Priority: Blocker
>
> Running hiveserver2 process (v3.0.0 of hive) on ec2 (not emr), the process 
> starts up and for the first 1min everything is ok (I can make beeline 
> connection, create/repair/select external hive tables) but then the 
> hiveserver2 process crashes. If I restart the process and even do nothing the 
> hiveserver2 process crashes after 1min. When checking the logs I see messages 
> like 'number of connections to metastore: 1','number of connections to 
> metastore: 2','number of connections to metastore: 3' then 'could not bind to 
> port 1 port already in use' then end of the logs.
> I ran some experiments on a few different EC2s: if I use Hive v2.3.2 the 
> hiveserver2 process never crashes, but if I use Hive v3.0.0 it consistently 
> crashes after a minute.
> Metastore db is mysql rds, hive metastore process never crashed. I can see 
> the external hive table ddls are persisted in the mysql (ie DBS, TBLS tables).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19117) hiveserver2 org.apache.thrift.transport.TTransportException error when running 2nd query after minute of inactivity

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769865#comment-16769865
 ] 

t oo commented on HIVE-19117:
-

Any idea Mr V?

> hiveserver2 org.apache.thrift.transport.TTransportException error when 
> running 2nd query after minute of inactivity
> ---
>
> Key: HIVE-19117
> URL: https://issues.apache.org/jira/browse/HIVE-19117
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Metastore, Thrift API
>Affects Versions: 2.1.1
> Environment: * Hive 2.1.1 with hive.server2.transport.mode set to 
> binary (sample JDBC string is jdbc:hive2://remotehost:1/default)
>  * Hadoop 2.8.3
>  * Metastore using MySQL
>  * Java 8
>Reporter: t oo
>Priority: Blocker
>
> I make a JDBC connection from my SQL tool (ie Squirrel SQL, Oracle SQL 
> Developer) to HiveServer2 (running on remote server) with port 1.
> I am able to run some queries successfully. I then do something else (not in 
> the SQL tool) for 1-2minutes and then return to my SQL tool and attempt to 
> run a query but I get this error: 
> {code:java}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Software caused connection abort: socket write error{code}
> If I now disconnect and reconnect in my SQL tool I can run queries again. But 
> does anyone know what HiveServer2 settings I should change to prevent the 
> error? I assume something in hive-site.xml
> From the hiveserver2 logs below, you can see an exact 1-minute gap from the 
> 30th minute to the 31st minute, where the disconnect happens.
> {code:java}
> 2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,718 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,719 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,232 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Session disconnected without closing properly.
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a]
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> service.CompositeService: Session closed, SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a], current sessions:0
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.HiveSessionImpl: Operation log session directory is deleted: 
> /var/hive/hs2log/tmp/c81ec0f9-7a9d-46b6-9708-e7d78520a48a
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/scratch/tmp/anonymous/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs 
> with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/ec2-user/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> hive.metastore: Closed a connection to metastore, current connections: 1{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19848) Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its UIs)

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769860#comment-16769860
 ] 

t oo commented on HIVE-19848:
-

:(

> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> -
>
> Key: HIVE-19848
> URL: https://issues.apache.org/jira/browse/HIVE-19848
> Project: Hive
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: t oo
>Priority: Major
>
> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> We are using Hive on EC2s without EMR/HDFS/Kerberos



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19919) HiveServer2 - expose queryable data dictionary (ie Oracles' ALL_TAB_COLUMNS)

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769861#comment-16769861
 ] 

t oo commented on HIVE-19919:
-

gentle ping

> HiveServer2 - expose queryable data dictionary (ie Oracles' ALL_TAB_COLUMNS)
> 
>
> Key: HIVE-19919
> URL: https://issues.apache.org/jira/browse/HIVE-19919
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0, 2.3.2
>Reporter: t oo
>Priority: Major
>
> All major db vendors have a table like information_schema.columns, 
> all_tab_columns or syscolumns containing table_name,column_name, data_type, 
> col_order. Adding this feature to HiveServer2 would be very convenient for 
> users.
> This information is currently only available in the MySQL metastore (ie TBLS, 
> COLS) but should be exposed up through the HiveServer2 connection, 
> saving users from needing 2 connections (1 to see data, 1 to see 
> metadata). For security reasons too, MySQL can be firewalled from end-users.
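
As a point of comparison, Hive 3.0 ships an INFORMATION_SCHEMA database 
(HIVE-1010), so on a 3.x deployment a query of roughly this shape should serve the 
same purpose (a sketch; view and column names follow the SQL-standard 
information_schema):

{code:sql}
-- table/column/type metadata over the regular HiveServer2 connection
SELECT table_name, column_name, data_type, ordinal_position
FROM information_schema.columns
WHERE table_schema = 'default';
{code}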



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19959) 'Hive on Spark' error - org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 109

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769838#comment-16769838
 ] 

t oo commented on HIVE-19959:
-

[~xuefuz] Did you encounter this?

> 'Hive on Spark' error - 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> ---
>
> Key: HIVE-19959
> URL: https://issues.apache.org/jira/browse/HIVE-19959
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
> Environment: env: hive 2.3.3 spark 2.0.0 in standalone mode scratch 
> dir on S3 hive table on s3 hadoop 2.8.3 installed no hdfs setup
>Reporter: t oo
>Priority: Blocker
>
> Connecting via beeline and running SELECT * works, but when running select 
> count(*) I get the below error:
> 18/05/01 07:41:37 INFO Utilities: Open file to read in plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml
> 18/05/01 07:41:37 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.Utilities>
> 18/05/01 07:41:37 INFO Utilities: Deserializing MapWork via kryo
> 18/05/01 07:41:37 ERROR Utilities: Failed to load plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1082)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:973)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:987)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 

[jira] [Commented] (HIVE-20606) hive3.1 beeline to dns complaining about ssl on ip

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769837#comment-16769837
 ] 

t oo commented on HIVE-20606:
-

[~krisden] - did you fix this?

> hive3.1 beeline to dns complaining about ssl on ip
> --
>
> Key: HIVE-20606
> URL: https://issues.apache.org/jira/browse/HIVE-20606
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 3.1.0
>Reporter: t oo
>Priority: Blocker
>
> Why is beeline complaining about the IP when I use the DNS name in the 
> connection? I have a valid cert/JKS for the DNS name. The exact same beeline 
> worked when running against Hive 2.3.2, but this is Hive 3.1.0.
> [ec2-user@ip-10-1-2-3 logs]$ $HIVE_HOME/bin/beeline
>  SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Beeline version 3.1.0 by Apache Hive
>  beeline> !connect 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  userhere passhere
>  Connecting to 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  18/09/20 04:49:06 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> mydns:1
>  Unknown HS2 problem when communicating with Thrift server.
>  Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit:
>  javax.net.ssl.SSLHandshakeException: 
> java.security.cert.CertificateException: No subject alternative names 
> matching IP address 10.1.2.3 found (state=08S01,code=0)
>  beeline>
>
> hiveserver2 logs:
> 2018-09-20T04:50:16,245 ERROR [HiveServer2-Handler-Pool: Thread-79] 
> server.TThreadPoolServer: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_181]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> Caused by: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  ... 4 more
> Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection 
> during handshake
>  at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002) 
> ~[?:1.8.0_181]
>  at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>  ~[?:1.8.0_181]
>  at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:938) 
> ~[?:1.8.0_181]
>  at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
> ~[?:1.8.0_181]
>  at 

[jira] [Commented] (HIVE-14269) Performance optimizations for data on S3

2019-02-15 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769769#comment-16769769
 ] 

t oo commented on HIVE-14269:
-

gentle ping

> Performance optimizations for data on S3
> 
>
> Key: HIVE-14269
> URL: https://issues.apache.org/jira/browse/HIVE-14269
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Major
>
> Working with tables that reside on Amazon S3 (or any other object store) 
> has several performance impacts when reading or writing data, as well as 
> consistency issues.
> This JIRA is an umbrella task to monitor all the performance improvements 
> that can be done in Hive to work better with S3 data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20506) HOS times out when cluster is full while Hive-on-MR waits

2019-02-13 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767670#comment-16767670
 ] 

t oo commented on HIVE-20506:
-

Does Hive-on-Spark still have the issue if Spark is set up with the standalone 
scheduler rather than YARN?

> HOS times out when cluster is full while Hive-on-MR waits
> -
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20506-CDH5.14.2.patch, HIVE-20506.1.patch, 
> HIVE-20506.2.patch, HIVE-20506.3.patch, Screen Shot 2018-09-07 at 8.10.37 
> AM.png
>
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available 
> before submitting a job. This is because the hadoop jar command is the 
> primary mechanism Hive uses to know if a job is complete or failed.
>  
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because 
> the RPC client in the AppMaster doesn't connect back to the RPC Server in 
> HS2. 
> This is a behavior difference it'd be great to close.
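
One mitigation (not a fix) for the behavior described above is to raise that RPC
timeout so the driver tolerates a longer wait for cluster resources. A minimal
sketch, assuming {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} maps to the
hive.spark.client.connect.timeout property (worth verifying against the HiveConf
of your Hive version):

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

// Sketch only: give the Hive-on-Spark RPC client more time to connect back to
// HS2 when the cluster is busy. The property name is an assumption based on the
// constant quoted in the description above.
public class RaiseHosRpcTimeoutSketch {
    public static void main(String[] args) {
        HiveConf conf = new HiveConf();
        conf.set("hive.spark.client.connect.timeout", "60000ms");
        System.out.println(conf.get("hive.spark.client.connect.timeout"));
    }
}
{code}

The same value can also be set in hive-site.xml.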



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19740) Hiveserver2 can't connect to metastore when using Hive 3.0

2019-02-11 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765534#comment-16765534
 ] 

t oo commented on HIVE-19740:
-

bump

> Hiveserver2 can't connect to metastore when using Hive 3.0
> --
>
> Key: HIVE-19740
> URL: https://issues.apache.org/jira/browse/HIVE-19740
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: heyang wang
>Priority: Major
> Attachments: hive-site.xml
>
>
> I am using docker to deploy Hadoop 2.7, Hive 3.0 and Spark 2.3.
> After starting all the docker images, HiveServer2 can't start and outputs 
> the following error log:
> 2018-05-30T14:13:53,832 WARN [main]: server.HiveServer2 
> (HiveServer2.java:startHiveServer2(1041)) - Error starting HiveServer2 on 
> attempt 1, will retry in 6ms
>  java.lang.RuntimeException: Error initializing notification event poll
>  at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:269) 
> ~[hive-service-3.0.0.jar:3.0.0]
>  at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1013)
>  [hive-service-3.0.0.jar:3.0.0]
>  at 
> org.apache.hive.service.server.HiveServer2.access$1800(HiveServer2.java:134) 
> [hive-service-3.0.0.jar:3.0.0]
>  at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1282)
>  [hive-service-3.0.0.jar:3.0.0]
>  at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1126) 
> [hive-service-3.0.0.jar:3.0.0]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_131]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
> [hadoop-common-2.7.4.jar:?]
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
> [hadoop-common-2.7.4.jar:?]
>  Caused by: java.io.IOException: org.apache.thrift.TApplicationException: 
> Internal error processing get_current_notificationEventId
>  at 
> org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:75)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:103)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:267) 
> ~[hive-service-3.0.0.jar:3.0.0]
>  ... 10 more
>  Caused by: org.apache.thrift.TApplicationException: Internal error 
> processing get_current_notificationEventId
>  at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111) 
> ~[hive-exec-3.0.0.jar:3.0.0]
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) 
> ~[hive-exec-3.0.0.jar:3.0.0]
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5541)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5529)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2713)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_131]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at com.sun.proxy.$Proxy34.getCurrentNotificationEventId(Unknown Source) 
> ~[?:?]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_131]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2763)
>  ~[hive-exec-3.0.0.jar:3.0.0]
>  at com.sun.proxy.$Proxy34.getCurrentNotificationEventId(Unknown Source) 
> ~[?:?]
>  at 
> 

[jira] [Commented] (HIVE-19821) Distributed HiveServer2

2019-02-11 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765529#comment-16765529
 ] 

t oo commented on HIVE-19821:
-

bump

> Distributed HiveServer2
> ---
>
> Key: HIVE-19821
> URL: https://issues.apache.org/jira/browse/HIVE-19821
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19821.1.WIP.patch, HIVE-19821.2.WIP.patch, 
> HIVE-19821_ Distributed HiveServer2.pdf
>
>
> HS2 deployments often hit OOM issues due to a number of factors: (1) too many 
> concurrent connections, (2) queries that scan a large number of partitions have 
> to pull a lot of metadata into memory (e.g. a query reading thousands of 
> partitions requires loading thousands of partitions into memory), (3) very 
> large queries can take up a lot of heap space, especially during query 
> parsing. There are a number of other factors that cause HiveServer2 to run 
> out of memory; these are just some of the more common ones.
> Distributed HS2 proposes to do all query parsing, compilation, planning, and 
> execution coordination inside a dedicated container. This should 
> significantly decrease memory pressure on HS2 and allow HS2 to scale to a 
> larger number of concurrent users.
> For HoS (and I think Hive-on-Tez) this just requires moving all query 
> compilation, planning, etc. inside the application master for the 
> corresponding Hive session.
> The main benefit here is isolation. A poorly written Hive query cannot bring 
> down an entire HiveServer2 instance and force all other queries to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions

2019-02-11 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765479#comment-16765479
 ] 

t oo commented on HIVE-12408:
-

Can this be ported to branch-2? It causes issues in AWS environments.

> SQLStdAuthorizer should not require external table creator to be owner of 
> directory, in addition to rw permissions
> --
>
> Key: HIVE-12408
> URL: https://issues.apache.org/jira/browse/HIVE-12408
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Security, SQLStandardAuthorization
>Affects Versions: 0.14.0
> Environment: HDP 2.2 + Kerberos
>Reporter: Hari Sekhon
>Assignee: Akira Ajisaka
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch
>
>
> When trying to create an external table via beeline in Hive using the 
> SQLStdAuthorizer it expects the table creator to be the owner of the 
> directory path and ignores the group rwx permission that is granted to the 
> user.
> {code}Error: Error while compiling statement: FAILED: 
> HiveAccessControlException Permission denied: Principal [name=hari, 
> type=USER] does not have following privileges for operation CREATETABLE 
> [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, 
> name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code}
> All it should be checking is read access to that directory.
> The directory owner requirement breaks the ability of more than one user to 
> create external table definitions to a given location. For example this is a 
> flume landing directory with json data, and the /etl tree is owned by the 
> flume user. Even chowning the tree to another user would still break access 
> to other users who are able to read the directory in hdfs but would still 
> be unable to create external tables on top of it.
> This looks like a remnant of the owner only access model in SQLStdAuth and is 
> a separate issue to HIVE-11864 / HIVE-12324.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20803) Hive external table can't read S3 file containing timestamp partition

2018-10-27 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665999#comment-16665999
 ] 

t oo commented on HIVE-20803:
-

A workaround is to write a different path to S3 that URL-encodes the ':' (colon) 
character; the space character can stay.
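
A minimal sketch of that workaround, assuming the partition value is built in
client code before the data is written to S3; the class and method names here
are hypothetical, only String.replace is real:

{code:java}
// Hypothetical helper: percent-encode only the ':' characters in a partition
// value so the resulting S3 key no longer triggers "Relative path in absolute
// URI" in HiveServer2, while leaving the space character as-is.
public final class PartitionPathEncoder {

    private PartitionPathEncoder() {}

    public static String encodeColons(String partitionValue) {
        // "2018-10-18 02:59:46" -> "2018-10-18 02%3A59%3A46"
        return partitionValue.replace(":", "%3A");
    }

    public static void main(String[] args) {
        System.out.println(encodeColons("part_ldts=2018-10-18 02:59:46"));
        // prints: part_ldts=2018-10-18 02%3A59%3A46
    }
}
{code}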

> Hive external table can't read S3 file containing timestamp partition
> -
>
> Key: HIVE-20803
> URL: https://issues.apache.org/jira/browse/HIVE-20803
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: t oo
>Priority: Major
>
> SparkThriftServer can select * from the table fine and get data, but 
> HiveServer2 throws the error below on select *:
>  
> hive.msck.path.validation = ignore in hive-site.xml
> then ran msck repair my_sch.h_l
> aws s3 ls s3://priv1/priv2/H_L/ --recursive
> 2018-10-18 03:00:56 2474983 
> priv1/priv2/H_L/part_dt=20180309/part_src=xyz/part_src_file=MY_LOC/part_ldts=2018-10-18
>  02:59:46/part-0-2536ca01-243c-4220-8e55-6869a045fba2.snappy.parquet
> show create table my_sch.h_l;
> ++
> | createtab_stmt |
> ++
> | CREATE EXTERNAL TABLE `my_sch.h_l`( |
> | `xy_hkey_h_l` binary, |
> | `xy_rtts` timestamp, |
> | `xy_rsrc` string, |
> | `xy_bkcc` string, |
> | `xy_mltid` string, |
> | `location_id` bigint) |
> | PARTITIONED BY ( |
> | `part_dt` string, |
> | `part_src` string, |
> | `part_src_file` string, |
> | `part_ldts` timestamp) |
> | ROW FORMAT SERDE |
> | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' |
> | STORED AS INPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' |
> | OUTPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION |
> | 's3a://priv1/priv2/H_L' |
> | TBLPROPERTIES ( |
> | 'spark.sql.partitionProvider'='catalog', |
> | 'spark.sql.sources.schema.numPartCols'='4', |
> | 'spark.sql.sources.schema.numParts'='1', |
> | 
> 'spark.sql.sources.schema.part.0'='\{"type":"struct","fields":[{"name":"xy_hkey_h_l","type":"binary","nullable":true,"metadata":{}},\{"name":"xy_rtts","type":"timestamp","nullable":true,"metadata":{}},\{"name":"xy_rsrc","type":"string","nullable":true,"metadata":{}},\{"name":"xy_bkcc","type":"string","nullable":true,"metadata":{}},\{"name":"xy_mltid","type":"string","nullable":true,"metadata":{}},\{"name":"location_id","type":"long","nullable":true,"metadata":{}},\{"name":"part_dt","type":"string","nullable":true,"metadata":{}},\{"name":"part_src","type":"string","nullable":true,"metadata":{}},\{"name":"part_src_file","type":"string","nullable":true,"metadata":{}},\{"name":"part_ldts","type":"timestamp","nullable":true,"metadata":{}}]}',
>  |
> | 'spark.sql.sources.schema.partCol.0'='part_dt', |
> | 'spark.sql.sources.schema.partCol.1'='part_src', |
> | 'spark.sql.sources.schema.partCol.2'='part_src_file', |
> | 'spark.sql.sources.schema.partCol.3'='part_ldts', |
> | 'transient_lastDdlTime'='1540421484') |
> ++
>  select * from my_sch.h_l limit 5;
> Error: java.io.IOException: java.lang.IllegalArgumentException: 
> java.net.URISyntaxException: Relative path in absolute URI: 
> part_ldts=2018-10-18 02:59:46 (state=,code=0)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: part_ldts=2018-10-18 02:59:46
>  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
>  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:374)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:53)
>  at 
> org.apache.hive.beeline.IncrementalRowsWithNormalization.<init>(IncrementalRowsWithNormalization.java:50)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2192)
>  at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1009)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:1205)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:1134)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1314)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1178)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1033)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  

[jira] [Updated] (HIVE-20803) Hive external table can't read S3 file containing timestamp partition

2018-10-24 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-20803:

Summary: Hive external table can't read S3 file containing timestamp 
partition  (was: Hive can't read S3 parquet file with timestamp partition)

> Hive external table can't read S3 file containing timestamp partition
> -
>
> Key: HIVE-20803
> URL: https://issues.apache.org/jira/browse/HIVE-20803
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: t oo
>Priority: Major
>
> SparkThriftServer can select * from the table fine and get data, but 
> HiveServer2 throws the error below on select *:
>  
> hive.msck.path.validation = ignore in hive-site.xml
> then ran msck repair my_sch.h_l
> aws s3 ls s3://priv1/priv2/H_L/ --recursive
> 2018-10-18 03:00:56 2474983 
> priv1/priv2/H_L/part_dt=20180309/part_src=xyz/part_src_file=MY_LOC/part_ldts=2018-10-18
>  02:59:46/part-0-2536ca01-243c-4220-8e55-6869a045fba2.snappy.parquet
> show create table my_sch.h_l;
> ++
> | createtab_stmt |
> ++
> | CREATE EXTERNAL TABLE `my_sch.h_l`( |
> | `xy_hkey_h_l` binary, |
> | `xy_rtts` timestamp, |
> | `xy_rsrc` string, |
> | `xy_bkcc` string, |
> | `xy_mltid` string, |
> | `location_id` bigint) |
> | PARTITIONED BY ( |
> | `part_dt` string, |
> | `part_src` string, |
> | `part_src_file` string, |
> | `part_ldts` timestamp) |
> | ROW FORMAT SERDE |
> | 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' |
> | STORED AS INPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' |
> | OUTPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION |
> | 's3a://priv1/priv2/H_L' |
> | TBLPROPERTIES ( |
> | 'spark.sql.partitionProvider'='catalog', |
> | 'spark.sql.sources.schema.numPartCols'='4', |
> | 'spark.sql.sources.schema.numParts'='1', |
> | 
> 'spark.sql.sources.schema.part.0'='\{"type":"struct","fields":[{"name":"xy_hkey_h_l","type":"binary","nullable":true,"metadata":{}},\{"name":"xy_rtts","type":"timestamp","nullable":true,"metadata":{}},\{"name":"xy_rsrc","type":"string","nullable":true,"metadata":{}},\{"name":"xy_bkcc","type":"string","nullable":true,"metadata":{}},\{"name":"xy_mltid","type":"string","nullable":true,"metadata":{}},\{"name":"location_id","type":"long","nullable":true,"metadata":{}},\{"name":"part_dt","type":"string","nullable":true,"metadata":{}},\{"name":"part_src","type":"string","nullable":true,"metadata":{}},\{"name":"part_src_file","type":"string","nullable":true,"metadata":{}},\{"name":"part_ldts","type":"timestamp","nullable":true,"metadata":{}}]}',
>  |
> | 'spark.sql.sources.schema.partCol.0'='part_dt', |
> | 'spark.sql.sources.schema.partCol.1'='part_src', |
> | 'spark.sql.sources.schema.partCol.2'='part_src_file', |
> | 'spark.sql.sources.schema.partCol.3'='part_ldts', |
> | 'transient_lastDdlTime'='1540421484') |
> ++
>  select * from my_sch.h_l limit 5;
> Error: java.io.IOException: java.lang.IllegalArgumentException: 
> java.net.URISyntaxException: Relative path in absolute URI: 
> part_ldts=2018-10-18 02:59:46 (state=,code=0)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: part_ldts=2018-10-18 02:59:46
>  at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
>  at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:374)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:53)
>  at 
> org.apache.hive.beeline.IncrementalRowsWithNormalization.<init>(IncrementalRowsWithNormalization.java:50)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2192)
>  at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1009)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:1205)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:1134)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1314)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1178)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1033)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 

[jira] [Commented] (HIVE-16295) Add support for using Hadoop's S3A OutputCommitter

2018-09-28 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631753#comment-16631753
 ] 

t oo commented on HIVE-16295:
-

can this be merged to master?

> Add support for using Hadoop's S3A OutputCommitter
> --
>
> Key: HIVE-16295
> URL: https://issues.apache.org/jira/browse/HIVE-16295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-16295.1.WIP.patch, HIVE-16295.2.WIP.patch, 
> HIVE-16295.3.WIP.patch, HIVE-16295.4.patch, HIVE-16295.5.patch, 
> HIVE-16295.6.patch, HIVE-16295.7.patch, HIVE-16295.8.patch, HIVE-16295.9.patch
>
>
> Hive doesn't have integration with Hadoop's {{OutputCommitter}}; it uses a 
> {{NullOutputCommitter}} and its own commit logic spread across 
> {{FileSinkOperator}}, {{MoveTask}}, and {{Hive}}.
> The Hadoop community is building an {{OutputCommitter}} that integrates with 
> S3Guard and does a safe, coordinated commit of data on S3 inside individual 
> tasks (HADOOP-13786). If Hive can integrate with this new {{OutputCommitter}} 
> there would be a lot of benefits to Hive-on-S3:
> * Data is only written once; directly committing data at a task level means 
> no renames are necessary
> * The commit is done safely, in a coordinated manner; duplicate tasks (from 
> task retries or speculative execution) should not step on each other
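
For context, a minimal sketch of how a job opts into the Hadoop-side S3A
committers through configuration, assuming a Hadoop release that ships them
(3.1 or later); the property names come from the Hadoop S3A committer
documentation, not from this patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: select the S3A "directory" committer so task output destined for
// s3a:// is committed without the rename-based commit discussed above.
public class S3ACommitterConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Route committer creation for s3a:// destinations through the S3A factory.
        conf.set("mapreduce.outputcommitter.factory.scheme.s3a",
                 "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory");
        // One of: directory, partitioned, magic.
        conf.set("fs.s3a.committer.name", "directory");
        System.out.println(conf.get("fs.s3a.committer.name"));
    }
}
{code}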



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16277) Exchange Partition between filesystems throws "IllegalArgumentException Wrong FS"

2018-09-28 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631741#comment-16631741
 ] 

t oo commented on HIVE-16277:
-

is this still an issue?

> Exchange Partition between filesystems throws "IllegalArgumentException Wrong 
> FS"
> -
>
> Key: HIVE-16277
> URL: https://issues.apache.org/jira/browse/HIVE-16277
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-16277.1.patch, HIVE-16277.2.patch, 
> HIVE-16277.3.patch, HIVE-16277.4.patch
>
>
> The following query: {{alter table s3_tbl exchange partition (country='USA') 
> with table hdfs_tbl}} fails with the following exception:
> {code}
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:379)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:361)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Got exception: java.lang.IllegalArgumentException Wrong 
> FS: s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.exchangeTablePartitions(Hive.java:3553)
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.exchangeTablePartition(DDLTask.java:4691)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:570)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2182)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1838)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1525)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1231)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
>   ... 11 more
> Caused by: MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1387)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:208)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:200)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.exchange_partitions(HiveMetaStore.java:2967)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.exchange_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.exchange_partitions(HiveMetaStoreClient.java:690)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Commented] (HIVE-14271) FileSinkOperator should not rename files to final paths when S3 is the default destination

2018-09-28 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631691#comment-16631691
 ] 

t oo commented on HIVE-14271:
-

is this still relevant?

> FileSinkOperator should not rename files to final paths when S3 is the 
> default destination
> --
>
> Key: HIVE-14271
> URL: https://issues.apache.org/jira/browse/HIVE-14271
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Major
>
> FileSinkOperator does a rename of {{outPaths -> finalPaths}} when it finishes 
> writing all rows to a temporary path. The problem is that S3 does not support 
> renaming.
> Two options can be considered:
> a. Use a copy operation instead. After FileSinkOperator writes all rows to 
> outPaths, then the commit method will do a copy() call instead of move().
> b. Write row by row directly to the S3 path (see HIVE-1620). This may 
> perform better, but we should take care of the cleanup part in case 
> of writing errors.
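
To make option (a) above concrete, a minimal sketch using the Hadoop FileSystem
API of what a copy-based commit could look like in place of a rename; the
wrapper class is hypothetical, only the FileSystem/FileUtil calls are real
Hadoop API:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Sketch of option (a): commit by copying the task's temporary output to the
// final location and deleting the source, rather than renaming. On S3 a rename
// is a server-side copy + delete anyway, so this avoids relying on rename.
public class CopyCommitSketch {

    public static boolean commitByCopy(Configuration conf, Path outPath, Path finalPath)
            throws java.io.IOException {
        FileSystem srcFs = outPath.getFileSystem(conf);
        FileSystem dstFs = finalPath.getFileSystem(conf);
        return FileUtil.copy(srcFs, outPath, dstFs, finalPath,
                /* deleteSource = */ true, conf);
    }
}
{code}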



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14129) Execute move tasks in parallel

2018-09-28 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631689#comment-16631689
 ] 

t oo commented on HIVE-14129:
-

[~ashutoshc] can this be merged to master?

> Execute move tasks in parallel
> --
>
> Key: HIVE-14129
> URL: https://issues.apache.org/jira/browse/HIVE-14129
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: HIVE-14129.2.patch, HIVE-14129.patch, HIVE-14129.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14128) Parallelize jobClose phases

2018-09-28 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631686#comment-16631686
 ] 

t oo commented on HIVE-14128:
-

[~rajesh.balamohan] can this be merged to master?

> Parallelize jobClose phases
> ---
>
> Key: HIVE-14128
> URL: https://issues.apache.org/jira/browse/HIVE-14128
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 1.2.0, 2.0.0, 2.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-14128.1.patch, HIVE-14128.master.2.patch, 
> HIVE-14128.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20606) hive3.1 beeline to dns complaining about ssl on ip

2018-09-19 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-20606:

Priority: Blocker  (was: Critical)

> hive3.1 beeline to dns complaining about ssl on ip
> --
>
> Key: HIVE-20606
> URL: https://issues.apache.org/jira/browse/HIVE-20606
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 3.1.0
>Reporter: t oo
>Priority: Blocker
>
> Why is beeline complaining about the IP when I use the DNS name in the 
> connection? I have a valid cert/jks on the DNS name. The exact same beeline 
> worked when running on hive2.3.2, but this is hive3.1.0
> [ec2-user@ip-10-1-2-3 logs]$ $HIVE_HOME/bin/beeline
>  SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Beeline version 3.1.0 by Apache Hive
>  beeline> !connect 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  userhere passhere
>  Connecting to 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  18/09/20 04:49:06 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> mydns:1
>  Unknown HS2 problem when communicating with Thrift server.
>  Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit:
>  javax.net.ssl.SSLHandshakeException: 
> java.security.cert.CertificateException: No subject alternative names 
> matching IP address 10.1.2.3 found (state=08S01,code=0)
>  beeline>
>  
>  
>  
>  
>  
>  
>  
>  
>  
>  
> hiveserver2 logs:
> 2018-09-20T04:50:16,245 ERROR [HiveServer2-Handler-Pool: Thread-79] 
> server.TThreadPoolServer: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_181]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> Caused by: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  ... 4 more
> Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection 
> during handshake
>  at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002) 
> ~[?:1.8.0_181]
>  at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>  ~[?:1.8.0_181]
>  at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:938) 
> ~[?:1.8.0_181]
>  at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
> 

[jira] [Updated] (HIVE-20606) hive3.1 beeline to dns complaining about ssl on ip

2018-09-19 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-20606:

Description: 
Why is beeline complaining about the IP when I use the DNS name in the 
connection? I have a valid cert/jks on the DNS name. The exact same beeline 
worked when running on hive2.3.2, but this is hive3.1.0

[ec2-user@ip-10-1-2-3 logs]$ $HIVE_HOME/bin/beeline
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
[jar:file:/usr/lib/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
[jar:file:/usr/lib/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
explanation.
 SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
 Beeline version 3.1.0 by Apache Hive
 beeline> !connect 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
 userhere passhere
 Connecting to 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
 18/09/20 04:49:06 [main]: WARN jdbc.HiveConnection: Failed to connect to 
mydns:1
 Unknown HS2 problem when communicating with Thrift server.
 Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit:
 javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
No subject alternative names matching IP address 10.1.2.3 found 
(state=08S01,code=0)
 beeline>

 

 

 

 

 

 

 

 

 

 

hiveserver2 logs:

2018-09-20T04:50:16,245 ERROR [HiveServer2-Handler-Pool: Thread-79] 
server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
handshake
 at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_181]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_181]
 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.thrift.transport.TTransportException: 
javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
handshake
 at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
 ~[hive-exec-3.1.0.jar:3.1.0]
 ... 4 more
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection 
during handshake
 at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002) 
~[?:1.8.0_181]
 at 
sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385) 
~[?:1.8.0_181]
 at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:938) 
~[?:1.8.0_181]
 at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[?:1.8.0_181]
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
~[?:1.8.0_181]
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
~[?:1.8.0_181]
 at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
~[?:1.8.0_181]
 at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at 
org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
 ~[hive-exec-3.1.0.jar:3.1.0]
 at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
~[hive-exec-3.1.0.jar:3.1.0]
 at 

[jira] [Updated] (HIVE-20602) hive3 crashes after 1min

2018-09-19 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-20602:

Issue Type: Bug  (was: New Feature)

> hive3 crashes after 1min
> 
>
> Key: HIVE-20602
> URL: https://issues.apache.org/jira/browse/HIVE-20602
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: t oo
>Priority: Blocker
>
> Running the hiveserver2 process (v3.0.0 of Hive) on EC2 (not EMR), the process 
> starts up and for the first minute everything is OK (I can make a beeline 
> connection and create/repair/select external hive tables), but then the 
> hiveserver2 process crashes. If I restart the process and even do nothing the 
> hiveserver2 process crashes after 1min. When checking the logs I see messages 
> like 'number of connections to metastore: 1','number of connections to 
> metastore: 2','number of connections to metastore: 3' then 'could not bind to 
> port 1 port already in use' then end of the logs.
> I made some experiments on a few different EC2s (if I use Hive v2.3.2 the 
> hiveserver2 process never crashes), but if I use Hive v3.0.0 it consistently 
> crashes after a minute.
> Metastore db is mysql rds, hive metastore process never crashed. I can see 
> the external hive table ddls are persisted in the mysql (ie DBS, TBLS tables).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19958) hive on spark - count(*) error when running spark standalone against s3 external tables

2018-09-17 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19958:

Priority: Blocker  (was: Major)

> hive on spark - count(*) error when running spark standalone against s3 
> external tables
> ---
>
> Key: HIVE-19958
> URL: https://issues.apache.org/jira/browse/HIVE-19958
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
>Reporter: t oo
>Priority: Blocker
>
> I am running 'Hive on Spark' with hive v2.3.3 and Spark v2.0.0 running in 
> spark standalone mode with no yarn/hdfs. My hive tables are external pointing 
> to S3. My hive-site has spark.submit.deployMode set to client and spark.master 
> set to spark://actualmaster:7077, and in the Spark UI I see that the Spark 
> master has an available worker with resources.
> In beeline I run select * from table; this works. Then in beeline I run 
> select count(*) from table; and I get the error below:
> /usr/lib/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar contains the so-called 
> missing class, and hive2 is started with nohup $HIVE_HOME/bin/hive --service 
> hiveserver2 --hiveconf hive.server2.thrift.port=1 --hiveconf 
> hive.root.logger=INFO,console &>> $HIVE_HOME/logs/hiveserver2.log &
> Below error is from viewing the 'job' in the sparkUI:
> {code:java}
>  
> Failed stageid0: mapPartitionsToPair at MapTran.java:40
>  
>   java.lang.NoClassDefFoundError: 
> Lorg/apache/hive/spark/counter/SparkCounters;
>  at java.lang.Class.getDeclaredFields0(Native Method)
>  at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
>  at java.lang.Class.getDeclaredField(Class.java:2068)
>  at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1803)
>  at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:79)
>  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
>  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
>  at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
>  at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:669)
>  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1875)
>  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1744)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2032)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:426)
>  at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>  at org.apache.spark.scheduler.Task.run(Task.scala:85)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

[jira] [Comment Edited] (HIVE-19959) 'Hive on Spark' error - org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 109

2018-09-17 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617403#comment-16617403
 ] 

t oo edited comment on HIVE-19959 at 9/17/18 11:44 AM:
---

on local mode it is ok but when having master/worker on different machines get 
the same error even with  Hive 2.3.3 and Spark 2.0.0


was (Author: toopt4):
on local mode it is ok but when having master/worker. on different machines get 
the same error even with  Hive 2.3.3 and Spark 2.0.0

> 'Hive on Spark' error - 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> ---
>
> Key: HIVE-19959
> URL: https://issues.apache.org/jira/browse/HIVE-19959
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
> Environment: env: hive 2.3.3 spark 2.0.0 in standalone mode scratch 
> dir on S3 hive table on s3 hadoop 2.8.3 installed no hdfs setup
>Reporter: t oo
>Priority: Blocker
>
> Connecting to beeline and running SELECT * works, but when running select 
> count(*) I get the error below:
> 18/05/01 07:41:37 INFO Utilities: Open file to read in plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml
> 18/05/01 07:41:37 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.Utilities>
> 18/05/01 07:41:37 INFO Utilities: Deserializing MapWork via kryo
> 18/05/01 07:41:37 ERROR Utilities: Failed to load plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1082)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:973)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:987)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:246)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at 

[jira] [Commented] (HIVE-19959) 'Hive on Spark' error - org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 109

2018-09-17 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617403#comment-16617403
 ] 

t oo commented on HIVE-19959:
-

on local mode it is ok but when having master/worker. on different machines get 
the same error even with  Hive 2.3.3 and Spark 2.0.0

> 'Hive on Spark' error - 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> ---
>
> Key: HIVE-19959
> URL: https://issues.apache.org/jira/browse/HIVE-19959
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
> Environment: env: hive 2.3.3 spark 2.0.0 in standalone mode scratch 
> dir on S3 hive table on s3 hadoop 2.8.3 installed no hdfs setup
>Reporter: t oo
>Priority: Blocker
>
> Connecting to beeline and running SELECT * works, but when running select 
> count(*) I get the error below:
> 18/05/01 07:41:37 INFO Utilities: Open file to read in plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml
> 18/05/01 07:41:37 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.Utilities>
> 18/05/01 07:41:37 INFO Utilities: Deserializing MapWork via kryo
> 18/05/01 07:41:37 ERROR Utilities: Failed to load plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1082)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:973)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:987)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:246)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at 

[jira] [Updated] (HIVE-19959) 'Hive on Spark' error - org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 109

2018-09-17 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19959:

Priority: Blocker  (was: Major)

> 'Hive on Spark' error - 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> ---
>
> Key: HIVE-19959
> URL: https://issues.apache.org/jira/browse/HIVE-19959
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
> Environment: Hive 2.3.3; Spark 2.0.0 in standalone mode; scratch 
> dir on S3; Hive tables on S3; Hadoop 2.8.3 installed; no HDFS setup
>Reporter: t oo
>Priority: Blocker
>
> Connecting via Beeline and running SELECT * works, but running SELECT 
> COUNT(*) fails with the error below:
> 18/05/01 07:41:37 INFO Utilities: Open file to read in plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml
> 18/05/01 07:41:37 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.Utilities>
> 18/05/01 07:41:37 INFO Utilities: Deserializing MapWork via kryo
> 18/05/01 07:41:37 ERROR Utilities: Failed to load plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1082)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:973)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:987)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 

[jira] [Updated] (HIVE-19959) 'Hive on Spark' error - org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 109

2018-06-21 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19959:

Component/s: Spark

> 'Hive on Spark' error - 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> ---
>
> Key: HIVE-19959
> URL: https://issues.apache.org/jira/browse/HIVE-19959
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.2, 2.3.3
> Environment: Hive 2.3.3; Spark 2.0.0 in standalone mode; scratch 
> dir on S3; Hive tables on S3; Hadoop 2.8.3 installed; no HDFS setup
>Reporter: t oo
>Priority: Major
>
> Connecting via Beeline and running SELECT * works, but running SELECT 
> COUNT(*) fails with the error below:
> 18/05/01 07:41:37 INFO Utilities: Open file to read in plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml
> 18/05/01 07:41:37 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.Utilities>
> 18/05/01 07:41:37 INFO Utilities: Deserializing MapWork via kryo
> 18/05/01 07:41:37 ERROR Utilities: Failed to load plan: 
> s3a://redacted/tmp/31f5ffb5-f318-45f1-b07d-1fac0b406c89/hive_2018-05-01_07-41-09_102_7250900080631620338-
> 2/-mr-10004/bbb93046-5d8f-4b6e-888e-c86bfeb57e3f/map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered 
> unregistered class ID: 109
> Serialization trace:
> properties (org.apache.hadoop.hive.ql.plan.PartitionDesc)
> aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1082)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:973)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:987)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
> at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:246)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> 
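
A hedged debugging sketch for this class of error: "Encountered unregistered class ID" 
during plan deserialization is commonly reported when the hive-exec/kryo jars seen by 
the Spark workers differ from the ones HiveServer2 used to serialize the plan (for 
example, a Spark distribution bundled with its own Hive jars). Treat the paths and the 
build command below as assumptions to adapt, not a confirmed fix.

{code}
# Compare the Hive/Kryo jars visible to the Spark workers vs. HiveServer2.
# Paths assume default SPARK_HOME / HIVE_HOME layouts (Spark 2.x keeps jars in $SPARK_HOME/jars).
ls "${SPARK_HOME:?}"/jars | grep -Ei 'hive-exec|kryo'
ls "${HIVE_HOME:?}"/lib   | grep -Ei 'hive-exec|kryo'

# The Hive-on-Spark docs recommend a Spark build without the Hive profile;
# the exact profile list varies by Spark version (sketch only):
# ./dev/make-distribution.sh --name hadoop2-without-hive --tgz -Pyarn -Phadoop-2.7 -Pparquet-provided
{code}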

[jira] [Commented] (HIVE-19919) HiveServer2 - expose queryable data dictionary (ie Oracles' ALL_TAB_COLUMNS)

2018-06-18 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516424#comment-16516424
 ] 

t oo commented on HIVE-19919:
-

Can this be ported to Hive 2.3.4? Why is the schematool step needed, and can you 
give an example of how to run that schematool against a MySQL metastore?
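
For reference, a hedged sketch of the Hive 3 schema tool invocation that installs the 
information schema against a MySQL-backed metastore; the host, credentials and exact 
flags are assumptions to verify against the schematool documentation for your release:

{code}
# Sketch only: -dbType hive targets the information schema, -metaDbType names the
# backing metastore RDBMS, and -url/-driver point at a running HiveServer2 instance.
schematool -dbType hive -metaDbType mysql -initSchema \
  -url "jdbc:hive2://hs2-host:10000/default" \
  -driver org.apache.hive.jdbc.HiveDriver \
  -userName hive -passWord ''
{code}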

> HiveServer2 - expose queryable data dictionary (ie Oracles' ALL_TAB_COLUMNS)
> 
>
> Key: HIVE-19919
> URL: https://issues.apache.org/jira/browse/HIVE-19919
> Project: Hive
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0, 2.3.2
>Reporter: t oo
>Priority: Major
>
> All major DB vendors have a table like information_schema.columns, 
> all_tab_columns or syscolumns containing table_name, column_name, data_type, 
> col_order. Adding this feature to HiveServer2 would be very convenient for 
> users.
> This information is currently only available in the mysql metastore (i.e. TBLS, 
> COLS) but should be exposed up into the HiveServer2 1 port connection, 
> saving users from having 2 connections (1 to see data, 1 to see 
> metadata). For security reasons, too, mysql can be firewalled from end-users.
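
To illustrate the workaround this request wants to eliminate, a hedged sketch of the 
query users currently have to run directly against the MySQL metastore (assumes the 
standard metastore schema tables DBS, TBLS, SDS and COLUMNS_V2):

{code}
-- Direct-to-metastore query; partition-key columns are not covered by this sketch.
SELECT d.NAME        AS db_name,
       t.TBL_NAME    AS table_name,
       c.COLUMN_NAME AS column_name,
       c.TYPE_NAME   AS data_type,
       c.INTEGER_IDX AS col_order
FROM   DBS d
JOIN   TBLS t       ON t.DB_ID = d.DB_ID
JOIN   SDS s        ON s.SD_ID = t.SD_ID
JOIN   COLUMNS_V2 c ON c.CD_ID = s.CD_ID
ORDER  BY db_name, table_name, c.INTEGER_IDX;
{code}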



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-1010) Implement INFORMATION_SCHEMA in Hive

2018-06-18 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516408#comment-16516408
 ] 

t oo commented on HIVE-1010:


Any chance of porting this to Hive 2.3.4? Spark still can't run on Hadoop 3, and I 
assume Hive 3 needs Hadoop 3.

> Implement INFORMATION_SCHEMA in Hive
> 
>
> Key: HIVE-1010
> URL: https://issues.apache.org/jira/browse/HIVE-1010
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor, Server Infrastructure
>Reporter: Jeff Hammerbacher
>Assignee: Gunther Hagleitner
>Priority: Major
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-1010.10.patch, HIVE-1010.11.patch, 
> HIVE-1010.12.patch, HIVE-1010.13.patch, HIVE-1010.14.patch, 
> HIVE-1010.15.patch, HIVE-1010.16.patch, HIVE-1010.7.patch, HIVE-1010.8.patch, 
> HIVE-1010.9.patch
>
>
> INFORMATION_SCHEMA is part of the SQL92 standard and would be useful to 
> implement using our metastore.
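
A hedged example of what this enables once the information_schema database is 
installed: the data dictionary becomes queryable over the ordinary HiveServer2/Beeline 
connection (column names follow the SQL standard; verify against the shipped schema):

{code}
-- Run over the normal HiveServer2 connection after information_schema is set up.
SELECT table_schema, table_name, column_name, data_type, ordinal_position
FROM   information_schema.columns
WHERE  table_schema = 'default'
ORDER  BY table_name, ordinal_position;
{code}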



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19437) HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled

2018-06-10 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19437:

Priority: Critical  (was: Major)

> HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled
> 
>
> Key: HIVE-19437
> URL: https://issues.apache.org/jira/browse/HIVE-19437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, SQL, Web UI
>Affects Versions: 2.1.1, 2.3.2
>Reporter: rr
>Priority: Critical
>
>  
> When SSL is enabled for the HiveServer2 web UI on port 10002, HiveServer2 is 
> unable to start up. It keeps connecting to the metastore, drops the 
> connection, and then retries again. The HiveServer2 pid is present, but the 
> server is not actually up because it keeps dropping the metastore connection.
> The logs show the following:
> 2018-05-07T04:45:52,980 INFO [main] sqlstd.SQLStdHiveAccessController: 
> Created SQLStdHiveAccessController for session context : 
> HiveAuthzSessionContext [sessionString=9f65e1ba-8810-47ee-a370-238606f02479, 
> clientType=HIVESERVER2]
>  2018-05-07T04:45:52,980 WARN [main] session.SessionState: 
> METASTORE_FILTER_HOOK will be ignored, since 
> hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Mestastore configuration 
> hive.metastore.filter.hook changed from 
> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Trying to connect to 
> metastore with URI thrift://localhost:9083
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Opened a connection to 
> metastore, current connections: 1
> 2018-05-07T04:45:52,985 INFO [main] hive.metastore: Connected to metastore.
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: Operation log 
> root directory is created: /var/hive/hs2log/tmp
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread pool size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread wait queue size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread keepalive time: 10 seconds
> 2018-05-07T04:45:52,988 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0
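
For anyone reproducing this, a hedged hive-site.xml sketch of the web UI SSL settings 
involved; the keystore path and password are placeholders, and property names should 
be checked against the HiveConf of your release:

{code}
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
</property>
<property>
  <name>hive.server2.webui.use.ssl</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.webui.keystore.path</name>
  <value>/path/to/keystore.jks</value> <!-- placeholder -->
</property>
<property>
  <name>hive.server2.webui.keystore.password</name>
  <value>changeit</value> <!-- placeholder -->
</property>
{code}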



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19437) HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled

2018-06-10 Thread t oo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507535#comment-16507535
 ] 

t oo commented on HIVE-19437:
-

Getting the same issue on Hive 2.3.2.

> HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled
> 
>
> Key: HIVE-19437
> URL: https://issues.apache.org/jira/browse/HIVE-19437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, SQL, Web UI
>Affects Versions: 2.1.1, 2.3.2
>Reporter: rr
>Priority: Major
>
>  
> When SSL is enabled for the HiveServer2 web UI on port 10002, HiveServer2 is 
> unable to start up. It keeps connecting to the metastore, drops the 
> connection, and then retries again. The HiveServer2 pid is present, but the 
> server is not actually up because it keeps dropping the metastore connection.
> The logs show the following:
> 2018-05-07T04:45:52,980 INFO [main] sqlstd.SQLStdHiveAccessController: 
> Created SQLStdHiveAccessController for session context : 
> HiveAuthzSessionContext [sessionString=9f65e1ba-8810-47ee-a370-238606f02479, 
> clientType=HIVESERVER2]
>  2018-05-07T04:45:52,980 WARN [main] session.SessionState: 
> METASTORE_FILTER_HOOK will be ignored, since 
> hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Mestastore configuration 
> hive.metastore.filter.hook changed from 
> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Trying to connect to 
> metastore with URI thrift://localhost:9083
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Opened a connection to 
> metastore, current connections: 1
> 2018-05-07T04:45:52,985 INFO [main] hive.metastore: Connected to metastore.
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: Operation log 
> root directory is created: /var/hive/hs2log/tmp
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread pool size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread wait queue size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread keepalive time: 10 seconds
> 2018-05-07T04:45:52,988 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19437) HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled

2018-06-10 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19437:

Affects Version/s: 2.3.2

> HiveServer2 Drops connection to Metastore when hiverserver2 webui is enabled
> 
>
> Key: HIVE-19437
> URL: https://issues.apache.org/jira/browse/HIVE-19437
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, SQL, Web UI
>Affects Versions: 2.1.1, 2.3.2
>Reporter: rr
>Priority: Major
>
>  
> When SSL is enabled for the HiveServer2 web UI on port 10002, HiveServer2 is 
> unable to start up. It keeps connecting to the metastore, drops the 
> connection, and then retries again. The HiveServer2 pid is present, but the 
> server is not actually up because it keeps dropping the metastore connection.
> The logs show the following:
> 2018-05-07T04:45:52,980 INFO [main] sqlstd.SQLStdHiveAccessController: 
> Created SQLStdHiveAccessController for session context : 
> HiveAuthzSessionContext [sessionString=9f65e1ba-8810-47ee-a370-238606f02479, 
> clientType=HIVESERVER2]
>  2018-05-07T04:45:52,980 WARN [main] session.SessionState: 
> METASTORE_FILTER_HOOK will be ignored, since 
> hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Mestastore configuration 
> hive.metastore.filter.hook changed from 
> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> 2018-05-07T04:45:52,981 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Trying to connect to 
> metastore with URI thrift://localhost:9083
> 2018-05-07T04:45:52,982 INFO [main] hive.metastore: Opened a connection to 
> metastore, current connections: 1
> 2018-05-07T04:45:52,985 INFO [main] hive.metastore: Connected to metastore.
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: Operation log 
> root directory is created: /var/hive/hs2log/tmp
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread pool size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread wait queue size: 100
> 2018-05-07T04:45:52,986 INFO [main] service.CompositeService: HiveServer2: 
> Background operation thread keepalive time: 10 seconds
> 2018-05-07T04:45:52,988 INFO [main] hive.metastore: Closed a connection to 
> metastore, current connections: 0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19848) Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its UIs)

2018-06-10 Thread t oo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19848:

Description: 
Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
UIs)

We are using Hive on EC2s without EMR/HDFS/Kerberos

  was:Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for 
its UIs)


> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> -
>
> Key: HIVE-19848
> URL: https://issues.apache.org/jira/browse/HIVE-19848
> Project: Hive
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: t oo
>Priority: Major
>
> Implement HiveServer2WebUI authentication (Spark has HTTP Basic Auth for its 
> UIs)
> We are using Hive on EC2s without EMR/HDFS/Kerberos



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19117) hiveserver2 org.apache.thrift.transport.TTransportException error when running 2nd query after minute of inactivity

2018-04-05 Thread t oo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427810#comment-16427810
 ] 

t oo commented on HIVE-19117:
-

[~gopalv] Even with ;http.header.Connection=close appended to the JDBC URL, it is 
still an issue.
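
For context, a hedged example of where that session parameter sits in a JDBC URL; note 
it only takes effect in HTTP transport mode, and the host, port and httpPath below are 
placeholders:

{code}
jdbc:hive2://hs2-host:10001/default;transportMode=http;httpPath=cliservice;http.header.Connection=close
{code}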

> hiveserver2 org.apache.thrift.transport.TTransportException error when 
> running 2nd query after minute of inactivity
> ---
>
> Key: HIVE-19117
> URL: https://issues.apache.org/jira/browse/HIVE-19117
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Metastore, Thrift API
>Affects Versions: 2.1.1
> Environment: * Hive 2.1.1 with hive.server2.transport.mode set to 
> binary (sample JDBC string is jdbc:hive2://remotehost:1/default)
>  * Hadoop 2.8.3
>  * Metastore using MySQL
>  * Java 8
>Reporter: t oo
>Priority: Blocker
>
> I make a JDBC connection from my SQL tool (i.e. SQuirreL SQL, Oracle SQL 
> Developer) to HiveServer2 (running on a remote server) with port 1.
> I am able to run some queries successfully. I then do something else (not in 
> the SQL tool) for 1-2 minutes and then return to my SQL tool and attempt to 
> run a query, but I get this error: 
> {code:java}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Software caused connection abort: socket write error{code}
> If I now disconnect and reconnect in my SQL tool I can run queries again. But 
> does anyone know what HiveServer2 settings I should change to prevent the 
> error? I assume something in hive-site.xml
> From the hiveserver2 logs below, one can see an exact 1-minute gap from the 30th 
> minute to the 31st minute, where the disconnect happens.
> {code:java}
> 2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,718 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,719 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,232 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Session disconnected without closing properly.
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a]
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> service.CompositeService: Session closed, SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a], current sessions:0
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.HiveSessionImpl: Operation log session directory is deleted: 
> /var/hive/hs2log/tmp/c81ec0f9-7a9d-46b6-9708-e7d78520a48a
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/scratch/tmp/anonymous/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs 
> with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/ec2-user/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> hive.metastore: Closed a connection to metastore, current connections: 1{code}
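
A hedged hive-site.xml sketch of the idle-timeout settings usually checked first for 
this symptom; whether these (or an external network idle timeout between the client 
and HS2, e.g. a 60-second load-balancer limit) are the actual cause here is an 
assumption:

{code}
<!-- Values are illustrative; plain numbers are interpreted as milliseconds. -->
<property>
  <name>hive.server2.idle.session.timeout</name>
  <value>3600000</value> <!-- 0 or negative disables -->
</property>
<property>
  <name>hive.server2.idle.operation.timeout</name>
  <value>0</value>
</property>
<property>
  <name>hive.server2.session.check.interval</name>
  <value>60000</value>
</property>
{code}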



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19117) hiveserver2 org.apache.thrift.transport.TTransportException error when running 2nd query after minute of inactivity

2018-04-05 Thread t oo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated HIVE-19117:

Description: 
I make a JDBC connection from my SQL tool (ie Squirrel SQL, Oracle SQL 
Developer) to HiveServer2 (running on remote server) with port 1.

I am able to run some queries successfully. I then do something else (not in 
the SQL tool) for 1-2minutes and then return to my SQL tool and attempt to run 
a query but I get this error: 
{code:java}
org.apache.thrift.transport.TTransportException: java.net.SocketException: 
Software caused connection abort: socket write error{code}
If I now disconnect and reconnect in my SQL tool I can run queries again. But 
does anyone know what HiveServer2 settings I should change to prevent the 
error? I assume something in hive-site.xml

>From the hiveserver2 logs below, can see an exact 1 minute gap from 30th min 
>to 31stmin where the disconnect happens.
{code:java}
2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Updating thread name to 
c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
 2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:30:41,718 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Updating thread name to 
c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
 2018-04-05T03:30:41,719 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:31:41,232 INFO [HiveServer2-Handler-Pool: Thread-36] 
thrift.ThriftCLIService: Session disconnected without closing properly.
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
thrift.ThriftCLIService: Closing the session: SessionHandle 
[c81ec0f9-7a9d-46b6-9708-e7d78520a48a]
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
service.CompositeService: Session closed, SessionHandle 
[c81ec0f9-7a9d-46b6-9708-e7d78520a48a], current sessions:0
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Updating thread name to 
c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Updating thread name to 
c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.HiveSessionImpl: Operation log session directory is deleted: 
/var/hive/hs2log/tmp/c81ec0f9-7a9d-46b6-9708-e7d78520a48a
 2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Deleted directory: 
/var/hive/scratch/tmp/anonymous/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with 
scheme file
 2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Deleted directory: 
/var/hive/ec2-user/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with scheme file
 2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
hive.metastore: Closed a connection to metastore, current connections: 1{code}

  was:
I make a JDBC connection from my SQL tool (ie Squirrel SQL, Oracle SQL 
Developer) to HiveServer2 (running on remote server) with port 1.

I am able to run some queries successfully. I then do something else (not in 
the SQL tool) for 1-2minutes and then return to my SQL tool and attempt to run 
a query but I get this error: {{}}
{code:java}
org.apache.thrift.transport.TTransportException: java.net.SocketException: 
Software caused connection abort: socket write error{code}
If I now disconnect and reconnect in my SQL tool I can run queries again. But 
does anyone know what HiveServer2 settings I should change to prevent the 
error? I assume something in hive-site.xml

>From the hiveserver2 logs below, can see an exact 1 minute gap from 30th min 
>to 31stmin where the disconnect happens.
{code:java}
2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
Thread-36
 2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
session.SessionState: Updating thread name to 
c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
 2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36]