[jira] [Created] (HIVE-25428) Add support for redshift database launch in new test driver
Dantong Dong created HIVE-25428: --- Summary: Add support for redshift database launch in new test driver Key: HIVE-25428 URL: https://issues.apache.org/jira/browse/HIVE-25428 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25427) Add support for mssql database launch in new test driver
Dantong Dong created HIVE-25427: --- Summary: Add support for mssql database launch in new test driver Key: HIVE-25427 URL: https://issues.apache.org/jira/browse/HIVE-25427 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong
[jira] [Created] (HIVE-25426) Add support for Oracle database launch in new test driver
Dantong Dong created HIVE-25426: --- Summary: Add support for Oracle database launch in new test driver Key: HIVE-25426 URL: https://issues.apache.org/jira/browse/HIVE-25426 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong
[jira] [Created] (HIVE-25425) Add support for Derby database launch in new test driver
Dantong Dong created HIVE-25425: --- Summary: Add support for Derby database launch in new test driver Key: HIVE-25425 URL: https://issues.apache.org/jira/browse/HIVE-25425 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong
[jira] [Created] (HIVE-25424) Add support for Postgres database launch in new test driver
Dantong Dong created HIVE-25424: --- Summary: Add support for Postgres database launch in new test driver Key: HIVE-25424 URL: https://issues.apache.org/jira/browse/HIVE-25424 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong
[jira] [Created] (HIVE-25423) Add new test driver to automatically launch and load external database
Dantong Dong created HIVE-25423: --- Summary: Add new test driver to automatically launch and load external database Key: HIVE-25423 URL: https://issues.apache.org/jira/browse/HIVE-25423 Project: Hive Issue Type: Test Components: Testing Infrastructure, Tests Affects Versions: 3.1.2 Reporter: Dantong Dong Assignee: Dantong Dong Adds a new test driver (TestMiniLlapExtDBCliDriver) that automatically launches and loads an external database with a specified custom script during tests. This issue originated from [HIVE-24396|https://issues.apache.org/jira/browse/HIVE-24396]. Docs will be added later. (Screenshot attached: Screen Shot 2021-08-04 at 2.32.35 PM.png)
[jira] [Created] (HIVE-25282) Drop/Alter table in REMOTE db should fail
Dantong Dong created HIVE-25282: --- Summary: Drop/Alter table in REMOTE db should fail Key: HIVE-25282 URL: https://issues.apache.org/jira/browse/HIVE-25282 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong Fix For: 4.0.0 Drop/Alter table statements should be explicitly rejected in a REMOTE database, consistent with HIVE-24425 (Create table in REMOTE db should fail).
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Patch Available (was: Open) The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: dong Priority: Critical
After HiveServer2 has been running for more than 7 days, every operation fails when I connect with the beeline shell. The HiveServer2 log shows the failures are caused by a Kerberos authentication error; the exception stack trace is:
2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275)
    at org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
    ... 16 more
Caused by: java.lang.IllegalStateException: This ticket is no longer valid
    at javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120)
    at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:325)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:128)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Attachment: 0001-FIX-HIVE-4233.patch Added HiveKerberosReloginHelper to schedule the command that renews the TGT. I tested this patch with a Kerberos principal whose maxlife is 15 minutes; with the patch applied, authentication does not fail after 15 minutes. Without the patch, the Kerberos auth failure is always thrown after 15 minutes, and beeline cannot reconnect to HiveServer2. Please review this patch; it may solve this problem. Thanks.
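The approach the patch comment describes, scheduling a periodic TGT renewal so the ticket never silently expires, can be sketched roughly as follows. This is an illustrative sketch only, not the attached 0001-FIX-HIVE-4233.patch: the class name `TgtRenewalScheduler`, the pluggable `renewAction`, and the 80%-of-lifetime renewal interval are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TgtRenewalScheduler {

    // Renew well before the ticket expires; 80% of the ticket
    // lifetime is a common rule of thumb (illustrative, not the
    // value used by the actual patch).
    static long renewalIntervalMillis(long ticketLifetimeMillis) {
        return (long) (ticketLifetimeMillis * 0.8);
    }

    // Run a caller-supplied renewal action (for example, re-login
    // from a keytab, or shelling out to `kinit -R`) on a daemon
    // thread at a fixed interval derived from the ticket lifetime.
    static ScheduledExecutorService start(Runnable renewAction,
                                          long ticketLifetimeMillis) {
        ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "tgt-renewal");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });
        long interval = renewalIntervalMillis(ticketLifetimeMillis);
        exec.scheduleAtFixedRate(renewAction, interval, interval,
                                 TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

With the reporter's 15-minute test principal, this would fire the renewal action every 12 minutes, before the ticket's maxlife is reached.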
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time?
dong created HIVE-4233: -- Summary: The TGT gotten from class 'CLIService' should be renewed on time? Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: dong Priority: Critical
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Summary: The TGT gotten from class 'CLIService' should be renewed on time (was: The TGT gotten from class 'CLIService' should be renewed on time?)
[jira] [Updated] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-3388: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Rongrong! Improve Performance of UDF PERCENTILE_APPROX() -- Key: HIVE-3388 URL: https://issues.apache.org/jira/browse/HIVE-3388 Project: Hive Issue Type: Task Reporter: Rongrong Zhong Assignee: Rongrong Zhong Priority: Minor Attachments: HIVE-3388.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192 ] Siying Dong commented on HIVE-3388: --- +1 Improve Performance of UDF PERCENTILE_APPROX() -- Key: HIVE-3388 URL: https://issues.apache.org/jira/browse/HIVE-3388 Project: Hive Issue Type: Task Reporter: Rongrong Zhong Assignee: Rongrong Zhong Priority: Minor Attachments: HIVE-3388.1.patch.txt
[jira] [Resolved] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2247. --- Resolution: Fixed I committed the patch 7 months ago and forgot to resolve it. Thanks Weiyan! ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.10.patch.txt, HIVE-2247.11.patch.txt, HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt, HIVE-2247.9.patch.txt We need an ALTER TABLE RENAME PARTITION feature that is similar to ALTER TABLE RENAME.
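For reference, the partition-rename statement this feature adds takes roughly the following shape in HiveQL (the table name and partition values below are illustrative, not from the patch):

```sql
-- Rename a partition in place, analogous to ALTER TABLE ... RENAME:
ALTER TABLE page_views PARTITION (ds = '2011-06-20')
  RENAME TO PARTITION (ds = '2011-06-21');
```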
[jira] [Resolved] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-3030. --- Resolution: Fixed escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281137#comment-13281137 ] Siying Dong commented on HIVE-3030: --- Committed. Thanks Namit! escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280294#comment-13280294 ] Siying Dong commented on HIVE-3030: --- Logic looks good to me. I'll run unit tests now. In the meantime, can you add tests to cover the new cases? Cases like escaping '\', and unescaping cases like '\\' or '\\\t'? escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280399#comment-13280399 ] Siying Dong commented on HIVE-3030: --- Discussed with Namit offline. He is going to add one more test case now. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280470#comment-13280470 ] Siying Dong commented on HIVE-3030: --- Tests look good to me. Will run the test suites. Let's open a follow-up JIRA to escape a more complete list of characters. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278018#comment-13278018 ] Siying Dong commented on HIVE-3030: --- Here is a general problem (maybe not related to the new change in this patch): there is no way for a script to output a literal \n back to Hive; it will be translated into a newline. Similarly, if a column contains \n, it will not be escaped, so the transform script has no way to distinguish it from a real newline. With this patch, more cases like this will be added. Maybe not for this patch but as a follow-up, we might want to escape \\ too to keep the escaping mapping complete. Other than that, the patch looks good to me. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278020#comment-13278020 ] Siying Dong commented on HIVE-3030: --- I meant: Maybe not for this patch but as a follow-up, we might want to escape slashslash too to keep the escaping mapping complete. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
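The symmetric escaping this thread is asking for can be sketched compactly. The following is an illustrative Python sketch, not Hive's actual Java SerDe implementation; the escape set (backslash, newline, carriage return, tab) and the function names are assumptions chosen to mirror the discussion:

```python
# Illustrative sketch of a complete (invertible) escape mapping for a
# script operator's record delimiters. Escaping '\' itself is what makes
# the mapping complete: without it, a literal two-character "\n" in the
# data would be indistinguishable from an escaped real newline.

_ESCAPES = {'\\': '\\\\', '\n': '\\n', '\r': '\\r', '\t': '\\t'}
_UNESCAPES = {v: k for k, v in _ESCAPES.items()}

def escape(s: str) -> str:
    # Replace each special character with its two-character escape.
    return ''.join(_ESCAPES.get(ch, ch) for ch in s)

def unescape(s: str) -> str:
    # Scan left to right, consuming two characters for known escapes.
    out, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in _UNESCAPES:
            out.append(_UNESCAPES[pair])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)
```

With the backslash included, every string round-trips (unescape(escape(s)) == s), which is exactly the completeness the follow-up JIRA proposed here would add.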
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Open (was: Patch Available) There's a bug. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.2.patch Fixed an assert issue and recovered some test result files that had been changed incorrectly by HIVE-1538. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.3.patch Reran all test suites and fixed several more wrong test results. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Created] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.1.patch Fix the problem by allowing the sample filter operator to be the first filter operator after the table scan. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Commented] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103145#comment-13103145 ] Siying Dong commented on HIVE-2360: --- Franklin finished his internship and left. We should find someone else to finish the task. create dynamic partition if and only if intermediate source has files - Key: HIVE-2360 URL: https://issues.apache.org/jira/browse/HIVE-2360 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2360.1.patch, hive-2360.2.patch There are some conditions under which a partition description is created, due to insert overwriting a table using dynamic partitioning, for partitions that are empty (have no files).
[jira] [Assigned] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2360: - Assignee: (was: Franklin Hu) create dynamic partition if and only if intermediate source has files - Key: HIVE-2360 URL: https://issues.apache.org/jira/browse/HIVE-2360 Project: Hive Issue Type: Bug Reporter: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2360.1.patch, hive-2360.2.patch There are some conditions under which a partition description is created, due to insert overwriting a table using dynamic partitioning, for partitions that are empty (have no files).
[jira] [Commented] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093991#comment-13093991 ] Siying Dong commented on HIVE-2378: --- +1, will commit if unit tests pass. Warn user that precision is lost when bigint is implicitly cast to double. -- Key: HIVE-2378 URL: https://issues.apache.org/jira/browse/HIVE-2378 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
[jira] [Resolved] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2378. --- Resolution: Fixed Committed. Thanks Kevin! Warn user that precision is lost when bigint is implicitly cast to double. -- Key: HIVE-2378 URL: https://issues.apache.org/jira/browse/HIVE-2378 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091305#comment-13091305 ] Siying Dong commented on HIVE-2385: --- @Carl, are you still seeing tests failing? Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Priority: Major (was: Minor) Issue Type: Improvement (was: Bug) create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Improvement Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090399#comment-13090399 ] Siying Dong commented on HIVE-2352: --- I ran tests twice. Both crashed. I think it is an important patch and will dramatically improve the latency of some queries, like scanning a large data set for one or two rows. (Currently I sometimes do an ORDER BY + LIMIT to speed it up if I know the data set is small.) We should raise the priority. create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Assignee: (was: Franklin Hu) create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Improvement Reporter: Franklin Hu Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089575#comment-13089575 ] Siying Dong commented on HIVE-2352: --- Franklin's internship ended. Let me apply his patch and see whether there are any failed tests. create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Attachment: HIVE-2385.2.patch Fixed the bug; it passes autolocal1.q. I'm running the whole test suite now. Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Status: Patch Available (was: Open) Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Created] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Priority: Minor Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Assigned] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2385: - Assignee: Siying Dong Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2272: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks Franklin! add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2272.1.patch, hive-2272.10.patch, hive-2272.11.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored in JDBC-compliant, java.sql.Timestamp-parsable strings.
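The range quoted in the description (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) is exactly the positive signed 32-bit unix-seconds range, which a short Python check can confirm. The `parse_jdbc_timestamp` helper below is a hypothetical illustration of handling the optional sub-second part of a java.sql.Timestamp-style string, not part of Hive; note it truncates nanoseconds to microseconds, since Python's datetime stores nothing finer:

```python
from datetime import datetime, timezone

# The supported range corresponds to seconds 1 .. 2**31 - 1 since the
# unix epoch, i.e. the positive signed 32-bit range.
lo = datetime.fromtimestamp(1, tz=timezone.utc)
hi = datetime.fromtimestamp(2 ** 31 - 1, tz=timezone.utc)

def parse_jdbc_timestamp(s: str) -> datetime:
    # Hypothetical parser for "yyyy-MM-dd HH:mm:ss[.fffffffff]" strings.
    # Nanosecond digits beyond six are truncated to fit microseconds.
    if '.' in s:
        base, frac = s.split('.')
        return datetime.strptime(base + '.' + frac[:6].ljust(6, '0'),
                                 '%Y-%m-%d %H:%M:%S.%f')
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
```

For example, `hi` evaluates to 2038-01-19 03:14:07 UTC, matching the upper bound in the description.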
[jira] [Resolved] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2282. --- Resolution: Fixed Committed. Thanks Kevin! Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt Currently, if block sampling is enabled and a large data set is sampled down to a small one, local mode should kick in.
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082007#comment-13082007 ] Siying Dong commented on HIVE-2272: --- +1, please open a follow-up JIRA for setting timezones. add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2272.1.patch, hive-2272.10.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored in JDBC-compliant, java.sql.Timestamp-parsable strings.
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071409#comment-13071409 ] Siying Dong commented on HIVE-2309: --- can we limit the number of digits for the attempt ID? Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code}
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420 ] Siying Dong commented on HIVE-2309: --- +1, will commit after tests pass Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code}
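The failure mode in the {code} block above is easy to reproduce in Python. The "fixed" pattern below is only one plausible repair, letting the trailing attempt group match one or more digits instead of exactly one; the actual change in HIVE-2309.2.patch may differ:

```python
import re

# The buggy pattern from the report: the optional trailing group (_[0-9])
# matches exactly one digit, so a two-digit attempt id like "_10" cannot
# be absorbed by it, and the lazy prefix swallows the real task id instead.
BUGGY = re.compile('^.*?([0-9]+)(_[0-9])?(\\..*)?$')

# One plausible fix (illustrative, not necessarily the committed patch):
# allow the attempt group to match one or more digits.
FIXED = re.compile('^.*?([0-9]+)(_[0-9]+)?(\\..*)?$')

def task_id(pattern, filename):
    # Group 1 is the task id portion of the attempt filename.
    return pattern.match(filename).group(1)
```

With the buggy pattern, task_id returns '10' for attempt number 10 but '001210' for attempt number 9; with the fixed pattern it returns '001210' in both cases.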
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Summary: Comparison Operators convert number types to common type instead of double if possible (was: Comparison Operators convert number types to common type instead of double if necessary) Comparison Operators convert number types to common type instead of double if possible -- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
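The change described above can be illustrated with a toy common-type rule (illustrative only; the type ladder below is a hypothetical subset, not Hive's actual type-resolution code):

```python
# Toy numeric type ladder, narrowest to widest. The point of HIVE-2248
# is to compare in the narrowest common type instead of always widening
# both sides to double.
LADDER = ['int', 'bigint', 'double']  # illustrative subset only

def common_type(a, b):
    """Return the wider of two numeric types from the ladder."""
    return a if LADDER.index(a) >= LADDER.index(b) else b

print(common_type('bigint', 'int'))     # bigint: no double conversion needed
print(common_type('bigint', 'double'))  # double: conversion still required
```

With this rule, `WHERE BIGINT_COLUMN = 0` compares two bigints directly, which is exactly the wasteful-conversion case the description calls out.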
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.4.patch Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch, HIVE-2236.4.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070811#comment-13070811 ] Siying Dong commented on HIVE-2249: --- Joseph, can you handle the string case too? When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Joseph Barillari Attachments: HIVE-2249.1.patch.txt Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070865#comment-13070865 ] Siying Dong commented on HIVE-2282: --- I don't know why, but I ran the test suite twice and it failed both times. Can you rebase your code, run the whole test suite, and see whether all the tests pass? I'll try again too. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069610#comment-13069610 ] Siying Dong commented on HIVE-2282: --- Kevin, you forgot to add the file ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out to the patch. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069611#comment-13069611 ] Siying Dong commented on HIVE-2282: --- Also, a query like select key, value from sih_src tablesample(1 percent) doesn't actually generate stable results. You can use select count(1) instead; that will generate correct results. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2296: -- Resolution: Fixed Status: Resolved (was: Patch Available) committed. Thanks Franklin! bad compressed file names from insert into -- Key: HIVE-2296 URL: https://issues.apache.org/jira/browse/HIVE-2296 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2296.1.patch, hive-2296.2.patch When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files with bad file names: Before INSERT INTO: 00_0.gz After INSERT INTO: 00_0.gz 00_0.gz_copy_1 This causes corrupted output when doing a SELECT * on the table. The correct behavior would be to pick a valid filename such as: 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
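The intended behavior from the description, inserting the copy marker before the compression extension rather than after it, can be sketched in a few lines (a hypothetical helper for illustration, not the code from the attached patches):

```python
import os

def copy_file_name(name, copy_index):
    """Insert a _copy_N marker before the extension so compressed files
    keep a recognizable suffix (e.g. .gz) at the end of the name."""
    base, ext = os.path.splitext(name)   # '00_0.gz' -> ('00_0', '.gz')
    return f"{base}_copy_{copy_index}{ext}"

print(copy_file_name('00_0.gz', 1))  # 00_0_copy_1.gz (good)
# The buggy behavior simply appended the marker: 00_0.gz_copy_1 (bad),
# which codecs no longer recognize as compressed, corrupting SELECT * output.
```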
[jira] [Assigned] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2249: - Assignee: Joseph Barillari When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Joseph Barillari Attachments: HIVE-2249.1.patch.txt Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.3.patch fix a bug Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Open (was: Patch Available) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069126#comment-13069126 ] Siying Dong commented on HIVE-2247: --- I'm looking at the patch. Please test backward compatibility between old server / new client and new server / old client. Please come by if you don't know how to test it. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069314#comment-13069314 ] Siying Dong commented on HIVE-2296: --- +1 bad compressed file names from insert into -- Key: HIVE-2296 URL: https://issues.apache.org/jira/browse/HIVE-2296 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2296.1.patch, hive-2296.2.patch When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files with bad file names: Before INSERT INTO: 00_0.gz After INSERT INTO: 00_0.gz 00_0.gz_copy_1 This causes corrupted output when doing a SELECT * on the table. The correct behavior would be to pick a valid filename such as: 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.2.patch remove the MapRedStat list from DriverContext and add more counters. Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.4.patch 1. change block merge task too 2. change the capital file name reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, HIVE-2201.4.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13066299#comment-13066299 ] Siying Dong commented on HIVE-2282: --- +1, will commit after testing. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2247: - Assignee: Weiyan Wang ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2282) Local mode needs to work well with block sampling
Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064935#comment-13064935 ] Siying Dong commented on HIVE-2272: --- Can you add it to the review board? add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored as JDBC-compliant, java.sql.Timestamp-parsable strings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064994#comment-13064994 ] Siying Dong commented on HIVE-2247: --- Sorry for the confusion. I just meant to change the directory name where the data is and the location parameter in the partition metadata. If we decide not to change the physical path, we just change the partition name. If we need to change the physical path, then we need to change both the partition name and the location. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063457#comment-13063457 ] Siying Dong commented on HIVE-2247: --- Our use case is that we want to sanity-check the quality of the data under a temporary partition name before we move the data to the partition name that people consider ready. We want to avoid scanning the data for this operation. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1721) use bloom filters to improve the performance of joins
[ https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063501#comment-13063501 ] Siying Dong commented on HIVE-1721: --- Andrew, what do you mean by the filter could be built in parallel with an MR job? Our initial plan was to build the filter based only on the smaller tables and apply it against the big table to reduce the data to be shuffled. For the syntax, the plan is to use a syntax like MAPJOIN. We can do something like SELECT /*+ BLOOMFILTER(t1) */ ... FROM t1 JOIN t2 ... use bloom filters to improve the performance of joins - Key: HIVE-1721 URL: https://issues.apache.org/jira/browse/HIVE-1721 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: J. Andrew Key Labels: optimization In case of map-joins, it is likely that the big table will not find many matching rows from the small table. Currently, we perform a hash-map lookup for every row in the big table, which can be pretty expensive. It might be useful to try out a bloom filter containing all the elements of the small table. Each element from the big table is first searched for in the bloom filter, and only in case of a positive match is the small table's hash table explored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
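The idea in the description, probing a cheap bloom filter before the more expensive hash-map lookup, can be sketched as follows (a toy filter for illustration only; Hive's actual implementation and the BLOOMFILTER hint syntax were still under discussion at this point):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions per key over a fixed bit array."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False positives possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Build the filter from the small table's join keys ...
small_table = {'k1': 'v1', 'k2': 'v2'}
bf = BloomFilter()
for k in small_table:
    bf.add(k)

# ... then probe it before the (more expensive) hash-map lookup on the
# big table's rows; false positives are caught by the map lookup itself.
big_table_keys = ['k1', 'k3', 'k2', 'k9']
matches = [k for k in big_table_keys if bf.might_contain(k) and k in small_table]
print(matches)  # ['k1', 'k2']
```

The win is that most non-matching big-table rows are rejected by a few bit tests instead of a full hash-table probe, which is exactly the scenario the issue describes.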
[jira] [Updated] (HIVE-306) Support INSERT [INTO] destination
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-306: - Status: Patch Available (was: Open) Support INSERT [INTO] destination --- Key: HIVE-306 URL: https://issues.apache.org/jira/browse/HIVE-306 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Franklin Hu Attachments: hive-306.1.patch, hive-306.2.patch, hive-306.3.patch, hive-306.4.patch Currently hive only supports INSERT OVERWRITE destination. We should support INSERT [INTO] destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2247) CREATE TABLE RENAME PARTITION
CREATE TABLE RENAME PARTITION - Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Description: Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. was: Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Status: Patch Available (was: Open) Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Attachment: HIVE-2248.1.patch Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
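The proposed improvement, inferring the constant's type from the other comparison operand rather than probing types in a fixed order, can be sketched like this (a Python analogue of the Java snippet above; the names and the fallback rule are illustrative, not Hive's actual API):

```python
# Map a column's declared type to a parser for the constant literal.
# These type names are illustrative; Hive's real resolution goes through
# its ObjectInspector/TypeInfo machinery.
PARSERS = {'int': int, 'bigint': int, 'double': float}

def constant_for(literal, other_operand_type=None):
    """Parse a numeric literal, preferring the type of the other operand."""
    if other_operand_type in PARSERS:
        try:
            return PARSERS[other_operand_type](literal)
        except ValueError:
            pass  # e.g. INT_COLUMN = 1.1: fall through to generic parsing
    for parse in (int, float):  # old behavior: narrowest type that fits
        try:
            return parse(literal)
        except ValueError:
            continue
    raise ValueError(f"invalid numerical constant: {literal!r}")

# WHERE BIG_INT_COLUMN = 0: the constant is parsed as an integer directly,
# so no runtime widening to double is needed for the comparison.
print(type(constant_for('0', 'bigint')).__name__)   # int
print(type(constant_for('0', 'double')).__name__)   # float
```

The `INT_COLUMN = 1.1` case falls through to the generic path, which is where the "we can even do more" remark applies (e.g. recognizing that no int equals 1.1).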
[jira] [Commented] (HIVE-306) Support INSERT [INTO] destination
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058140#comment-13058140 ] Siying Dong commented on HIVE-306: -- +1. Looks good to me for now. I'm running tests. Once it is committed, please open a follow-up JIRA to make moving files more efficient and small-file compaction smarter. Support INSERT [INTO] destination --- Key: HIVE-306 URL: https://issues.apache.org/jira/browse/HIVE-306 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Franklin Hu Attachments: hive-306.1.patch, hive-306.2.patch Currently hive only supports INSERT OVERWRITE destination. We should support INSERT [INTO] destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055355#comment-13055355 ] Siying Dong commented on HIVE-2035: --- +1, will run regression tests Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, the intermediate data can be merged using an additional MapReduce job. This can be quite expensive if the data size is large. With HIVE-1950, merging can be done at the RCFile block level so that it bypasses the (de)compression and (de)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile but the destination table is (which requires that the intermediate data be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
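The core idea of a block-level merge — copying compressed blocks verbatim instead of decoding records — can be illustrated with a minimal, self-contained sketch (ordinary files stand in for RCFile blocks here; this is not the HIVE-1950 implementation):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockMerge {
    /** Append each input's raw bytes to the output, never decoding them. */
    static void mergeRaw(Path out, Path... inputs) throws IOException {
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path in : inputs) {
                // byte-for-byte copy: no decompression, no deserialization
                Files.copy(in, os);
            }
        }
    }
}
```

A record-level merge would instead read, decompress, and deserialize every row before re-encoding it, which is exactly the cost the block-level approach avoids.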
[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2035: -- Status: Patch Available (was: Open) Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056205#comment-13056205 ] Siying Dong commented on HIVE-2035: --- committed Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Priority: Minor CPU milliseconds information is available from Hadoop's framework. Printing it to the Hive CLI when executing a job will help users know more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch CPU milliseconds information is available from Hadoop's framework. Printing it to the Hive CLI when executing a job will help users know more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054188#comment-13054188 ] Siying Dong commented on HIVE-2201: --- ping reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
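The revised protocol can be sketched with local filesystem renames (the directory layout and task id here are illustrative; on HDFS each of these operations is a name node call):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpFileProtocol {
    /** Revised protocol: the committed file goes straight into /tmp2, so
     *  /tmp1 never accumulates files that later need an extra directory move. */
    static Path commit(Path tmp1, Path tmp2, Path tmp3, String taskId, byte[] data)
            throws IOException {
        Files.createDirectories(tmp1);
        Files.createDirectories(tmp2);
        // step 1: write the in-flight output under a _tmp name
        Path inFlight = tmp1.resolve("_tmp_" + taskId);
        Files.write(inFlight, data);
        // step 2: rename into /tmp2 directly (old protocol renamed within /tmp1)
        Files.move(inFlight, tmp2.resolve(taskId), StandardCopyOption.ATOMIC_MOVE);
        // step 3: publish the whole directory
        Files.move(tmp2, tmp3);
        // step 4 would deduplicate speculative outputs; no _tmp cleanup is needed
        return tmp3.resolve(taskId);
    }
}
```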
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051415#comment-13051415 ] Siying Dong commented on HIVE-2035: --- will take a look. Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: Patch Available (was: In Progress) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Assignee: Siying Dong Summary: reduce name node calls in hive by creating temporary directories (was: remove name node calls in hive by creating temporary directories) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.1.patch Implemented the logic. Discovered one problem: when moving from /tmp1/_tmp_1 to /tmp2/1, we might need to check whether /tmp2 exists before moving. This patch avoids that call by pre-creating the temp directory before submitting the job. However, we cannot do that for dynamic partitioning, as we don't know the directory names, so dynamic partitioning adds some extra DFS name node reads. So far I think this tradeoff is worthwhile. This cost can potentially be reduced by caching the directories already created; we can try that approach as a follow-up. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: Patch Available (was: In Progress) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: (was: HIVE-2201.1.patch) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-2201 started by Siying Dong. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.1.patch reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.2.patch fix a bug. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2211) Revert
Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
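The shutdown() pitfall is easy to demonstrate: shutdown() only stops the pool from accepting new tasks and returns immediately, while awaitTermination() is what actually blocks (a standalone illustration, not the Hive code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownDemo {
    /** Run nTasks short jobs and return how many completed. */
    static int runAndWait(int nTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // returns immediately; tasks may still be running here
        // the call the buggy version was missing: block until the pool drains
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get(); // only now is the count guaranteed to be complete
    }
}
```

Reading done.get() right after shutdown() would race with the still-running tasks, which is the shape of the Utilities.getInputSummary() bug described in this issue.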
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Status: Patch Available (was: Open) Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Attachment: HIVE-2211.1.patch Just a simple revert. I made one small modification: when catching InterruptedException, stop waiting for pending threads and exit. Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2211) Fix a bug caused by HIVE-243
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Summary: Fix a bug caused by HIVE-243 (was: Revert) Fix a bug caused by HIVE-243 Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2186: -- Resolution: Fixed Release Note: Committed. Thanks Franklin. Status: Resolved (was: Patch Available) Dynamic Partitioning Failing because of characters not supported globStatus --- Key: HIVE-2186 URL: https://issues.apache.org/jira/browse/HIVE-2186 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Franklin Hu Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, hive-2186.4.patch, hive-2186.5.patch Some dynamic partitioning queries fail at the partition-loading stage if the dynamic partition columns contain special characters. We need to escape all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
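The fix amounts to escaping, in generated partition directory names, any character that FileSystem.globStatus would treat as a glob metacharacter. A minimal sketch of the idea (the reserved-character set and method name here are illustrative, not Hive's actual list):

```java
public class PathEscape {
    // hypothetical subset of glob metacharacters; Hive's real list differs
    private static final String RESERVED = "{}[]*?\"'\\^";

    /** Percent-encode reserved characters so the name is safe for globbing. */
    static String escapePathName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0 || c < 0x20 || c == '%') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```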
[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose
[ https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2199: -- Resolution: Fixed Release Note: Committed. Thanks Franklin. Status: Resolved (was: Patch Available) incorrect success flag passed to jobClose - Key: HIVE-2199 URL: https://issues.apache.org/jira/browse/HIVE-2199 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Attachments: hive-2199.1.patch For block level merging of RCFiles, jobClose is passed the incorrect variable as the success flag -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira