[jira] [Created] (HIVE-25428) Add support for redshift database launch in new test driver
Dantong Dong created HIVE-25428: --- Summary: Add support for redshift database launch in new test driver Key: HIVE-25428 URL: https://issues.apache.org/jira/browse/HIVE-25428 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25427) Add support for mssql database launch in new test driver
Dantong Dong created HIVE-25427: --- Summary: Add support for mssql database launch in new test driver Key: HIVE-25427 URL: https://issues.apache.org/jira/browse/HIVE-25427 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong
[jira] [Created] (HIVE-25426) Add support for Oracle database launch in new test driver
Dantong Dong created HIVE-25426: --- Summary: Add support for Oracle database launch in new test driver Key: HIVE-25426 URL: https://issues.apache.org/jira/browse/HIVE-25426 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong
[jira] [Created] (HIVE-25425) Add support for Derby database launch in new test driver
Dantong Dong created HIVE-25425: --- Summary: Add support for Derby database launch in new test driver Key: HIVE-25425 URL: https://issues.apache.org/jira/browse/HIVE-25425 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong
[jira] [Created] (HIVE-25424) Add support for Postgres database launch in new test driver
Dantong Dong created HIVE-25424: --- Summary: Add support for Postgres database launch in new test driver Key: HIVE-25424 URL: https://issues.apache.org/jira/browse/HIVE-25424 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong
[jira] [Created] (HIVE-25423) Add new test driver to automatically launch and load external database
Dantong Dong created HIVE-25423: --- Summary: Add new test driver to automatically launch and load external database Key: HIVE-25423 URL: https://issues.apache.org/jira/browse/HIVE-25423 Project: Hive Issue Type: Test Components: Testing Infrastructure, Tests Affects Versions: 3.1.2 Reporter: Dantong Dong Assignee: Dantong Dong Adds a new test driver (TestMiniLlapExtDBCliDriver) that automatically launches and loads an external database with a specified custom script during tests. This issue originated from [HIVE-24396|https://issues.apache.org/jira/browse/HIVE-24396]. Docs will be added later. (Screenshot attached: Screen Shot 2021-08-04 at 2.32.35 PM.png)
[jira] [Created] (HIVE-25282) Drop/Alter table in REMOTE db should fail
Dantong Dong created HIVE-25282: --- Summary: Drop/Alter table in REMOTE db should fail Key: HIVE-25282 URL: https://issues.apache.org/jira/browse/HIVE-25282 Project: Hive Issue Type: Sub-task Reporter: Dantong Dong Assignee: Dantong Dong Fix For: 4.0.0 Drop/Alter table statements should be explicitly rejected in a REMOTE database, consistent with HIVE-24425 (Create table in REMOTE db should fail).
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Patch Available (was: Open) The TGT gotten from class 'CLIService' should be renewed on time - Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: dong Priority: Critical
After HiveServer2 has been running for more than 7 days, every operation fails when I connect with the beeline shell. The HiveServer2 log shows the failures are caused by a Kerberos authentication error; the exception stack trace is:
2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275)
    at org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
    ... 16 more
Caused by: java.lang.IllegalStateException: This ticket is no longer valid
    at javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120)
    at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:325)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:128)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Attachment: 0001-FIX-HIVE-4233.patch Added HiveKerberosReloginHelper to schedule the command that renews the TGT. I tested this patch with a Kerberos principal whose maxlife is 15 minutes; with the patch applied, authentication does not fail after 15 minutes. Without the patch, the Kerberos auth failure is always thrown after 15 minutes, and beeline cannot reconnect to HiveServer2. Please review this patch; it may solve this problem. Thanks.
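The approach the patch comment describes, scheduling a periodic TGT renewal so the ticket never silently expires, can be sketched roughly as follows. This is an illustrative sketch only, not the attached 0001-FIX-HIVE-4233.patch: the class name `TgtRenewalScheduler`, the pluggable `renewAction`, and the 80%-of-lifetime renewal interval are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TgtRenewalScheduler {

    // Renew well before the ticket expires; 80% of the ticket
    // lifetime is a common rule of thumb (illustrative, not the
    // value used by the actual patch).
    static long renewalIntervalMillis(long ticketLifetimeMillis) {
        return (long) (ticketLifetimeMillis * 0.8);
    }

    // Run a caller-supplied renewal action (for example, re-login
    // from a keytab, or shelling out to `kinit -R`) on a daemon
    // thread at a fixed interval derived from the ticket lifetime.
    static ScheduledExecutorService start(Runnable renewAction,
                                          long ticketLifetimeMillis) {
        ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "tgt-renewal");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });
        long interval = renewalIntervalMillis(ticketLifetimeMillis);
        exec.scheduleAtFixedRate(renewAction, interval, interval,
                                 TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

With the reporter's 15-minute test principal, this would fire the renewal action every 12 minutes, before the ticket's maxlife is reached.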
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time?
dong created HIVE-4233: -- Summary: The TGT gotten from class 'CLIService' should be renewed on time? Key: HIVE-4233 URL: https://issues.apache.org/jira/browse/HIVE-4233 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0 Environment: CentOS release 6.3 (Final) jdk1.6.0_31 HiveServer2 0.10.0-cdh4.2.0 Kerberos Security Reporter: dong Priority: Critical
[jira] [Updated] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time
[ https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dong updated HIVE-4233: --- Summary: The TGT gotten from class 'CLIService' should be renewed on time (was: The TGT gotten from class 'CLIService' should be renewed on time?)
[jira] [Updated] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-3388: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Rongrong! Improve Performance of UDF PERCENTILE_APPROX() -- Key: HIVE-3388 URL: https://issues.apache.org/jira/browse/HIVE-3388 Project: Hive Issue Type: Task Reporter: Rongrong Zhong Assignee: Rongrong Zhong Priority: Minor Attachments: HIVE-3388.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192 ] Siying Dong commented on HIVE-3388: --- +1 Improve Performance of UDF PERCENTILE_APPROX() -- Key: HIVE-3388 URL: https://issues.apache.org/jira/browse/HIVE-3388 Project: Hive Issue Type: Task Reporter: Rongrong Zhong Assignee: Rongrong Zhong Priority: Minor Attachments: HIVE-3388.1.patch.txt
[jira] [Resolved] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2247. --- Resolution: Fixed I committed the patch 7 months ago and forgot to resolve it. Thanks Weiyan! ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.10.patch.txt, HIVE-2247.11.patch.txt, HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt, HIVE-2247.6.patch.txt, HIVE-2247.7.patch.txt, HIVE-2247.8.patch.txt, HIVE-2247.9.patch.txt We need an ALTER TABLE RENAME PARTITION feature that is similar to ALTER TABLE RENAME.
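For reference, the partition-rename statement this feature adds takes roughly the following shape in HiveQL (the table name and partition values below are illustrative, not from the patch):

```sql
-- Rename a partition in place, analogous to ALTER TABLE ... RENAME:
ALTER TABLE page_views PARTITION (ds = '2011-06-20')
  RENAME TO PARTITION (ds = '2011-06-21');
```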
[jira] [Resolved] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-3030. --- Resolution: Fixed escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281137#comment-13281137 ] Siying Dong commented on HIVE-3030: --- Committed. Thanks Namit! escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280294#comment-13280294 ] Siying Dong commented on HIVE-3030: --- Logic looks good to me. I'll run unit tests now. In the meantime, can you add tests to cover the new cases? Cases like escaping '\', and unescaping cases like '\\' or '\\\t'? escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280399#comment-13280399 ] Siying Dong commented on HIVE-3030: --- Discussed with Namit offline. He is going to add one more test case now. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280470#comment-13280470 ] Siying Dong commented on HIVE-3030: --- Tests look good to me. Will run the test suites. Let's open a follow-up JIRA to escape a more complete list of characters. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278018#comment-13278018 ] Siying Dong commented on HIVE-3030: --- Here is a general problem (maybe not related to the new change in this patch): there is no way for a script to output a literal \n back to Hive; it will be translated into a newline. Similarly, if a column contains \n, it will not be escaped, so the transform script has no way to distinguish it from a real newline. With this patch, more cases like this will be added. Maybe not for this patch but as a follow-up, we might want to escape \\ too to keep the escaping mapping complete. Other than that, the patch looks good to me. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
[jira] [Commented] (HIVE-3030) escape more chars for script operator
[ https://issues.apache.org/jira/browse/HIVE-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278020#comment-13278020 ] Siying Dong commented on HIVE-3030: --- I meant: Maybe not for this patch but as a follow-up, we might want to escape slashslash too to keep the escaping mapping complete. escape more chars for script operator - Key: HIVE-3030 URL: https://issues.apache.org/jira/browse/HIVE-3030 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Only the newline was being escaped; the same needs to be done for carriage returns and tabs.
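The symmetric escaping this thread is asking for can be sketched compactly. The following is an illustrative Python sketch, not Hive's actual Java SerDe implementation; the escape set (backslash, newline, carriage return, tab) and the function names are assumptions chosen to mirror the discussion:

```python
# Illustrative sketch of a complete (invertible) escape mapping for a
# script operator's record delimiters. Escaping '\' itself is what makes
# the mapping complete: without it, a literal two-character "\n" in the
# data would be indistinguishable from an escaped real newline.

_ESCAPES = {'\\': '\\\\', '\n': '\\n', '\r': '\\r', '\t': '\\t'}
_UNESCAPES = {v: k for k, v in _ESCAPES.items()}

def escape(s: str) -> str:
    # Replace each special character with its two-character escape.
    return ''.join(_ESCAPES.get(ch, ch) for ch in s)

def unescape(s: str) -> str:
    # Scan left to right, consuming two characters for known escapes.
    out, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in _UNESCAPES:
            out.append(_UNESCAPES[pair])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)
```

With the backslash included, every string round-trips (unescape(escape(s)) == s), which is exactly the completeness the follow-up JIRA proposed here would add.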
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Open (was: Patch Available) There's a bug. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.2.patch Fixed an assert issue and recovered some test result files that had been changed incorrectly by HIVE-1538. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.3.patch Reran all test suites and fixed several more wrong test results. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch, HIVE-2451.2.patch, HIVE-2451.3.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Created] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Status: Patch Available (was: Open) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Updated] (HIVE-2451) TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538
[ https://issues.apache.org/jira/browse/HIVE-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2451: -- Attachment: HIVE-2451.1.patch Fix the problem by allowing the sample filter operator to be the first filter operator after the table scan. TABLESAMBLE(BUCKET xxx) sometimes doesn't trigger input pruning as regression of HIVE-1538 -- Key: HIVE-2451 URL: https://issues.apache.org/jira/browse/HIVE-2451 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2451.1.patch Example: select count(1) from bucket_table TABLESAMPLE(BUCKET xxx out of yyy) where partition_column = 'xxx' will not trigger input pruning. The reason is that we assume the sample filtering operator only appears as the second filter after the table scan, an assumption broken by HIVE-1538 even when the feature is not turned on.
[jira] [Commented] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103145#comment-13103145 ] Siying Dong commented on HIVE-2360: --- Franklin finished his internship and left. We should find someone else to finish the task. create dynamic partition if and only if intermediate source has files - Key: HIVE-2360 URL: https://issues.apache.org/jira/browse/HIVE-2360 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2360.1.patch, hive-2360.2.patch There are some conditions under which a partition description is created, due to insert overwriting a table using dynamic partitioning, for partitions that are empty (have no files).
[jira] [Assigned] (HIVE-2360) create dynamic partition if and only if intermediate source has files
[ https://issues.apache.org/jira/browse/HIVE-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2360: - Assignee: (was: Franklin Hu) create dynamic partition if and only if intermediate source has files - Key: HIVE-2360 URL: https://issues.apache.org/jira/browse/HIVE-2360 Project: Hive Issue Type: Bug Reporter: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2360.1.patch, hive-2360.2.patch There are some conditions under which a partition description is created, due to insert overwriting a table using dynamic partitioning, for partitions that are empty (have no files).
[jira] [Commented] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093991#comment-13093991 ] Siying Dong commented on HIVE-2378: --- +1, will commit if unit tests pass. Warn user that precision is lost when bigint is implicitly cast to double. -- Key: HIVE-2378 URL: https://issues.apache.org/jira/browse/HIVE-2378 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
[jira] [Resolved] (HIVE-2378) Warn user that precision is lost when bigint is implicitly cast to double.
[ https://issues.apache.org/jira/browse/HIVE-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2378. --- Resolution: Fixed Committed. Thanks Kevin! Warn user that precision is lost when bigint is implicitly cast to double. -- Key: HIVE-2378 URL: https://issues.apache.org/jira/browse/HIVE-2378 Project: Hive Issue Type: Improvement Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2378.1.patch.txt, HIVE-2378.2.patch.txt, HIVE-2378.3.patch.txt When a bigint is implicitly cast to a double (when a bigint is involved in an equality expression with a string or double) precision may be lost, resulting in unexpected behavior. Until we fix the underlying issue we should throw an error in strict mode, and a warning in nonstrict mode alerting the user about this.
[jira] [Commented] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091305#comment-13091305 ] Siying Dong commented on HIVE-2385: --- @Carl, are you still seeing tests failing? Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Priority: Major (was: Minor) Issue Type: Improvement (was: Bug) create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Improvement Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090399#comment-13090399 ] Siying Dong commented on HIVE-2352: --- I ran tests twice. Both crashed. I think it is an important patch and will dramatically improve the latency of some queries, like scanning a large data set for one or two rows. (Currently I sometimes do an ORDER BY + LIMIT to speed it up if I know the data set is small.) We should raise the priority. create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Updated] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2352: -- Assignee: (was: Franklin Hu) create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Improvement Reporter: Franklin Hu Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Commented] (HIVE-2352) create empty files if and only if table is bucketed and hive.enforce.bucketing=true
[ https://issues.apache.org/jira/browse/HIVE-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089575#comment-13089575 ] Siying Dong commented on HIVE-2352: --- Franklin's internship ended. Let me apply his patch and see whether there are any failed tests. create empty files if and only if table is bucketed and hive.enforce.bucketing=true --- Key: HIVE-2352 URL: https://issues.apache.org/jira/browse/HIVE-2352 Project: Hive Issue Type: Bug Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Fix For: 0.8.0 Attachments: hive-2352.1.patch, hive-2352.2.patch, hive-2352.3.patch create table t1 (key int, value string) stored as rcfile; insert overwrite table t1 select * from src where false; Creates an empty RCFile with no rows and size 151B. The file should not be created since there are no rows.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Attachment: HIVE-2385.2.patch Fixed the bug; it passes autolocal1.q. I'm running the whole test suite now. Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2385: -- Status: Patch Available (was: Open) Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch, HIVE-2385.2.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Created] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Priority: Minor Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Assigned] (HIVE-2385) Local Mode can be more aggressive if LIMIT optimization is on
[ https://issues.apache.org/jira/browse/HIVE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2385: - Assignee: Siying Dong Local Mode can be more aggressive if LIMIT optimization is on - Key: HIVE-2385 URL: https://issues.apache.org/jira/browse/HIVE-2385 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2385.1.patch Local mode now depends on total input data, but for LIMIT queries with no filtering, the data actually scanned can be much less and is relatively predictable. We can enable local mode more aggressively.
[jira] [Updated] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2272: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks Franklin! add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2272.1.patch, hive-2272.10.patch, hive-2272.11.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored in JDBC-compliant, java.sql.Timestamp-parsable strings.
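The range quoted in the description (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) is exactly the positive signed 32-bit unix-seconds range, which a short Python check can confirm. The `parse_jdbc_timestamp` helper below is a hypothetical illustration of handling the optional sub-second part of a java.sql.Timestamp-style string, not part of Hive; note it truncates nanoseconds to microseconds, since Python's datetime stores nothing finer:

```python
from datetime import datetime, timezone

# The supported range corresponds to seconds 1 .. 2**31 - 1 since the
# unix epoch, i.e. the positive signed 32-bit range.
lo = datetime.fromtimestamp(1, tz=timezone.utc)
hi = datetime.fromtimestamp(2 ** 31 - 1, tz=timezone.utc)

def parse_jdbc_timestamp(s: str) -> datetime:
    # Hypothetical parser for "yyyy-MM-dd HH:mm:ss[.fffffffff]" strings.
    # Nanosecond digits beyond six are truncated to fit microseconds.
    if '.' in s:
        base, frac = s.split('.')
        return datetime.strptime(base + '.' + frac[:6].ljust(6, '0'),
                                 '%Y-%m-%d %H:%M:%S.%f')
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
```

For example, `hi` evaluates to 2038-01-19 03:14:07 UTC, matching the upper bound in the description.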
[jira] [Resolved] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong resolved HIVE-2282. --- Resolution: Fixed Committed. Thanks Kevin! Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt Currently, if block sampling is enabled and a large data set is sampled down to a small one, local mode should kick in.
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082007#comment-13082007 ] Siying Dong commented on HIVE-2272: --- +1, please open a follow-up JIRA for setting timezones. add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2272.1.patch, hive-2272.10.patch, hive-2272.2.patch, hive-2272.3.patch, hive-2272.4.patch, hive-2272.5.patch, hive-2272.6.patch, hive-2272.7.patch, hive-2272.8.patch, hive-2272.9.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored in JDBC-compliant, java.sql.Timestamp-parsable strings.
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071409#comment-13071409 ] Siying Dong commented on HIVE-2309: --- can we limit the number of digits for the attempt ID? Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code}
[jira] [Commented] (HIVE-2309) Incorrect regular expression for extracting task id from filename
[ https://issues.apache.org/jira/browse/HIVE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071420#comment-13071420 ] Siying Dong commented on HIVE-2309: --- +1, will commit after tests pass Incorrect regular expression for extracting task id from filename - Key: HIVE-2309 URL: https://issues.apache.org/jira/browse/HIVE-2309 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.7.1 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor Attachments: HIVE-2309.1.patch, HIVE-2309.2.patch For producing the correct filenames for bucketed tables, there is a method in Utilities.java that extracts the task id from the filename and replaces it with the bucket number. There is a bug in the regex that is used to extract this value for attempt numbers >= 10: {code} re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_10').group(1) '10' re.match('^.*?([0-9]+)(_[0-9])?(\\..*)?$', 'attempt_201107090429_64965_m_001210_9').group(1) '001210' {code}
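The failure mode in the {code} block above is easy to reproduce in Python. The "fixed" pattern below is only one plausible repair, letting the trailing attempt group match one or more digits instead of exactly one; the actual change in HIVE-2309.2.patch may differ:

```python
import re

# The buggy pattern from the report: the optional trailing group (_[0-9])
# matches exactly one digit, so a two-digit attempt id like "_10" cannot
# be absorbed by it, and the lazy prefix swallows the real task id instead.
BUGGY = re.compile('^.*?([0-9]+)(_[0-9])?(\\..*)?$')

# One plausible fix (illustrative, not necessarily the committed patch):
# allow the attempt group to match one or more digits.
FIXED = re.compile('^.*?([0-9]+)(_[0-9]+)?(\\..*)?$')

def task_id(pattern, filename):
    # Group 1 is the task id portion of the attempt filename.
    return pattern.match(filename).group(1)
```

With the buggy pattern, task_id returns '10' for attempt number 10 but '001210' for attempt number 9; with the fixed pattern it returns '001210' in both cases.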
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if possible
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Summary: Comparison Operators convert number types to common type instead of double if possible (was: Comparison Operators convert number types to common type instead of double if necessary) Comparison Operators convert number types to common type instead of double if possible -- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Siying Dong Fix For: 0.8.0 Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
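The change described above can be illustrated with a toy common-type rule (illustrative only; the type ladder below is a hypothetical subset, not Hive's actual type-resolution code):

```python
# Toy numeric type ladder, narrowest to widest. The point of HIVE-2248
# is to compare in the narrowest common type instead of always widening
# both sides to double.
LADDER = ['int', 'bigint', 'double']  # illustrative subset only

def common_type(a, b):
    """Return the wider of two numeric types from the ladder."""
    return a if LADDER.index(a) >= LADDER.index(b) else b

print(common_type('bigint', 'int'))     # bigint: no double conversion needed
print(common_type('bigint', 'double'))  # double: conversion still required
```

With this rule, `WHERE BIGINT_COLUMN = 0` compares two bigints directly, which is exactly the wasteful-conversion case the description calls out.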
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.4.patch Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch, HIVE-2236.4.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070811#comment-13070811 ] Siying Dong commented on HIVE-2249: --- Joseph, can you handle the string case too? When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Joseph Barillari Attachments: HIVE-2249.1.patch.txt Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070865#comment-13070865 ] Siying Dong commented on HIVE-2282: --- I don't know why, but I ran the test suite twice and it failed both times. Can you rebase your code, run the whole test suite, and see whether all the tests pass? I'll try again too. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt, HIVE-2282.4.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069610#comment-13069610 ] Siying Dong commented on HIVE-2282: --- Kevin, you forgot to add the file ql/src/test/results/clientpositive/sample_islocalmode_hook.q.out to the patch. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069611#comment-13069611 ] Siying Dong commented on HIVE-2282: --- Also, a query like select key, value from sih_src tablesample(1 percent) doesn't actually generate stable results. You can use select count(1) instead; that will generate correct results. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2296: -- Resolution: Fixed Status: Resolved (was: Patch Available) committed. Thanks Franklin! bad compressed file names from insert into -- Key: HIVE-2296 URL: https://issues.apache.org/jira/browse/HIVE-2296 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2296.1.patch, hive-2296.2.patch When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files with bad file names: Before INSERT INTO: 00_0.gz After INSERT INTO: 00_0.gz 00_0.gz_copy_1 This causes corrupted output when doing a SELECT * on the table. The correct behavior would be to pick a valid filename such as: 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
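The intended behavior from the description, inserting the copy marker before the compression extension rather than after it, can be sketched in a few lines (a hypothetical helper for illustration, not the code from the attached patches):

```python
import os

def copy_file_name(name, copy_index):
    """Insert a _copy_N marker before the extension so compressed files
    keep a recognizable suffix (e.g. .gz) at the end of the name."""
    base, ext = os.path.splitext(name)   # '00_0.gz' -> ('00_0', '.gz')
    return f"{base}_copy_{copy_index}{ext}"

print(copy_file_name('00_0.gz', 1))  # 00_0_copy_1.gz (good)
# The buggy behavior simply appended the marker: 00_0.gz_copy_1 (bad),
# which codecs no longer recognize as compressed, corrupting SELECT * output.
```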
[jira] [Assigned] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
[ https://issues.apache.org/jira/browse/HIVE-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2249: - Assignee: Joseph Barillari When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Joseph Barillari Attachments: HIVE-2249.1.patch.txt Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.3.patch fix a bug Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Open (was: Patch Available) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069126#comment-13069126 ] Siying Dong commented on HIVE-2247: --- I'm looking at the patch. Please test backward compatibility between old server / new client and new server / old client. Please come by if you don't know how to test it. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069314#comment-13069314 ] Siying Dong commented on HIVE-2296: --- +1 bad compressed file names from insert into -- Key: HIVE-2296 URL: https://issues.apache.org/jira/browse/HIVE-2296 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2296.1.patch, hive-2296.2.patch When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files with bad file names: Before INSERT INTO: 00_0.gz After INSERT INTO: 00_0.gz 00_0.gz_copy_1 This causes corrupted output when doing a SELECT * on the table. The correct behavior would be to pick a valid filename such as: 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Attachment: HIVE-2236.2.patch remove the MapRedStat list from DriverContext and add more counters. Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.4.patch 1. change block merge task too 2. change the capital file name reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, HIVE-2201.4.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2282) Local mode needs to work well with block sampling
[ https://issues.apache.org/jira/browse/HIVE-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13066299#comment-13066299 ] Siying Dong commented on HIVE-2282: --- +1, will commit after testing. Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Assignee: Kevin Wilfong Attachments: HIVE-2282.1.patch.txt, HIVE-2282.2.patch.txt, HIVE-2282.3.patch.txt Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong reassigned HIVE-2247: - Assignee: Weiyan Wang ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2282) Local mode needs to work well with block sampling
Local mode needs to work well with block sampling - Key: HIVE-2282 URL: https://issues.apache.org/jira/browse/HIVE-2282 Project: Hive Issue Type: Improvement Reporter: Siying Dong Currently, if block sampling is enabled and a large set of data is sampled down to a small set, local mode needs to kick in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2272) add TIMESTAMP data type
[ https://issues.apache.org/jira/browse/HIVE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064935#comment-13064935 ] Siying Dong commented on HIVE-2272: --- Can you add it to the review board? add TIMESTAMP data type --- Key: HIVE-2272 URL: https://issues.apache.org/jira/browse/HIVE-2272 Project: Hive Issue Type: New Feature Reporter: Franklin Hu Assignee: Franklin Hu Attachments: hive-2272.1.patch, hive-2272.2.patch, hive-2272.3.patch Add a TIMESTAMP type to serde2 that supports unix timestamps (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision using both the LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored as JDBC-compliant, java.sql.Timestamp-parsable strings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064994#comment-13064994 ] Siying Dong commented on HIVE-2247: --- Sorry for the confusion. I just meant to change the directory name where the data is and the location parameter in the partition metadata. If we decide not to change the physical path, we just change the partition name. If we need to change the physical path, then we need to change both the partition name and the location. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong Assignee: Weiyan Wang Attachments: HIVE-2247.3.patch.txt We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063457#comment-13063457 ] Siying Dong commented on HIVE-2247: --- Our use case is that we want to sanity-check the quality of the data under a temporary partition name before we move the data to the partition name that people consider ready. We want to avoid scanning the data for this operation. ALTER TABLE RENAME PARTITION Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1721) use bloom filters to improve the performance of joins
[ https://issues.apache.org/jira/browse/HIVE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13063501#comment-13063501 ] Siying Dong commented on HIVE-1721: --- Andrew, what do you mean by the filter could be built in parallel with an MR job? Our initial plan was to build the filter based only on the smaller tables and apply it against the big table to reduce the data to be shuffled. For the syntax, the plan is to use a syntax like MAPJOIN. We can do something like SELECT /*+ BLOOMFILTER(t1) */ ... FROM t1 JOIN t2 ... use bloom filters to improve the performance of joins - Key: HIVE-1721 URL: https://issues.apache.org/jira/browse/HIVE-1721 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: J. Andrew Key Labels: optimization In case of map-joins, it is likely that the big table will not find many matching rows from the small table. Currently, we perform a hash-map lookup for every row in the big table, which can be pretty expensive. It might be useful to try out a bloom filter containing all the elements of the small table. Each element from the big table is first searched for in the bloom filter, and only in case of a positive match is the small table's hash table explored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
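The idea in the description, probing a cheap bloom filter before the more expensive hash-map lookup, can be sketched as follows (a toy filter for illustration only; Hive's actual implementation and the BLOOMFILTER hint syntax were still under discussion at this point):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hash positions per key over a fixed bit array."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False positives possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Build the filter from the small table's join keys ...
small_table = {'k1': 'v1', 'k2': 'v2'}
bf = BloomFilter()
for k in small_table:
    bf.add(k)

# ... then probe it before the (more expensive) hash-map lookup on the
# big table's rows; false positives are caught by the map lookup itself.
big_table_keys = ['k1', 'k3', 'k2', 'k9']
matches = [k for k in big_table_keys if bf.might_contain(k) and k in small_table]
print(matches)  # ['k1', 'k2']
```

The win is that most non-matching big-table rows are rejected by a few bit tests instead of a full hash-table probe, which is exactly the scenario the issue describes.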
[jira] [Updated] (HIVE-306) Support INSERT [INTO] destination
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-306: - Status: Patch Available (was: Open) Support INSERT [INTO] destination --- Key: HIVE-306 URL: https://issues.apache.org/jira/browse/HIVE-306 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Franklin Hu Attachments: hive-306.1.patch, hive-306.2.patch, hive-306.3.patch, hive-306.4.patch Currently hive only supports INSERT OVERWRITE destination. We should support INSERT [INTO] destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2247) CREATE TABLE RENAME PARTITION
CREATE TABLE RENAME PARTITION - Key: HIVE-2247 URL: https://issues.apache.org/jira/browse/HIVE-2247 Project: Hive Issue Type: New Feature Reporter: Siying Dong We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Description: Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. was: Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Status: Patch Available (was: Open) Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2248) Comparison Operators convert number types to common type instead of double if necessary
[ https://issues.apache.org/jira/browse/HIVE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2248: -- Attachment: HIVE-2248.1.patch Comparison Operators convert number types to common type instead of double if necessary --- Key: HIVE-2248 URL: https://issues.apache.org/jira/browse/HIVE-2248 Project: Hive Issue Type: Bug Reporter: Siying Dong Assignee: Siying Dong Attachments: HIVE-2248.1.patch Now if the two sides of a comparison are of different types, we always convert both to double and compare. This was a slight regression from the change in https://issues.apache.org/jira/browse/HIVE-1638. The old UDFOPComparison, using GenericUDFBridge, always tried to find a common type first. The worst case is this: if you write WHERE BIGINT_COLUMN = 0, we always convert the column and 0 to double and compare, which is wasteful, though it is usually a minor cost in the system. But it is easy to fix. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2249) When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double
When creating constant expression for numbers, try to infer type from another comparison operand, instead of trying to use integer first, and then long and double -- Key: HIVE-2249 URL: https://issues.apache.org/jira/browse/HIVE-2249 Project: Hive Issue Type: Improvement Reporter: Siying Dong Here is the current code to build constant expressions for numbers: try { v = Double.valueOf(expr.getText()); v = Long.valueOf(expr.getText()); v = Integer.valueOf(expr.getText()); } catch (NumberFormatException e) { // do nothing here, we will throw an exception in the following block } if (v == null) { throw new SemanticException(ErrorMsg.INVALID_NUMERICAL_CONSTANT.getMsg(expr)); } return new ExprNodeConstantDesc(v); For a case like WHERE BIG_INT_COLUMN = 0 or WHERE DOUBLE_COLUMN = 0, we always have to do a type conversion when comparing, which is unnecessary if we are slightly smarter about choosing the type when creating the constant expression. We can simply walk one level up the tree, find the other comparison operand, and use the same type as that one if possible. For a user's incorrect query like 'INT_COLUMN=1.1', we can do even more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
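The proposed improvement, inferring the constant's type from the other comparison operand rather than probing types in a fixed order, can be sketched like this (a Python analogue of the Java snippet above; the names and the fallback rule are illustrative, not Hive's actual API):

```python
# Map a column's declared type to a parser for the constant literal.
# These type names are illustrative; Hive's real resolution goes through
# its ObjectInspector/TypeInfo machinery.
PARSERS = {'int': int, 'bigint': int, 'double': float}

def constant_for(literal, other_operand_type=None):
    """Parse a numeric literal, preferring the type of the other operand."""
    if other_operand_type in PARSERS:
        try:
            return PARSERS[other_operand_type](literal)
        except ValueError:
            pass  # e.g. INT_COLUMN = 1.1: fall through to generic parsing
    for parse in (int, float):  # old behavior: narrowest type that fits
        try:
            return parse(literal)
        except ValueError:
            continue
    raise ValueError(f"invalid numerical constant: {literal!r}")

# WHERE BIG_INT_COLUMN = 0: the constant is parsed as an integer directly,
# so no runtime widening to double is needed for the comparison.
print(type(constant_for('0', 'bigint')).__name__)   # int
print(type(constant_for('0', 'double')).__name__)   # float
```

The `INT_COLUMN = 1.1` case falls through to the generic path, which is where the "we can even do more" remark applies (e.g. recognizing that no int equals 1.1).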
[jira] [Commented] (HIVE-306) Support INSERT [INTO] destination
[ https://issues.apache.org/jira/browse/HIVE-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058140#comment-13058140 ] Siying Dong commented on HIVE-306: -- +1. Looks good to me for now. I'm running tests. Once it is committed, please open a follow-up JIRA to make moving files more efficient and small-file compaction smarter. Support INSERT [INTO] destination --- Key: HIVE-306 URL: https://issues.apache.org/jira/browse/HIVE-306 Project: Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Franklin Hu Attachments: hive-306.1.patch, hive-306.2.patch Currently hive only supports INSERT OVERWRITE destination. We should support INSERT [INTO] destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055355#comment-13055355 ] Siying Dong commented on HIVE-2035: --- +1, will run regression tests Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently, if hive.merge.mapredfiles and/or hive.merge.mapfiles is set to true, the intermediate data can be merged using an additional MapReduce job. This can be quite expensive if the data size is large. With HIVE-1950, merging can be done at the RCFile block level so that it bypasses the (de)compression and (de)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile but the destination table is (which requires that the intermediate data be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
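The core idea of a block-level merge — copying compressed blocks verbatim instead of decoding records — can be illustrated with a minimal, self-contained sketch (ordinary files stand in for RCFile blocks here; this is not the HIVE-1950 implementation):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockMerge {
    /** Append each input's raw bytes to the output, never decoding them. */
    static void mergeRaw(Path out, Path... inputs) throws IOException {
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path in : inputs) {
                // byte-for-byte copy: no decompression, no deserialization
                Files.copy(in, os);
            }
        }
    }
}
```

A record-level merge would instead read, decompress, and deserialize every row before re-encoding it, which is exactly the cost the block-level approach avoids.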
[jira] [Updated] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2035: -- Status: Patch Available (was: Open) Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056205#comment-13056205 ] Siying Dong commented on HIVE-2035: --- committed Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch, hive-2035.3.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Priority: Minor CPU milliseconds information is available from Hadoop's framework. Printing it to the Hive CLI when executing a job will help users know more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2236: -- Status: Patch Available (was: Open) Cli: Print Hadoop's CPU milliseconds Key: HIVE-2236 URL: https://issues.apache.org/jira/browse/HIVE-2236 Project: Hive Issue Type: New Feature Components: CLI Reporter: Siying Dong Assignee: Siying Dong Priority: Minor Attachments: HIVE-2236.1.patch CPU milliseconds information is available from Hadoop's framework. Printing it to the Hive CLI when executing a job will help users know more about their jobs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054188#comment-13054188 ] Siying Dong commented on HIVE-2201: --- ping reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly:
1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
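The revised protocol can be sketched with local filesystem renames (the directory layout and task id here are illustrative; on HDFS each of these operations is a name node call):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpFileProtocol {
    /** Revised protocol: the committed file goes straight into /tmp2, so
     *  /tmp1 never accumulates files that later need an extra directory move. */
    static Path commit(Path tmp1, Path tmp2, Path tmp3, String taskId, byte[] data)
            throws IOException {
        Files.createDirectories(tmp1);
        Files.createDirectories(tmp2);
        // step 1: write the in-flight output under a _tmp name
        Path inFlight = tmp1.resolve("_tmp_" + taskId);
        Files.write(inFlight, data);
        // step 2: rename into /tmp2 directly (old protocol renamed within /tmp1)
        Files.move(inFlight, tmp2.resolve(taskId), StandardCopyOption.ATOMIC_MOVE);
        // step 3: publish the whole directory
        Files.move(tmp2, tmp3);
        // step 4 would deduplicate speculative outputs; no _tmp cleanup is needed
        return tmp3.resolve(taskId);
    }
}
```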
[jira] [Commented] (HIVE-2035) Use block-level merge for RCFile if merging intermediate results are needed
[ https://issues.apache.org/jira/browse/HIVE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051415#comment-13051415 ] Siying Dong commented on HIVE-2035: --- will take a look. Use block-level merge for RCFile if merging intermediate results are needed --- Key: HIVE-2035 URL: https://issues.apache.org/jira/browse/HIVE-2035 Project: Hive Issue Type: Improvement Reporter: Ning Zhang Assignee: Franklin Hu Attachments: hive-2035.1.patch Currently if hive.merge.mapredfiles and/or hive.merge.mapfile is set to true the intermediate data could be merged using an additional MapReduce job. This could be quite expensive if the data size is large. With HIVE-1950, merging can be done in the RCFile block level so that it bypasses the (de-)compression, (de-)serialization phases. This could improve the merge process significantly. This JIRA should handle the case where the input table is not stored in RCFile, but the destination table is (which requires the intermediate data should be stored in the same format as the destination table). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: Patch Available (was: In Progress) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Assignee: Siying Dong Summary: reduce name node calls in hive by creating temporary directories (was: remove name node calls in hive by creating temporary directories) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.1.patch Implemented the logic. Discovered one problem: when moving from /tmp1/_tmp_1 to /tmp2/1, we might need to check whether /tmp2 exists before moving. This patch avoids that call by pre-creating the temp directory before submitting the job. However, we cannot do that for dynamic partitioning, as we don't know the directory names, so dynamic partitioning adds some extra DFS name node reads. So far I think this tradeoff is worthwhile. This cost can potentially be reduced by caching the directories already created; we can try that approach as a follow-up. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, especially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Status: Patch Available (was: In Progress) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: (was: HIVE-2201.1.patch) reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-2201 started by Siying Dong. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.1.patch reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2201: -- Attachment: HIVE-2201.2.patch fix a bug. reduce name node calls in hive by creating temporary directories Key: HIVE-2201 URL: https://issues.apache.org/jira/browse/HIVE-2201 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Siying Dong Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1 3. Move directory /tmp1 to /tmp2 4. For all files in /tmp2, remove all files starting with _tmp and duplicate files. Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a lot of name node calls, specially for large queries. The protocol above can be modified slightly: 1. In tmp directory tmp1, create a tmp file _tmp_1 2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1 3. Move directory /tmp2 to /tmp3 4. For all files in /tmp3, remove all duplicate files. This should reduce the number of tmp files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2211) Revert
Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
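The shutdown() pitfall is easy to demonstrate: shutdown() only stops the pool from accepting new tasks and returns immediately, while awaitTermination() is what actually blocks (a standalone illustration, not the Hive code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownDemo {
    /** Run nTasks short jobs and return how many completed. */
    static int runAndWait(int nTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // returns immediately; tasks may still be running here
        // the call the buggy version was missing: block until the pool drains
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get(); // only now is the count guaranteed to be complete
    }
}
```

Reading done.get() right after shutdown() would race with the still-running tasks, which is the shape of the Utilities.getInputSummary() bug described in this issue.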
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Status: Patch Available (was: Open) Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2211) Revert
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Attachment: HIVE-2211.1.patch Just a simple revert. I made one small modification: when catching InterruptedException, stop waiting for pending threads and exit. Revert -- Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2211) Fix a bug caused by HIVE-243
[ https://issues.apache.org/jira/browse/HIVE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2211: -- Summary: Fix a bug caused by HIVE-243 (was: Revert) Fix a bug caused by HIVE-243 Key: HIVE-2211 URL: https://issues.apache.org/jira/browse/HIVE-2211 Project: Hive Issue Type: Bug Reporter: Siying Dong Attachments: HIVE-2211.1.patch Quick fix for a bug caused by HIVE-243. HIVE-234 removed the code that waits for the threads to finish and used ThreadPoolExecutor.shutdown() to wait for the results. That usage of ThreadPoolExecutor.shutdown(), however, is wrong: the code assumes the method blocks until all threads finish running, but it actually only marks the executor as shut down and does not block. This caused Utilities.getInputSummary() to return wrong results, and many jobs were executed in local mode even though they process huge data. Revert those changes quickly; we can have a follow-up on how to handle this more efficiently. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2186) Dynamic Partitioning Failing because of characters not supported globStatus
[ https://issues.apache.org/jira/browse/HIVE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2186: -- Resolution: Fixed Release Note: Committed. Thanks Franklin. Status: Resolved (was: Patch Available) Dynamic Partitioning Failing because of characters not supported globStatus --- Key: HIVE-2186 URL: https://issues.apache.org/jira/browse/HIVE-2186 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Siying Dong Assignee: Franklin Hu Attachments: hive-2186.1.patch, hive-2186.2.patch, hive-2186.3.patch, hive-2186.4.patch, hive-2186.5.patch Some dynamic partitioning queries fail at the partition-loading stage if the dynamic partition columns contain special characters. We need to escape all of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
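The fix amounts to escaping, in generated partition directory names, any character that FileSystem.globStatus would treat as a glob metacharacter. A minimal sketch of the idea (the reserved-character set and method name here are illustrative, not Hive's actual list):

```java
public class PathEscape {
    // hypothetical subset of glob metacharacters; Hive's real list differs
    private static final String RESERVED = "{}[]*?\"'\\^";

    /** Percent-encode reserved characters so the name is safe for globbing. */
    static String escapePathName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0 || c < 0x20 || c == '%') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```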
[jira] [Updated] (HIVE-2199) incorrect success flag passed to jobClose
[ https://issues.apache.org/jira/browse/HIVE-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siying Dong updated HIVE-2199: -- Resolution: Fixed Release Note: Committed. Thanks Franklin. Status: Resolved (was: Patch Available) incorrect success flag passed to jobClose - Key: HIVE-2199 URL: https://issues.apache.org/jira/browse/HIVE-2199 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Franklin Hu Assignee: Franklin Hu Priority: Minor Attachments: hive-2199.1.patch For block level merging of RCFiles, jobClose is passed the incorrect variable as the success flag -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira