[jira] Updated: (HIVE-1293) Concurrency Model for Hive

2010-07-01 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-1293:


Attachment: hive_leases.txt

I wrote up an algorithm for creating leases in the metastore some time back. 
I am not sure how useful it is now, but if someone wants to implement leases 
without depending on a third-party system, it may come in handy.

> Concurrency Model for Hive
> --------------------------
>
> Key: HIVE-1293
> URL: https://issues.apache.org/jira/browse/HIVE-1293
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Namit Jain
> Fix For: 0.7.0
>
> Attachments: hive.1293.1.patch, hive_leases.txt
>
>
> Concurrency model for Hive:
> Currently, Hive does not provide a good concurrency model. The only 
> guarantee provided in case of concurrent readers and writers is that a
> reader will not see a mix of partial data from the old version (before the 
> write) and partial data from the new version (after the write).
> This has come across as a big problem, especially for background processes 
> performing maintenance operations.
> The following possible solutions come to mind.
> 1. Locks: Acquire read/write locks. They can be acquired at the beginning of 
> the query, or the write locks can be delayed till the move task (when the 
> directory is actually moved). Care needs to be taken to avoid deadlocks.
> 2. Versioning: The writer can create a new version if the current version is 
> being read. Note that this is not equivalent to snapshots: the old version 
> can only be accessed by the current readers, and will be deleted when all of 
> them have finished.
> Comments.
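
The versioning option (2) in the quoted description can be pictured with 
reference counting. What follows is only a minimal Python sketch, not Hive 
code and not the contents of the attached patch; the class and method names 
(VersionedTable, begin_read, end_read, write) are made up for illustration. 
A writer publishes a new version, and an old version is deleted once its last 
reader has finished.

```python
import threading


class VersionedTable:
    """Toy model of option 2: readers pin a version, writers add a new one."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self._versions = {0: data}   # version id -> data
        self._readers = {0: 0}       # version id -> active reader count
        self._current = 0

    def begin_read(self):
        """Pin the current version and return (version id, data)."""
        with self._lock:
            v = self._current
            self._readers[v] += 1
            return v, self._versions[v]

    def end_read(self, v):
        """Unpin a version; delete it if it is old and has no readers left."""
        with self._lock:
            self._readers[v] -= 1
            self._gc(v)

    def write(self, data):
        """Publish a new version; the old one survives only while it is read."""
        with self._lock:
            old = self._current
            self._current = old + 1
            self._versions[self._current] = data
            self._readers[self._current] = 0
            self._gc(old)

    def _gc(self, v):
        # Caller holds the lock. The current version is never deleted.
        if v != self._current and self._readers[v] == 0:
            del self._versions[v]
            del self._readers[v]

    def live_versions(self):
        with self._lock:
            return sorted(self._versions)
```

Unlike a snapshot, an old version here cannot be opened by a new reader: 
begin_read always hands out the current version.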

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-1293) Concurrency Model for Hive

2010-07-01 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884343#action_12884343
 ] 

Prasad Chakka commented on HIVE-1293:
-

Is this the same as https://issues.apache.org/jira/browse/HIVE-829?




[jira] Updated: (HIVE-897) fix inconsistent expectations from table/partition location value

2010-07-01 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-897:
---

Assignee: (was: Prasad Chakka)

@Velu,
I don't remember the details any more. I think the problem is that the code 
expects a full URI in some cases (LoadSemanticAnalyzer) and a relative URI in 
others. The metastore stores whatever is given to it, so the metastore db can 
contain either full or relative locations depending on how the table was 
created. The purpose of this task is to make the handling of the 'location' 
parameter uniform throughout the code (hive.metastore, hive.ql.metastore & 
load & move related classes).
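
One way to picture uniform handling is a single normalization step applied 
wherever a location is read or written. This is a rough Python sketch, not 
Hive code; normalize_location and its default_fs argument are hypothetical, 
and it assumes a "relative" location means a path without a scheme.

```python
from urllib.parse import urlparse


def normalize_location(location, default_fs):
    """Return a full URI for a table/partition location.

    `default_fs` is an assumed default filesystem URI such as
    "hdfs://namenode:8020"; locations that already carry a scheme are
    returned unchanged.
    """
    if urlparse(location).scheme:
        return location
    # Scheme-less location: treat it as a path on the default filesystem.
    return default_fs.rstrip("/") + "/" + location.lstrip("/")
```

With this, callers such as the load and move code would always see a full 
URI, whichever form was stored.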

Feel free to take over as appropriate.

> fix inconsistent expectations from table/partition location value
> -
>
> Key: HIVE-897
> URL: https://issues.apache.org/jira/browse/HIVE-897
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Prasad Chakka
>
> Currently the code expects this to be a full URI in some locations 
> (LoadSemanticAnalyzer). Also, HiveAlterHandle should work in either case. 


[jira] Updated: (HIVE-55) restrict table and column names to be alphanumeric and _ characters

2010-07-01 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-55:
--

Assignee: (was: Prasad Chakka)

Paul, not sure how useful this is any more...

> restrict table and column names to be alphanumeric and _ characters
> ---
>
> Key: HIVE-55
> URL: https://issues.apache.org/jira/browse/HIVE-55
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Prasad Chakka
>
> Currently the DDL restricts names to alphanumeric and _ characters, but not 
> if the tables were created or altered using metastore clients directly. This 
> JIRA aims to fix that.
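
The restriction described in the issue amounts to a single check that could 
run on the metastore side, so direct clients hit the same rule as DDL. This 
is a small Python sketch under that assumption; is_valid_name is a 
hypothetical helper, not an existing metastore function.

```python
import re

# Assumed rule from the issue text: one or more ASCII letters, digits, or "_".
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name):
    """Return True if `name` uses only alphanumeric and underscore characters."""
    return _VALID_NAME.fullmatch(name) is not None
```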


[jira] Commented: (HIVE-1364) Increase the maximum length of SERDEPROPERTIES values (currently 767 characters)

2010-06-15 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879092#action_12879092
 ] 

Prasad Chakka commented on HIVE-1364:
-

It used to be much higher in the beginning, but quite a few users reported 
problems on some MySQL databases. 767 seemed to work on most databases. Before 
committing this, can someone test it on a few different databases (with and 
without UTF encoding)?
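
The reason encoding matters can be shown with a little arithmetic. This 
sketch assumes the InnoDB index key limit is 767 bytes (not characters) and 
that MySQL's utf8 character set needs up to 3 bytes per character; 
max_indexed_chars is a made-up helper name.

```python
INDEX_LIMIT_BYTES = 767  # assumed InnoDB per-column index key limit, in bytes


def max_indexed_chars(max_bytes_per_char, limit_bytes=INDEX_LIMIT_BYTES):
    """Largest VARCHAR length (in characters) that fits under the byte limit."""
    return limit_bytes // max_bytes_per_char
```

Under those assumptions, max_indexed_chars(1) gives 767 for a single-byte 
character set such as latin1, while max_indexed_chars(3) gives only 255 for 
utf8, so a column length that works on one database can fail on another if 
the column is indexed.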

> Increase the maximum length of SERDEPROPERTIES values (currently 767 
> characters)
> 
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.5.0
> Reporter: Carl Steinbach
> Assignee: Carl Steinbach
> Attachments: HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.


[jira] Commented: (HIVE-1219) More robust handling of metastore connection failures

2010-03-18 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847051#action_12847051
 ] 

Prasad Chakka commented on HIVE-1219:
-

@paul, it makes sense to add the separator in the constants; otherwise people 
are bound to make mistakes.

ObjectStore.java
# Good idea to add a lock. But only the first thread that encountered problems 
with its PersistenceManager object should try to recreate the 
PersistenceManagerFactory, not the subsequent threads. Maybe you can create 
a new PMF only when the PM's reference to the PMF matches the current one, and 
not otherwise. Though I doubt this will be much of a real problem for the 
metastore server.
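
The "recreate only if the reference still matches" idea can be sketched as 
follows. This is Python for illustration only, not the ObjectStore code; 
FactoryHolder and recreate_if_current are hypothetical names, and the factory 
is just any object returned by a caller-supplied function.

```python
import threading


class FactoryHolder:
    """Holds a shared factory that is recreated at most once per failure."""

    def __init__(self, create_factory):
        self._create_factory = create_factory  # hypothetical factory builder
        self._lock = threading.Lock()
        self._factory = create_factory()

    def get(self):
        with self._lock:
            return self._factory

    def recreate_if_current(self, failed_factory):
        """Replace the factory only if `failed_factory` is still the live one.

        A thread whose failed factory was already replaced by another thread
        just picks up the replacement instead of creating yet another one.
        """
        with self._lock:
            if self._factory is failed_factory:
                self._factory = self._create_factory()
            return self._factory
```

Each thread passes in the factory its failing manager came from, so later 
threads reporting the same stale factory do not trigger another rebuild.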

HiveMetastore.java
# updateConnectionURL() should throw the exception back instead of just 
logging it and returning false. Also, there is no need to log when rethrowing 
the exception. The caller would log it anyway if needed (in 
initConnectionUrlHook).

Do you reload the hive conf on every retry?


Otherwise the patch looks good.

> More robust handling of metastore connection failures
> -
>
> Key: HIVE-1219
> URL: https://issues.apache.org/jira/browse/HIVE-1219
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Paul Yang
> Assignee: Paul Yang
> Fix For: 0.6.0
>
> Attachments: HIVE-1219.1.patch, HIVE-1219.2.patch, HIVE-1219.3.patch, 
> HIVE-1219.4.patch, HIVE-1219.5.patch
>
>
> Currently, if the metastore's connection to the datastore is broken, the 
> query fails and an exception such as the following is thrown:
> {code}
> 2010-01-28 11:50:20,885 ERROR exec.MoveTask (SessionState.java:printError(248)) - Failed with exception Unable to fetch table tmp_table
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table tmp_table
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:362)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:333)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:112)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:99)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:64)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:582)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:462)
>     at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:324)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
>     at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:200)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:256)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: javax.jdo.JDODataStoreException: Communications link failure
> Last packet sent to the server was 1 ms ago.
> NestedThrowables:
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
> Last packet sent to the server was 1 ms ago.
>     at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:289)
> {code}
> In order to reduce the impact of transient network issues and momentarily 
> unavailable datastores, two possible improvements are:
> 1. Retrying the metastore command in case of connection failure before 
> propagating up the exception.
> 2. Retrieving the datastore hostname / connection URL through the use of an 
> extension. This extension would be useful in the case where a remote service 
> maintained the location of the currently available datastore. In case of 
> hostname changes or failovers to a backup datastore, the extension would 
> allow hive clients to run without manual intervention.


[jira] Commented: (HIVE-1219) More robust handling of metastore connection failures

2010-03-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846539#action_12846539
 ] 

Prasad Chakka commented on HIVE-1219:
-

For example, in create_table(), the logic for creating the table in the 
metastore and creating the directories in HDFS should be put in a separate 
function, and the code block of executeWithRetry() should contain only a call 
to this new function.
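
The shape being asked for looks roughly like this. It is a Python sketch, not 
the patch under review: execute_with_retry, create_table_core, and the 
dict/set stand-ins for the metastore and filesystem are all hypothetical.

```python
import time


def execute_with_retry(operation, retries=3, delay_seconds=0.0,
                       retry_on=(ConnectionError,)):
    """Call `operation()`; retry on connection errors, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: propagate to the caller
            time.sleep(delay_seconds)


def create_table_core(metastore, filesystem, name, location):
    """Core logic only: make the directory, then record the table."""
    filesystem.add(location)
    metastore[name] = location


def create_table(metastore, filesystem, name, location):
    # The retry block contains nothing but a call to the core function.
    return execute_with_retry(
        lambda: create_table_core(metastore, filesystem, name, location))
```

Keeping the core function free of retry handling means it can be read and 
tested on its own, and the retry policy can change without touching it.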

Also, I am getting a little confused, since there seems to be some change in 
logic along with the retries. Can you split the logic changes and the retries 
into separate diffs?


[jira] Commented: (HIVE-1219) More robust handling of metastore connection failures

2010-03-16 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846319#action_12846319
 ] 

Prasad Chakka commented on HIVE-1219:
-

Is it possible to put the code block in a function and just call that function 
inside the body of ExecuteWithRetry()? This would improve readability; right 
now the core logic is mixed up too much with the retry logic. 


[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

2010-03-03 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840832#action_12840832
 ] 

Prasad Chakka commented on HIVE-705:


John, why are the pre, commit, and rollback functions needed in MetaHook? 
Isn't it enough just to drop the table as a rollback for create, and to do the 
drop table after the Hive drop table? With the current definition, the 
MetaHook implementation needs to keep state around, which Hive itself doesn't 
do.
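
The simpler alternative suggested here can be sketched like this. It is 
Python for illustration only and not the MetaHook interface from the patch; 
the three callables are hypothetical stand-ins for the storage handler and 
the metastore call.

```python
def create_table_with_storage(name, create_storage, drop_storage,
                              add_to_metastore):
    """Create backing storage, then the metastore entry; undo on failure.

    All three callables are hypothetical stand-ins for the storage handler
    and the metastore call.
    """
    create_storage(name)
    try:
        add_to_metastore(name)
    except Exception:
        # Rollback for create is simply a drop; no hook state is kept.
        drop_storage(name)
        raise
```

Whether a plain drop is always a safe rollback (for example, when the 
underlying table existed before the create) is exactly the question for the 
hook design.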

Also, alter table on external tables should be allowed, since the underlying 
storage format for external tables is not managed by Hive itself. In such 
cases alter table is just changing metadata inside Hive.

> Let Hive can analyse hbase's tables
> ---
>
> Key: HIVE-705
> URL: https://issues.apache.org/jira/browse/HIVE-705
> Project: Hadoop Hive
> Issue Type: New Feature
> Affects Versions: 0.6.0
> Reporter: Samuel Guo
> Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, 
> hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, 
> HIVE-705_draft.patch, HIVE-705_revision806905.patch, 
> HIVE-705_revision883033.patch, zookeeper-3.2.2.jar
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored 
> in hbase easily.


[jira] Created: (HIVE-1176) 'create if not exists' fails for a table name with 'select' in it

2010-02-17 Thread Prasad Chakka (JIRA)

'create if not exists' fails for a table name with 'select' in it
-

 Key: HIVE-1176
 URL: https://issues.apache.org/jira/browse/HIVE-1176
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Reporter: Prasad Chakka



hive> create table if not exists tmp_select(s string, c string, n int);
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:441)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:423)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:5538)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5192)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
    at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: MetaException(message:Got exception: javax.jdo.JDOUserException JDOQL Single-String query should always start with SELECT)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:612)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:450)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:439)
    ... 15 more


[jira] Commented: (HIVE-1096) Hive Variables

2010-02-01 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828201#action_12828201
 ] 

Prasad Chakka commented on HIVE-1096:
-

@edward, I usually use Eclipse to debug these kinds of things. Much faster IMO.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
> Attachments: hive-1096-2.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is in Driver.compile or Driver.run: we can do 
> string substitutions at that level, and further downstream need not be 
> affected. 
> There could be some benefits to doing this further downstream (parser, plan), 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.


[jira] Commented: (HIVE-1096) Hive Variables

2010-01-29 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806486#action_12806486
 ] 

Prasad Chakka commented on HIVE-1096:
-

@edward, your patch will have problems if a comment has a $ that is not a real 
variable. Hive will be throwing necessary errors.

Regarding the parsing code, AFAIK there is no single place where all string 
literals are processed. Either Ashish or Zheng would be able to answer this 
better.
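
One way to sidestep the stray-$ problem is to substitute only names that are 
actually defined and leave everything else alone. This is a Python sketch of 
that idea, not the attached patch; substitute and the ${NAME} pattern it 
accepts are assumptions.

```python
import re

# ${NAME} where NAME is a letter or "_" followed by letters, digits, or "_".
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(query, variables):
    """Replace ${NAME} with its value; leave undefined names untouched."""
    def repl(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return _VAR.sub(repl, query)
```

The trade-off is that a misspelled variable name passes through silently 
instead of raising an error.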


[jira] Commented: (HIVE-1116) alter table rename should rename hdfs location of table as well

2010-01-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805985#action_12805985
 ] 

Prasad Chakka commented on HIVE-1116:
-

I remember doing this quite some time ago (only for non-external tables). Are 
you sure it doesn't work in any scenario?

> alter table rename should rename hdfs location of table as well
> ---
>
> Key: HIVE-1116
> URL: https://issues.apache.org/jira/browse/HIVE-1116
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Joydeep Sen Sarma
>
> If the location is not an external location, this would be safer.
> The problem right now is that it's tricky to use the drop-and-rename way of 
> writing new data into a table. Consider:
> Initialization block:
> drop table a_tmp;
> create table a_tmp like a;
> Loading block:
> load data  into a_tmp;
> drop table a;
> alter table a_tmp rename to a;
> This looks safe, but it's not. If one runs this multiple times, then data is 
> lost (since 'a' is pointing to 'a_tmp''s location after any iteration, 
> dropping table 'a' blows away the loaded data in the next iteration). 
> If the location is being managed by Hive, then 'rename' should switch 
> location as well.
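
The data loss described in the issue can be reproduced with a toy model. This 
is a Python simulation under stated assumptions, not Hive behaviour verified 
against a cluster: Warehouse is a made-up class, managed tables get a default 
location derived from their name, rename keeps the old location, and drop 
deletes the directory.

```python
class Warehouse:
    """Toy metastore plus filesystem; rename keeps the old location."""

    def __init__(self):
        self.tables = {}  # table name -> directory
        self.dirs = {}    # directory -> list of loaded rows

    def create(self, name):
        location = "/warehouse/" + name   # default managed location
        self.tables[name] = location
        self.dirs.setdefault(location, [])

    def drop(self, name):
        location = self.tables.pop(name, None)
        if location is not None:
            self.dirs.pop(location, None)  # managed table: data is deleted

    def load(self, name, rows):
        self.dirs[self.tables[name]] = list(rows)

    def rename(self, old, new):
        self.tables[new] = self.tables.pop(old)  # location is NOT moved

    def read(self, name):
        return self.dirs.get(self.tables[name], [])


def refresh(wh, rows):
    """The initialization and loading blocks from the issue, in order."""
    wh.drop("a_tmp")
    wh.create("a_tmp")
    wh.load("a_tmp", rows)
    wh.drop("a")
    wh.rename("a_tmp", "a")
```

After the first refresh, 'a' points at /warehouse/a_tmp. On the second 
refresh the new a_tmp gets that same directory, so dropping 'a' deletes the 
rows that were just loaded.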


[jira] Commented: (HIVE-1105) Add service script for starting metastore server

2010-01-26 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805286#action_12805286
 ] 

Prasad Chakka commented on HIVE-1105:
-

Nice. It would be great if you can add this info to that wiki.

> Add service script for starting metastore server
> 
>
> Key: HIVE-1105
> URL: https://issues.apache.org/jira/browse/HIVE-1105
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Server Infrastructure
> Affects Versions: 0.4.1
> Reporter: John Sichi
> Priority: Minor
> Fix For: 0.6.0
>
> Attachments: HIVE-1105.1.patch
>
>
> The instructions on this page recommend running Java directly in order to 
> start the metastore:
> http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
> Since we already have a generic service-starter script, it would be nice to 
> be able to do this instead:
> hive --service metastore
> I've written a metastore.sh for this purpose.


[jira] Commented: (HIVE-1098) Fix Eclipse launch configurations

2010-01-25 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804840#action_12804840
 ] 

Prasad Chakka commented on HIVE-1098:
-

hive-model.jar is needed only for JUnit launch configurations. This jar can be 
created by running 'ant model-jar' in hive/metastore. This is kind of a hack 
that I put in to enable running metastore-related unit tests in Eclipse. 

> Fix Eclipse launch configurations
> -
>
> Key: HIVE-1098
> URL: https://issues.apache.org/jira/browse/HIVE-1098
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Build Infrastructure
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Carl Steinbach
> Assignee: Carl Steinbach
>
> All of the Eclipse launch configurations in eclipse-templates are currently 
> broken.
> The configurations reference hive_model.jar, which no longer exists, but there
> appear to be other problems as well.


[jira] Commented: (HIVE-1059) Date/DateTime/TimeStamp types should throw an error

2010-01-25 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804837#action_12804837
 ] 

Prasad Chakka commented on HIVE-1059:
-

Sorry for the late comment, but I don't think this works if someone created a 
table with the timestamp datatype using the metastore Thrift API. Or does it?

> Date/DateTime/TimeStamp types should throw an error
> ---
>
> Key: HIVE-1059
> URL: https://issues.apache.org/jira/browse/HIVE-1059
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: Paul Yang
> Assignee: Paul Yang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: HIVE-1059.1.patch, HIVE-1059.branch-0.5.1.patch
>
>
> We currently don't support date, datetime, or timestamp types. Using these 
> in a create table / alter table should throw an error.


[jira] Commented: (HIVE-1096) Hive Variables

2010-01-25 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804716#action_12804716
 ] 

Prasad Chakka commented on HIVE-1096:
-

Hive will be doing string replacement on all string literals returned from the
parser.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.


[jira] Commented: (HIVE-1096) Hive Variables

2010-01-25 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804688#action_12804688
 ] 

Prasad Chakka commented on HIVE-1096:
-

I think the most useful global variables are built-in variables that do not
really need the metastore (e.g. date-based values such as TODAY, THIS_WEEK,
THIS_MONTH, or the user name or host name). Session variables and user-specific
variables that can be loaded from a .hiverc are also useful. So it may be
possible to implement most of the requirements without true global variables
(and without the metastore).

String substitution will create problems with comments and also with error
reporting.
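
To illustrate the concern about comments, here is a minimal sketch. It is not
Hive's implementation: the `substitute` helper and the variable map are
hypothetical. It performs a naive `${NAME}` replacement over the raw query text
and shows that text inside a `--` comment gets rewritten too.

```python
import re

# Hypothetical helper: naive ${NAME} substitution over the raw query text.
VAR_PATTERN = re.compile(r"\$\{(\w+)\}")

def substitute(query, variables):
    """Replace every ${NAME} with its value; unknown names are left as-is."""
    def repl(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return VAR_PATTERN.sub(repl, query)

query = "-- partitions are keyed by ${DT}\nSELECT * FROM t WHERE ds = '${DT}'"
result = substitute(query, {"DT": "2009-12-09"})
print(result)
```

Because the substituted value can differ in length from the placeholder, any
column position reported by the parser refers to the rewritten text rather than
to what the user typed.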
 

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.


[jira] Commented: (HIVE-1084) Cleanup Class names

2010-01-24 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804356#action_12804356
 ] 

Prasad Chakka commented on HIVE-1084:
-

Please use 'svn rename' to change file names instead of doing 'svn delete' and
'svn add'. The former preserves the file's history, whereas the latter does
not.

> Cleanup Class names
> ---
>
> Key: HIVE-1084
> URL: https://issues.apache.org/jira/browse/HIVE-1084
> Project: Hadoop Hive
>  Issue Type: Task
>Affects Versions: 0.6.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: cleanup-class-names.2.patch, cleanup-class-names.patch
>
>
> [Sun's Code Conventions for the Java Programming 
> Language|http://java.sun.com/docs/codeconv/] document stipulates that Java 
> class names must begin with a capital letter.


[jira] Commented: (HIVE-990) Incorporate CheckStyle into Hive's build.xml

2010-01-19 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802601#action_12802601
 ] 

Prasad Chakka commented on HIVE-990:


Zheng, with Java I don't think 80 is a realistic limit. Imagine trying to fit
initializations of generics into 80 chars. :) I remember setting 100 chars as
the limit in Eclipse as well.

> Incorporate CheckStyle into Hive's build.xml
> 
>
> Key: HIVE-990
> URL: https://issues.apache.org/jira/browse/HIVE-990
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: checkstyle-errors.html, HIVE-990.patch
>
>
> Hadoop and Pig both have CheckStyle integrated into their build. This is 
> useful for catching
> a variety of errors as well as for enforcing a specific coding style and 
> maintaining good code hygiene.
> We just need to snatch Hadoop's checkstyle.xml and integrate it into Hive's 
> build.xml file.


[jira] Commented: (HIVE-972) support views

2010-01-16 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801267#action_12801267
 ] 

Prasad Chakka commented on HIVE-972:


12. Block alter_table and alter_partition queries in the metastore server-side
code rather than in the ql or client-side code.

Zheng, AFAIK backward compatibility (i.e. older code acting against the latest
schema) should not be a problem. The schema upgrade should happen automatically
if users haven't modified the datanucleus/jpox parameters in hive-default.xml.

> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Attachments: HIVE-972.1.patch, HIVE-972.2.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-972) support views

2010-01-14 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800382#action_12800382
 ] 

Prasad Chakka commented on HIVE-972:



I thought view definitions are supposed to be frozen when a view is created.
Isn't that true of partitions as well? So why would an older partition need to
be regenerated automatically when a view definition changes? Or does the
'frozen definition' part not apply to materialized views?

Also, the "dead-weight" doesn't matter since null-valued columns add very little
overhead in MySQL.

But I agree with 1 and 2, so let's keep them the way you have them in the patch.


> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Attachments: HIVE-972.1.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-972) support views

2010-01-14 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800369#action_12800369
 ] 

Prasad Chakka commented on HIVE-972:


Then it looks like we do have to keep SD, so what is the advantage of putting
additional columns in Table instead of SD? I just feel that code and schema
will need to be duplicated down the road. It is not a big deal, but still, why?

I suppose the unnecessary attributes of SD can just be default values for
views. Not sure much can be done there.


> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Attachments: HIVE-972.1.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-972) support views

2010-01-14 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800356#action_12800356
 ] 

Prasad Chakka commented on HIVE-972:


I think I set a property in the JDO ORM file to make it check for a non-null SD.
It shouldn't be that difficult to change that. How does a 'describe' command
get the columns of a view?

> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Attachments: HIVE-972.1.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-972) support views

2010-01-14 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800346#action_12800346
 ] 

Prasad Chakka commented on HIVE-972:


Regarding (3): is a StorageDescriptor needed at all for a view?

> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
> Attachments: HIVE-972.1.patch
>
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-1012) Ability to run testcases via regular expression

2009-12-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794903#action_12794903
 ] 

Prasad Chakka commented on HIVE-1012:
-

Carl, this is good to have but why not use the same key 'qfile' for both? You 
can always do a glob after splitting the value string at commas.
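
A minimal sketch of that suggestion, assuming a single 'qfile' value whose
comma-separated pieces are each treated as a glob pattern. The
`select_qfiles` helper is hypothetical and is not part of Hive's test
infrastructure.

```python
from fnmatch import fnmatchcase

def select_qfiles(qfile_value, available):
    """Split the value at commas and treat each piece as a glob pattern."""
    patterns = [p.strip() for p in qfile_value.split(",") if p.strip()]
    return [name for name in available
            if any(fnmatchcase(name, pat) for pat in patterns)]

available = ["udf_sin.q", "udf_cos.q", "udf_acos.q", "join1.q"]
print(select_qfiles("udf_*.q", available))
print(select_qfiles("udf_sin.q,join*.q", available))
```

A plain file name is a glob that matches only itself, so the existing
comma-separated usage keeps working under this scheme.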

> Ability to run testcases via regular expression
> ---
>
> Key: HIVE-1012
> URL: https://issues.apache.org/jira/browse/HIVE-1012
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: HIVE-1012.patch
>
>
> Currently the testing infrastructure makes it possible to specify individual 
> testcases using the "qfile" switch, e.g.:
> ant test -Dtestcase=TestCliDriver -Dqfile=udf_sin.q,udf_cos.q,udf_acos.q
> I would also like to be able to specify testcases using a regular expression, 
> e.g.:
> ant test -Dtestcase=TestCliDriver -Dqfile_regex="udf.*"
> The previous command should trigger the execution of all testcases starting 
> with "udf".


[jira] Commented: (HIVE-972) support views

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793882#action_12793882
 ] 

Prasad Chakka commented on HIVE-972:


Mainly, I don't understand how 'show partitions' will work on views, or whether
it should work at all. If it is not supported, the following issues will make
it difficult to use views:

* 'show partitions' is a very useful function for downstream processes that are
waiting on the availability of a particular partition.
* Partition keys are also useful metadata for users browsing views.
* In a 'strict' mode where a predicate on a partition key is required for any
partitioned table, a user has to know which column the predicate should be
applied to, which will then be translated to the base tables' partition columns.



> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-571) add ability to rename a column.

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793865#action_12793865
 ] 

Prasad Chakka commented on HIVE-571:


Yongqiang, can you add the change to the wiki after this is committed? When you
do that, please mention that the underlying data is untouched and that it is
the responsibility of the user to change the data to fit the new schema.

> add ability to rename a column.
> ---
>
> Key: HIVE-571
> URL: https://issues.apache.org/jira/browse/HIVE-571
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-571-2009-12-22-2.patch, hive-571-2009-12-22.patch
>
>
> currently only way to rename a column is to use 'REPLACE COLUMNS' which can 
> be cumbersome if the table has lots of columns.


[jira] Commented: (HIVE-972) support views

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793811#action_12793811
 ] 

Prasad Chakka commented on HIVE-972:


@john, i will be in office on all work days next week and tomorrow as well.


> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-571) add ability to rename a column.

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793790#action_12793790
 ] 

Prasad Chakka commented on HIVE-571:


If we can support changing the column type in other alter table statements
(REPLACE COLUMNS), why not support it here? Also, why not support exactly the
MySQL syntax:

{code}
CHANGE [COLUMN] old_col_name new_col_name column_definition
[FIRST|AFTER col_name]
{code}

At least we should support changing the comment.


> add ability to rename a column.
> ---
>
> Key: HIVE-571
> URL: https://issues.apache.org/jira/browse/HIVE-571
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-571-2009-12-22.patch
>
>
> currently only way to rename a column is to use 'REPLACE COLUMNS' which can 
> be cumbersome if the table has lots of columns.


[jira] Commented: (HIVE-972) support views

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793750#action_12793750
 ] 

Prasad Chakka commented on HIVE-972:


Sorry, the bold formatting was unintentional and a side effect of saying '\*'. 
It should have read this way


IMO, '\*' should represent all columns and if a view creator does not want to 
inherit changes to base table schema then she can specify the exact columns 
instead of '\*'.
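
The difference between the two behaviours can be sketched as follows. This is
only an illustration with a hypothetical `expand_star` helper; it models the
column list of a `select *` view when it is expanded once at creation time
versus at query time.

```python
def expand_star(columns):
    """Expand '*' against the base table's columns at the given moment."""
    return list(columns)

base_columns = ["a", "b"]
frozen_view = expand_star(base_columns)   # expanded once, at view creation

base_columns.append("c")                  # base table later gains a column
late_view = expand_star(base_columns)     # expanded again at query time

print(frozen_view)
print(late_view)
```

The frozen expansion never sees column c, while the late expansion picks it up
without the view being redefined.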


> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-972) support views

2009-12-22 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793746#action_12793746
 ] 

Prasad Chakka commented on HIVE-972:


.. continuing discussion from hive-users@

1.
regarding metadata:
Are you saying that if a view is created on a partitioned table, it would
inherit the base table's partition columns as regular columns? (This will be a
common use case when inserts into multiple partitions are supported.) Even if we
are not supporting partitions right now, we need to think about how we would
model metadata for partitioned views (not necessarily materialized views) so
that the changes made right now will not conflict too much with future
changes.

I like the flat model, as the inheritance model is overkill and not really
suited to this case. But I would rather put the view-def into the
'StorageDescriptor' class. This would work well when partitioned views are
supported, since the 'Partition' class will inherit the view definition and the
future view schema can evolve while freezing the schema of partitions created
earlier.

The type of the 'view-def' column has to be a CLOB, but I am not sure of the
performance implications of having a CLOB column in a very large table. If it
is not advisable, then it may be useful to put this CLOB (or the view metadata)
in a separate class and link it to StorageDescriptor.

2.
> In SQL:200n, a view definition is supposed to be frozen at the time it is 
> created, so that if the view is defined as select * from t, where t is a 
> table with two columns a and b, then later requests to select * from the view 
> should return just columns a and b, even if a new column c is later added to 
> the table. This is implemented correctly by most DBMS products.

Do you know the reasoning behind this? This would make changing the base table
schema very hard, and if I am not mistaken, Facebook's base table schemas change
(mostly through the addition of new columns) more than occasionally, so it
would be an administrative nightmare to change all the dependent views. IMO,
'*' should represent all columns, and if a view creator does not want to
inherit changes to the base table schema then she can specify the exact columns
instead of '*'.

3.
I like the DependencyParticipant idea. We could use a simple inheritance
strategy where the dependent table contains (table/view id,
used_name_of_dependent_obj, obj_id, obj_type).

4. +1 to Lenient Dependency Invalidation, but there should be a command to
preview the invalid views.
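
Points 3 and 4 could be sketched roughly as below. This is only an
illustration under stated assumptions: the `Dependency` record mirrors the four
fields listed in point 3, while the field names, the "TABLE"/"VIEW" labels and
the `invalid_views` helper are hypothetical and not taken from any patch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dependency:
    view_id: int     # id of the dependent table/view
    used_name: str   # name under which the view refers to the object
    obj_id: int      # id of the object depended upon
    obj_type: str    # e.g. "TABLE" or "VIEW" (labels are assumptions)

def invalid_views(dependencies, existing_obj_ids):
    """Return ids of views that depend on an object that no longer exists."""
    return sorted({d.view_id for d in dependencies
                   if d.obj_id not in existing_obj_ids})

deps = [
    Dependency(view_id=10, used_name="t", obj_id=1, obj_type="TABLE"),
    Dependency(view_id=11, used_name="t2", obj_id=2, obj_type="TABLE"),
    Dependency(view_id=12, used_name="v10", obj_id=10, obj_type="VIEW"),
]
# Object 2 has been dropped, so only view 11 is reported.
print(invalid_views(deps, existing_obj_ids={1, 10, 11, 12}))
```

A preview command would only report such views; under lenient invalidation
nothing is dropped or blocked when the base object goes away.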

> support views
> -
>
> Key: HIVE-972
> URL: https://issues.apache.org/jira/browse/HIVE-972
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Namit Jain
>Assignee: John Sichi
>
> Hive currently does not support views. 
> It would be a very nice feature to have.


[jira] Commented: (HIVE-957) Partiition Metadata and Table Metadata

2009-11-30 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783950#action_12783950
 ] 

Prasad Chakka commented on HIVE-957:


It is not that much code to write a new command to alter all the partitions.
The metadata calls already exist; only the grammar needs to be enhanced.

> Partiition Metadata and Table Metadata
> --
>
> Key: HIVE-957
> URL: https://issues.apache.org/jira/browse/HIVE-957
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> Right now, we choose to use partition lever metadata. All metadata (column 
> names, column types, fileformat, serde class, serde properties) right now are 
> from partition level metadata. But hive does not support a method now to 
> alter all existing partitions' metadata, so users mostly choose to alter 
> table metadata, and think hive will use the new  table level metadata. 
> One approach is that we may need to provide a way to let user alter all 
> partitions' metadata with one simple command. Right now a short term solution 
> is to only get fileformat, serde class metadata from paritition level 
> metadata, and use all other metadata from table.
> any comments?


[jira] Commented: (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2009-11-25 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782548#action_12782548
 ] 

Prasad Chakka commented on HIVE-951:


@namit, I agree with others that this would be very convenient and useful
functionality for external tables. Keeping external table data and location
consistent is left to users, so changing the fileset in that location is not
a Hive problem. Hive will query the location and get the list of paths at the
time the query is executed. If a file gets removed or added and the query
fails because of a deleted file, that is the user's responsibility.
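
As a rough sketch of "get the list of paths at the time the query is executed",
assuming the listing is already available as a list of relative paths: the
`matching_paths` helper is hypothetical, and the regex is the one from the
first example in the issue description.

```python
import re

def matching_paths(listing, file_regex):
    """Filter a directory listing, taken at query time, by a regex."""
    pattern = re.compile(file_regex)
    return [p for p in listing if pattern.search(p)]

listing = ["folder/2009-01.bz2", "folder/2008-12.bz2", "other/2009.txt"]
print(matching_paths(listing, r"folder/2009.*\.bz2$"))
```

Since the filter runs against whatever the listing contains at that moment,
files added or removed between queries simply change the result set.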


> Selectively include EXTERNAL TABLE source files via REGEX
> -
>
> Key: HIVE-951
> URL: https://issues.apache.org/jira/browse/HIVE-951
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Carl Steinbach
>
> CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
> expression. 
> CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
> outside of Hive, and
> currently makes the assumption that all of the files located under the 
> supplied path should be included
> in the new table. Users frequently encounter directories containing multiple
> datasets, or directories that contain data in heterogeneous schemas, and it's 
> often
> impractical or impossible to adjust the layout of the directory to meet the 
> requirements of 
> CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
> table based
> on the contents of an S3 bucket. 
> One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
> as follows:
> CREATE EXTERNAL TABLE
> ...
> LOCATION path [file_regex]
> ...
> For example:
> {code:sql}
> CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
> STORED AS TEXTFILE
> LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
> {code}
> Creates mytable1 which includes all files in s3:/my.bucket with a filename 
> matching 'folder/2009*.bz2'
> {code:sql}
> CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
> STORED AS TEXTFILE 
> LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
> {code}
> Creates mytable2 including all files matching 'xyz*2009.bz2' located 
> under hdfs://data/


[jira] Updated: (HIVE-940) restrict creation of partitions with empty partition keys

2009-11-18 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-940:
---

Attachment: hive-940.2.patch

fixed an incorrect error message in addpart1.q.out file.

> restrict creation of partitions with empty partition keys
> -
>
> Key: HIVE-940
> URL: https://issues.apache.org/jira/browse/HIVE-940
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.2.0
>
> Attachments: hive-940.2.patch, hive-940.patch
>
>
> create table pc (a int) partitioned by (b string, c string);
> alter table pc add partition (b="f", c='');
> above alter cmd fails but actually creates a partition with name 'b=f/c=' but 
> describe partition on the same name fails. creation of such partitions should 
> not be allowed.


[jira] Updated: (HIVE-940) restrict creation of partitions with empty partition keys

2009-11-18 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-940:
---

Attachment: hive-940.patch

disallows partitions with empty keys.
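
The check amounts to something like the sketch below. This is not the code in
the patch: `validate_partition_spec` is a hypothetical helper that rejects a
spec with an empty value before a partition name such as 'b=f/c=' can be
built.

```python
def validate_partition_spec(spec):
    """Reject a partition spec in which any key has an empty value."""
    empty = [key for key, value in spec.items() if value is None or value == ""]
    if empty:
        raise ValueError("empty value for partition key(s): " + ", ".join(empty))
    # Only build the partition name once every key has a value.
    return "/".join(f"{key}={value}" for key, value in spec.items())

print(validate_partition_spec({"b": "f", "c": "x"}))
try:
    validate_partition_spec({"b": "f", "c": ""})
except ValueError as err:
    print(err)
```

Validating before any metadata is written avoids the half-created partition
described in the issue.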


> restrict creation of partitions with empty partition keys
> -
>
> Key: HIVE-940
> URL: https://issues.apache.org/jira/browse/HIVE-940
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.2.0
>
> Attachments: hive-940.patch
>
>
> create table pc (a int) partitioned by (b string, c string);
> alter table pc add partition (b="f", c='');
> above alter cmd fails but actually creates a partition with name 'b=f/c=' but 
> describe partition on the same name fails. creation of such partitions should 
> not be allowed.


[jira] Updated: (HIVE-940) restrict creation of partitions with empty partition keys

2009-11-18 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-940:
---

Status: Patch Available  (was: Open)

> restrict creation of partitions with empty partition keys
> -
>
> Key: HIVE-940
> URL: https://issues.apache.org/jira/browse/HIVE-940
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.3.0, 0.2.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
> Fix For: 0.2.0
>
> Attachments: hive-940.patch
>
>
> create table pc (a int) partitioned by (b string, c string);
> alter table pc add partition (b="f", c='');
> above alter cmd fails but actually creates a partition with name 'b=f/c=' but 
> describe partition on the same name fails. creation of such partitions should 
> not be allowed.


[jira] Created: (HIVE-940) restrict creation of partitions with empty partition keys

2009-11-18 Thread Prasad Chakka (JIRA)

restrict creation of partitions with empty partition keys
-

 Key: HIVE-940
 URL: https://issues.apache.org/jira/browse/HIVE-940
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0, 0.2.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
Reporter: Prasad Chakka
Assignee: Prasad Chakka
 Fix For: 0.2.0


create table pc (a int) partitioned by (b string, c string);

alter table pc add partition (b="f", c='');

above alter cmd fails but actually creates a partition with name 'b=f/c=' but 
describe partition on the same name fails. creation of such partitions should 
not be allowed.


[jira] Commented: (HIVE-936) dynamic partitions creation based on values

2009-11-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778935#action_12778935
 ] 

Prasad Chakka commented on HIVE-936:


@Namit: you can't change the partition keys of a table.

> dynamic partitions creation based on values
> ---
>
> Key: HIVE-936
> URL: https://issues.apache.org/jira/browse/HIVE-936
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> If a Hive table is created as partitioned, DML could only inserted into one 
> partitioin per query. Ideally partitions should be created on the fly based 
> on the value of the partition columns. As an example:
> {{{
>   create table T (a int, b string) partitioned by (ds string);
>   insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' 
> and ds <= '2009-11-16';
> }}}
> should be able to execute in one DML rather than possibley 16 DML for each 
> distinct ds values. CTAS and alter table should be able to do the same thing:
> {{{
>   create table T partitioned by (ds string) as select * from S where ds >= 
> '2009-11-01' and ds <= '2009-11-16';
> }}}
>  and
> {{{
>   create table T(a int, b string, ds string);
>   insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= 
> '2009-11-16';
>   alter table T partitioned by (ds);
> }}}
> should all return the same results.


[jira] Commented: (HIVE-918) Allow partition-wise file format

2009-11-09 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775233#action_12775233
 ] 

Prasad Chakka commented on HIVE-918:


Regarding metadata,
1) Every partition has its own Storage Descriptor object which contains 
information about file descriptors, serdes and other storage related 
information that a table can have. There is no need to add more metadata to 
partition.

2) If we have 1) what is the need for this? Can we not get all the different 
file formats that a table has by querying the storage descriptors of all 
partitions of a table?

The plan for allowing metadata evolution is that the 'alter table ...' cmd 
changes metadata for a table but not for existing partitions. All new 
partitions inherit the new metadata of the table. Hive QL should figure out the 
list of partitions that a query touches on using just the latest table metadata 
and then fetch the partition metadata for any further stuff. Are we changing 
this to something different, if so what?
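
The plan described above can be modelled with a small sketch. The `Table`
class and its dictionary-based storage descriptor are hypothetical
simplifications, not the metastore classes.

```python
import copy

class Table:
    def __init__(self, storage_descriptor):
        self.sd = dict(storage_descriptor)
        self.partitions = {}

    def add_partition(self, name):
        # A new partition gets its own copy of the table's current descriptor.
        self.partitions[name] = copy.deepcopy(self.sd)

    def alter(self, **changes):
        # Changes table-level metadata only; existing partitions are untouched.
        self.sd.update(changes)

t = Table({"file_format": "TEXTFILE"})
t.add_partition("ds=2009-11-01")
t.alter(file_format="SEQUENCEFILE")
t.add_partition("ds=2009-11-02")

formats = {name: sd["file_format"] for name, sd in t.partitions.items()}
print(formats)
```

The set of file formats used by a table then falls out of the partitions'
descriptors, which is the point of question 2).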



> Allow partition-wise  file format
> -
>
> Key: HIVE-918
> URL: https://issues.apache.org/jira/browse/HIVE-918
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> Right now all partitions in a hive table share the same file format. We 
> should allow partition wise file format. 


[jira] Commented: (HIVE-912) table name in DDL should be case-insensitive

2009-11-04 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773588#action_12773588
 ] 

Prasad Chakka commented on HIVE-912:


Are you sure the regular create table stmt is case sensitive? I remember 
putting in some unit tests for that.

> table name in DDL should be case-insensitive
> 
>
> Key: HIVE-912
> URL: https://issues.apache.org/jira/browse/HIVE-912
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: Hive-912.patch
>
>
> Found that the issue is not limited to CTAS; it is true for all create-table 
> statements. Modified the title and will upload a new patch.


[jira] Commented: (HIVE-900) Map-side join failed if there are large number of mappers

2009-10-29 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771551#action_12771551
 ] 

Prasad Chakka commented on HIVE-900:


Just an off-the-wall idea: temporarily increase the replication factor for this 
block so that it is available on more racks, thus reducing the network cost and 
also avoiding the BlockMissingException. Of course, we need to find a way to 
reliably set the replication factor back to its original setting.
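
A minimal sketch of the "raise, run, restore" idea, assuming the caller supplies two hypothetical callables, `get_replication(path)` and `set_replication(path, factor)`, that talk to the file system. Neither name comes from Hive or Hadoop.

```python
from contextlib import contextmanager


@contextmanager
def temporary_replication(path, boosted, get_replication, set_replication):
    """Raise the replication factor of `path` for the duration of a block.

    `get_replication` and `set_replication` are hypothetical hooks supplied
    by the caller; they are assumptions of this sketch.
    """
    original = get_replication(path)
    set_replication(path, boosted)
    try:
        yield original
    finally:
        # Runs on normal exit and when the query raises an exception.
        set_replication(path, original)
```

The `finally` clause covers a failing query, but not a client that is killed outright, so the "reliably" part of the idea would still need something outside the client process.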

> Map-side join failed if there are large number of mappers
> -
>
> Key: HIVE-900
> URL: https://issues.apache.org/jira/browse/HIVE-900
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> Map-side join is efficient when joining a huge table with a small table, 
> because each mapper can read the small table into main memory and do the join 
> locally. However, if too many mappers are generated for the map join, a 
> large number of mappers will simultaneously send requests to read the same 
> block of the small table. Currently Hadoop has an upper limit on the number of 
> requests for the same block (250?). If that limit is reached, a 
> BlockMissingException is thrown, which causes a lot of mappers to be 
> killed. Retrying won't solve the problem but worsen it.


[jira] Commented: (HIVE-900) Map-side join failed if there are large number of mappers

2009-10-29 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771553#action_12771553
 ] 

Prasad Chakka commented on HIVE-900:


@venky, maybe you can unblock your work by manually increasing the replication 
factor to a very high value and then issuing the query?

> Map-side join failed if there are large number of mappers
> -
>
> Key: HIVE-900
> URL: https://issues.apache.org/jira/browse/HIVE-900
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>
> Map-side join is efficient when joining a huge table with a small table, 
> because each mapper can read the small table into main memory and do the join 
> locally. However, if too many mappers are generated for the map join, a 
> large number of mappers will simultaneously send requests to read the same 
> block of the small table. Currently Hadoop has an upper limit on the number of 
> requests for the same block (250?). If that limit is reached, a 
> BlockMissingException is thrown, which causes a lot of mappers to be 
> killed. Retrying won't solve the problem but worsen it.


[jira] Updated: (HIVE-904) unit test failure in repair.q

2009-10-27 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-904:
---

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Duplicate of HIVE-903.

> unit test failure in repair.q
> -
>
> Key: HIVE-904
> URL: https://issues.apache.org/jira/browse/HIVE-904
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
> Attachments: HIVE-904.patch
>
>
> It seems that the order of the output partitions is not deterministic.
> {code}
> [junit] Begin query: repair.q
> [junit] diff -a -I \(file:\)\|\(/tmp/.*\) -I lastUpdateTime -I 
> lastAccessTime -I owner -I transient_lastDdlTime 
> /data/users/zshao/tools/deploy-trunk-apache-hive/.ptest_0/build/ql/test/l\
> ogs/clientpositive/repair.q.out 
> /data/users/zshao/tools/deploy-trunk-apache-hive/.ptest_0/ql/src/test/results/clientpositive/repair.q.out
> [junit] 18c18
> [junit] < Partitions not in metastore:  repairtable:p1=b/p2=a   
> repairtable:p1=a/p2=a
> [junit] ---
> [junit] > Partitions not in metastore:  repairtable:p1=a/p2=a   
> repairtable:p1=b/p2=a
> [junit] 23,24c23
> [junit] < Partitions not in metastore:  repairtable:p1=b/p2=a   
> repairtable:p1=a/p2=a
> [junit] < Repair: Added partition to metastore repairtable:p1=b/p2=a
> [junit] ---
> [junit] > Partitions not in metastore:  repairtable:p1=a/p2=a   
> repairtable:p1=b/p2=a
> [junit] 25a25
> [junit] > Repair: Added partition to metastore repairtable:p1=b/p2=a
> [junit] Exception: Client execution results failed with error code = 1
> [junit] junit.framework.AssertionFailedError: Client execution results 
> failed with error code = 1
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair(TestCliDriver.java:3442)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit] at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit] at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit] at java.lang.reflect.Method.invoke(Method.java:597)
> [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] at 
> junit.framework.TestResult.runProtected(TestResult.java:124)
> [junit] at junit.framework.TestResult.run(TestResult.java:109)
> [junit] at junit.framework.TestCase.run(TestCase.java:118)
> [junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
> [junit] at junit.framework.TestSuite.run(TestSuite.java:203)
> [junit] at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
> [junit] at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
> [junit] at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
> {code}


[jira] Updated: (HIVE-904) unit test failure in repair.q

2009-10-26 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-904:
---

Attachment: HIVE-904.patch

The problem was not in the repair command per se: the test that was added has 
more than one partition missing from the metastore, and the msck output was not sorted.
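
The kind of fix described here can be sketched as follows. The function name and the tab separator are assumptions made for illustration; the attached patch is not reproduced in this thread. Sorting the entries before printing makes the line identical no matter what order the directories were listed in.

```python
def format_missing_partitions(table, partition_names):
    """Build the 'Partitions not in metastore' line in a stable order.

    Hypothetical helper: the name and the tab-separated layout are
    assumptions of this sketch.
    """
    # Sorting removes any dependence on directory listing order.
    entries = sorted(f"{table}:{name}" for name in partition_names)
    return "Partitions not in metastore:\t" + "\t".join(entries)
```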


> unit test failure in repair.q
> -
>
> Key: HIVE-904
> URL: https://issues.apache.org/jira/browse/HIVE-904
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
> Attachments: HIVE-904.patch
>
>
> It seems that the order of the output partitions is not deterministic.


[jira] Updated: (HIVE-904) unit test failure in repair.q

2009-10-26 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-904:
---

Assignee: Prasad Chakka
  Status: Patch Available  (was: Open)

Just ran the repair.q test and it passes on my system, but that 
doesn't tell us anything since this test was passing for me even before the patch. Can 
someone check whether this works where it is currently failing?

> unit test failure in repair.q
> -
>
> Key: HIVE-904
> URL: https://issues.apache.org/jira/browse/HIVE-904
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Zheng Shao
>Assignee: Prasad Chakka
> Attachments: HIVE-904.patch
>
>
> It seems that the order of the output partitions is not deterministic.


[jira] Assigned: (HIVE-897) fix inconsistent expectations from table/partition location value

2009-10-22 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka reassigned HIVE-897:
--

Assignee: Prasad Chakka

> fix inconsistent expectations from table/partition location value
> -
>
> Key: HIVE-897
> URL: https://issues.apache.org/jira/browse/HIVE-897
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
>
> Currently the code expects this to be a full URI in some places 
> (LoadSemanticAnalyzer). Also, HiveAlterHandler should work in either case. 


[jira] Created: (HIVE-897) fix inconsistent expectations from table/partition location value

2009-10-22 Thread Prasad Chakka (JIRA)

fix inconsistent expectations from table/partition location value
-

 Key: HIVE-897
 URL: https://issues.apache.org/jira/browse/HIVE-897
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Prasad Chakka


Currently the code expects this to be a full URI in some places 
(LoadSemanticAnalyzer). Also, HiveAlterHandler should work in either case. 


[jira] Commented: (HIVE-874) add partitions found during metastore check

2009-10-19 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767690#action_12767690
 ] 

Prasad Chakka commented on HIVE-874:


Committed to trunk. Thanks Cyrus.

> add partitions found during metastore check
> ---
>
> Key: HIVE-874
> URL: https://issues.apache.org/jira/browse/HIVE-874
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Cyrus Katrak
> Attachments: HIVE-874.patch
>
>
> 'msck' just reports the list of partition directories that exist but do not 
> have corresponding metadata. This can happen if a process outside of hive is 
> populating the directories. Hive should support an option to 'msck' that 
> would also add default metadata for these partitions.


[jira] Commented: (HIVE-874) add partitions found during metastore check

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767012#action_12767012
 ] 

Prasad Chakka commented on HIVE-874:


looks good, will run tests and commit to trunk.

> add partitions found during metastore check
> ---
>
> Key: HIVE-874
> URL: https://issues.apache.org/jira/browse/HIVE-874
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Cyrus Katrak
> Attachments: HIVE-874.patch
>
>
> 'msck' just reports the list of partition directories that exist but do not 
> have corresponding metadata. This can happen if a process outside of hive is 
> populating the directories. Hive should support an option to 'msck' that 
> would also add default metadata for these partitions.


[jira] Updated: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-884:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed to 0.4 and trunk. Thanks Zheng!

> Metastore Server should exit if error happens
> -
>
> Key: HIVE-884
> URL: https://issues.apache.org/jira/browse/HIVE-884
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.1, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-884.1.patch, HIVE-884.2.patch
>
>
> Currently, HiveMetaStore (the thrift server) does not exit when the main 
> thread hits an Exception.
> The process should exit when that happens.


[jira] Updated: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka updated HIVE-884:
---

Attachment: HIVE-884.2.patch

Modified Zheng's patch by adding a log4j error message in addition to stdout.

> Metastore Server should exit if error happens
> -
>
> Key: HIVE-884
> URL: https://issues.apache.org/jira/browse/HIVE-884
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.1, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-884.1.patch, HIVE-884.2.patch
>
>
> Currently, HiveMetaStore (the thrift server) does not exit when the main 
> thread hits an Exception.
> The process should exit when that happens.


[jira] Commented: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766993#action_12766993
 ] 

Prasad Chakka commented on HIVE-884:


The main() function exits after printing the exception. When main() returns, the 
whole process should exit, no? The problem may be something else.
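
One general possibility worth ruling out, not confirmed anywhere in this thread: a JVM keeps running after main() returns for as long as any non-daemon thread is alive. Python's `threading` module follows the same rule, so the sketch below uses it to show the effect. The function name and the `server-worker` thread are made up for illustration.

```python
import threading


def start_worker_and_return(stop_event):
    """Start a non-daemon worker, then return, the way a main() might."""
    worker = threading.Thread(target=stop_event.wait, name="server-worker")
    worker.start()
    # Returning here does not stop the worker. The process would only
    # end once the worker finishes or something exits explicitly.
    return worker
```

After `start_worker_and_return(event)` returns, `worker.is_alive()` stays true until `event.set()` is called, which is why an explicit exit on the error path ends the process regardless of leftover threads.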

> Metastore Server should exit if error happens
> -
>
> Key: HIVE-884
> URL: https://issues.apache.org/jira/browse/HIVE-884
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.1, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-884.1.patch
>
>
> Currently, HiveMetaStore (the thrift server) does not exit when the main 
> thread hits an Exception.
> The process should exit when that happens.


[jira] Commented: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766992#action_12766992
 ] 

Prasad Chakka commented on HIVE-884:


I am not sure there are any test cases for the standalone metastore server. We have 
to start one.

I am testing this patch (along with the log message).

> Metastore Server should exit if error happens
> -
>
> Key: HIVE-884
> URL: https://issues.apache.org/jira/browse/HIVE-884
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.1, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-884.1.patch
>
>
> Currently, HiveMetaStore (the thrift server) does not exit when the main 
> thread hits an Exception.
> The process should exit when that happens.


[jira] Commented: (HIVE-883) URISyntaxException when partition value contains special chars

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766989#action_12766989
 ] 

Prasad Chakka commented on HIVE-883:


also, can you add describe partition in the test?

> URISyntaxException when partition value contains special chars
> --
>
> Key: HIVE-883
> URL: https://issues.apache.org/jira/browse/HIVE-883
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-883.1.patch, HIVE-883.2.patch
>
>
> When we try to insert into a partitioned table whose partition value 
> contains the special char ":", we see a java.net.URISyntaxException:
> {code}
> stack trace:
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ts=2009-10-16 16:14:10
> at org.apache.hadoop.fs.Path.initialize(Path.java:140)
> at org.apache.hadoop.fs.Path.<init>(Path.java:126)
> at org.apache.hadoop.fs.Path.<init>(Path.java:45)
> at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:146)
> at org.apache.hadoop.hive.ql.metadata.Partition.<init>(Partition.java:123)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:292)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:747)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4383)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:87)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:251)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:283)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:251)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:166)
> at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: ts=2009-10-16 16:14:10
> at java.net.URI.checkPath(URI.java:1787)
> at java.net.URI.<init>(URI.java:735)
> at org.apache.hadoop.fs.Path.initialize(Path.java:137)
> ... 22 more
> {code}


[jira] Commented: (HIVE-883) URISyntaxException when partition value contains special chars

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766988#action_12766988
 ] 

Prasad Chakka commented on HIVE-883:


Ignore my comment above. This patch does contain decode() as well.

bq.
// NOTE: This is for generating the internal path name for partitions. Users
// should always use the MetaStore API to get the path name for a partition.
// Users should not directly take partition values and turn it into a path 
// name by themselves, because the logic below may change in the future.

A lot of people already do this conversion themselves. I suppose this doesn't 
affect them as long as their partition values do not contain these special characters.

Let's keep this open for a few days to see if anyone has concerns.

> URISyntaxException when partition value contains special chars
> --
>
> Key: HIVE-883
> URL: https://issues.apache.org/jira/browse/HIVE-883
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-883.1.patch, HIVE-883.2.patch
>
>
> When we try to insert into a partitioned table whose partition value 
> contains the special char ":", we see a java.net.URISyntaxException.


[jira] Commented: (HIVE-883) URISyntaxException when partition value contains special chars

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766986#action_12766986
 ] 

Prasad Chakka commented on HIVE-883:


We need to support decode (i.e., getting the partition key values from the HDFS path 
name). Check Warehouse.makeSpecFromName(). This is used during partition pruning 
and also in 'msck repair', whose patch is pending.
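
To illustrate why encode and decode have to come as a pair, here is a small round-trip sketch. It assumes percent-style escaping over an illustrative character set. The function names are Python stand-ins chosen for this example, and the actual escaping rules in the patch may differ.

```python
import string

# Illustrative set of characters to escape; an assumption of this sketch.
UNSAFE_CHARS = set("\"#%'*/:=?\\")


def escape_path_name(value):
    """Encode: replace unsafe characters with %XX (uppercase hex)."""
    return "".join(
        f"%{ord(c):02X}" if c in UNSAFE_CHARS or ord(c) < 0x20 else c
        for c in value
    )


def unescape_path_name(name):
    """Decode: turn %XX sequences back into the original characters."""
    out, i = [], 0
    while i < len(name):
        hex_digits = name[i + 1:i + 3]
        if (name[i] == "%" and len(hex_digits) == 2
                and all(c in string.hexdigits for c in hex_digits)):
            out.append(chr(int(hex_digits, 16)))
            i += 3
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def make_part_name(spec):
    """Build 'k1=v1/k2=v2' from a partition spec, escaping keys and values."""
    return "/".join(
        f"{escape_path_name(k)}={escape_path_name(v)}" for k, v in spec.items()
    )


def make_spec_from_name(name):
    """Recover the partition spec from a path name built by make_part_name."""
    spec = {}
    for component in name.split("/"):
        key, _, value = component.partition("=")
        spec[unescape_path_name(key)] = unescape_path_name(value)
    return spec
```

Because "/" and "=" are escaped inside values, splitting the path name on those characters is safe, and the timestamp from the issue description survives the round trip.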



> URISyntaxException when partition value contains special chars
> --
>
> Key: HIVE-883
> URL: https://issues.apache.org/jira/browse/HIVE-883
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-883.1.patch, HIVE-883.2.patch
>
>
> When we try to insert into a partitioned table whose partition value 
> contains the special char ":", we see a java.net.URISyntaxException.


[jira] Commented: (HIVE-884) Metastore Server should exit if error happens

2009-10-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766967#action_12766967
 ] 

Prasad Chakka commented on HIVE-884:


Can you print the stack trace to the log as well, if logging was initialized? I 
don't think printing to stdout will update the log.
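
What is being asked for can be sketched like this, with Python's `logging` standing in for log4j. The `run_server` wrapper and the `metastore` logger name are hypothetical. The failure is written to stdout and to the log, and the caller gets a non-zero status to exit with.

```python
import logging
import sys
import traceback

LOG = logging.getLogger("metastore")  # hypothetical logger name


def run_server(start, log=LOG):
    """Run `start()`; on failure report to stdout and the log, return 1."""
    try:
        start()
    except Exception:
        # Console output alone never reaches the log file...
        traceback.print_exc(file=sys.stdout)
        # ...so the same stack trace is also sent to the logger.
        log.exception("Metastore server failed to start")
        return 1
    return 0
```

A launcher would then finish with `sys.exit(run_server(start))`, so the process ends with a failure status instead of lingering.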

> Metastore Server should exit if error happens
> -
>
> Key: HIVE-884
> URL: https://issues.apache.org/jira/browse/HIVE-884
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.1, 0.5.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: HIVE-884.1.patch
>
>
> Currently, HiveMetaStore (the Thrift server) does not exit when the main 
> thread sees an exception.
> The process should exit when that happens.


[jira] Commented: (HIVE-880) user group information not populated for pre and post hook

2009-10-15 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766307#action_12766307
 ] 

Prasad Chakka commented on HIVE-880:


I think we do something similar in DDLTask (for getting the username of the 
table owner).

> user group information not populated for pre and post hook
> --
>
> Key: HIVE-880
> URL: https://issues.apache.org/jira/browse/HIVE-880
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.880.1.patch, hive.880.2.patch
>
>



[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

2009-10-13 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765164#action_12765164
 ] 

Prasad Chakka commented on HIVE-493:


I created HIVE-874 for this patch, as it doesn't exactly solve this problem. I 
would like to keep this issue open for that 'perfect' solution :)

> automatically infer existing partitions of table from HDFS files.
> -
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
> Attachments: HIVE-493-2.patch, HIVE-493.patch
>
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of being looked up in the metastore (where partitions are 
> created using 'alter table ... add partition'). This automatic inference was 
> removed in favor of the latter approach when the metastore checker feature was 
> checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create 
> the HDFS directory and let Hive infer the partition rather than add it 
> explicitly. But doing that raises the following issues:
> 1) External partitions: we would have to mix both approaches, so the partition 
> list would be a merged list of inferred and registered partitions, and 
> duplicates would have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen 
> for inferred partitions: the table schema at the time the inferred partition 
> was created, or the latest table schema? How do we know the table schema at 
> the time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without 
> actually deleting the data. This feature is not supported and may not be that 
> useful, but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added. If partitions are not registered, then indexes 
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of 
> Hive. If inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?


[jira] Commented: (HIVE-874) add partitions found during metastore check

2009-10-13 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765162#action_12765162
 ] 

Prasad Chakka commented on HIVE-874:


Cyrus,

I created a separate JIRA since HIVE-493 is about automatically inferring 
partitions at query time without updating metadata at all. I would like to 
keep that open since this may not be sufficient for some use cases. Could you 
upload your patch here?

As for unit tests, check the hive/ql/src/test/queries/clientpositive directory, 
which contains the CLI tests. One of the tests could create HDFS directories 
for a table like 'srcpart', run 'msck repair' on it, and then run 'show 
partitions' on the table to see whether the partition is listed. We need 
some unit tests so that this functionality is not broken accidentally 
by some other check-in.



> add partitions found during metastore check
> ---
>
> Key: HIVE-874
> URL: https://issues.apache.org/jira/browse/HIVE-874
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
>Reporter: Prasad Chakka
>Assignee: Cyrus Katrak
>
> 'msck' just reports the list of partition directories that exist but do not 
> have corresponding metadata. This can happen if a process outside of hive is 
> populating the directories. Hive should support an option to 'msck' that 
> would also add default metadata for these partitions.


[jira] Created: (HIVE-874) add partitions found during metastore check

2009-10-13 Thread Prasad Chakka (JIRA)

add partitions found during metastore check
---

 Key: HIVE-874
 URL: https://issues.apache.org/jira/browse/HIVE-874
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.3.0, 0.2.0, 0.3.1, 0.3.2, 0.4.0, 0.4.1, 0.5.0
Reporter: Prasad Chakka
Assignee: Cyrus Katrak


'msck' just reports the list of partition directories that exist but do not 
have corresponding metadata. This can happen if a process outside of hive is 
populating the directories. Hive should support an option to 'msck' that would 
also add default metadata for these partitions.
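
The check described above can be sketched as follows. This is only an illustration in Python, not the msck implementation: it assumes the `key=value` directory layout with one directory level per partition key, uses the local filesystem in place of HDFS, and the function names are made up for the example.

```python
import os


def partition_dirs(table_dir, partition_keys):
    """Yield partition specs such as {'ds': '2009-10-13'} found under table_dir.

    Assumes one key=value directory level per partition key, in key order.
    """
    def walk(path, keys, spec):
        if not keys:
            yield dict(spec)
            return
        prefix = keys[0] + "="
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            if name.startswith(prefix) and os.path.isdir(child):
                value = name[len(prefix):]
                yield from walk(child, keys[1:], spec + [(keys[0], value)])

    yield from walk(table_dir, list(partition_keys), [])


def unregistered_partitions(table_dir, partition_keys, registered):
    """Return specs that exist as directories but are absent from `registered`."""
    known = {tuple(sorted(spec.items())) for spec in registered}
    return [spec for spec in partition_dirs(table_dir, partition_keys)
            if tuple(sorted(spec.items())) not in known]
```

A repair option would then add default metadata for each spec that `unregistered_partitions` returns, instead of only reporting it.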



[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

2009-10-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764542#action_12764542
 ] 

Prasad Chakka commented on HIVE-493:


Cyrus, 

Thanks for providing this patch. Very useful.

It is possible that, on an HDFS with permissions enabled, a partition/table 
directory is not accessible to the current user, yet its metadata would be 
deleted here, so I am a little uncomfortable with removing partitions. I am not 
really sure that removing partitions is useful enough to justify the risk of 
losing partitions permanently. What do you think? 

A couple of comments on the code:
1) Can you add a test or two to the msck test package?
2) REPAIR should be an optional keyword in the MSCK ANTLR clause instead of 
being a whole other clause. Look at how KW_EXTERNAL is used in the 
createStatement clause.
3) The following line should be outside of the for loop since there is only one 
table here.
{code}
Table table = db.getTable(MetaStoreUtils.DEFAULT_DATABASE_NAME,
msckDesc.getTableName());
{code}
4) Is this cast '(Map )' really needed?
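
To illustrate the permissions concern, here is a hedged sketch (Python, with `os.stat` standing in for whatever HDFS call the patch uses; `DirState`, `probe` and `safe_to_drop` are hypothetical names). The point is that "permission denied" is treated as "unknown" rather than "missing", so metadata is only dropped for directories that are confirmed absent:

```python
import os
from enum import Enum


class DirState(Enum):
    PRESENT = "present"
    MISSING = "missing"
    UNKNOWN = "unknown"  # could not be checked, e.g. permission denied


def probe(path, stat=os.stat):
    """Classify a partition directory without confusing 'denied' with 'gone'."""
    try:
        stat(path)
        return DirState.PRESENT
    except FileNotFoundError:
        return DirState.MISSING
    except OSError:
        # PermissionError and other I/O failures: existence is unknown.
        return DirState.UNKNOWN


def safe_to_drop(partition_paths, stat=os.stat):
    """Return names of partitions whose directories are confirmed missing."""
    return [name for name, path in partition_paths.items()
            if probe(path, stat) is DirState.MISSING]
```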



> automatically infer existing partitions of table from HDFS files.
> -
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
> Attachments: HIVE-493.patch
>
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of being looked up in the metastore (where partitions are 
> created using 'alter table ... add partition'). This automatic inference was 
> removed in favor of the latter approach when the metastore checker feature was 
> checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create 
> the HDFS directory and let Hive infer the partition rather than add it 
> explicitly. But doing that raises the following issues:
> 1) External partitions: we would have to mix both approaches, so the partition 
> list would be a merged list of inferred and registered partitions, and 
> duplicates would have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen 
> for inferred partitions: the table schema at the time the inferred partition 
> was created, or the latest table schema? How do we know the table schema at 
> the time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without 
> actually deleting the data. This feature is not supported and may not be that 
> useful, but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added. If partitions are not registered, then indexes 
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of 
> Hive. If inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?


[jira] Commented: (HIVE-31) Hive: support CREATE TABLE xxx SELECT yyy.* FROM yyy

2009-09-29 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760767#action_12760767
 ] 

Prasad Chakka commented on HIVE-31:
---

I don't understand why analyzeCreateTable() is moved to SemanticAnalyzer.java. 
That file is already very big. Can you not call analyzeInternal() from 
DDLSemanticAnalyzer?

> Hive: support CREATE TABLE xxx SELECT yyy.* FROM yyy
> 
>
> Key: HIVE-31
> URL: https://issues.apache.org/jira/browse/HIVE-31
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Ning Zhang
> Attachments: HIVE-31.patch
>
>
> We should allow users to create a table using query result, without 
> specifying the column names and column types.


[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

2009-09-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760301#action_12760301
 ] 

Prasad Chakka commented on HIVE-493:


When to run the 'add partition' command depends on when you want to make the 
partition available for use. But you should run that command only once per 
partition, not every time you add new files to the partition.


bq. In fact, we are thinking of this issue in our project. Is there any good 
practices?
If you have a locking server such as ZooKeeper available in your system, you 
should use it. Otherwise you could create an HDFS file as a lock. But you need 
to be careful here, since a client can die after acquiring a lock, which leaves 
a lot of orphaned locks behind.

The correct solution would be to create a lease server inside Hive using the 
metastore db. I might end up doing this if I get a couple of days of time. 
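
To make the difference between a lock file and a lease concrete, here is a small sketch. It is an assumption-laden illustration, not the algorithm proposed for the metastore: the table lives in memory, and `LeaseTable` and its methods are hypothetical names. Because every lease carries an expiry time, a client that dies simply lets its lease lapse instead of leaving an orphaned lock:

```python
import time


class LeaseTable:
    """In-memory stand-in for a lease table kept in the metastore database."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}  # resource -> (owner, expiry time)

    def acquire(self, resource, owner, ttl):
        """Grant the lease if it is free, expired, or already held by owner."""
        now = self._clock()
        held = self._leases.get(resource)
        if held is not None and held[0] != owner and held[1] > now:
            return False
        self._leases[resource] = (owner, now + ttl)
        return True

    def renew(self, resource, owner, ttl):
        """Extend a lease that owner still holds; fail if it has lapsed."""
        now = self._clock()
        held = self._leases.get(resource)
        if held is None or held[0] != owner or held[1] <= now:
            return False
        self._leases[resource] = (owner, now + ttl)
        return True

    def release(self, resource, owner):
        """Drop the lease if owner is the recorded holder."""
        held = self._leases.get(resource)
        if held is not None and held[0] == owner:
            del self._leases[resource]
            return True
        return False
```

In a database-backed version the check and the update inside `acquire` would have to happen atomically, for example in one transaction; the sketch ignores concurrency entirely.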

> automatically infer existing partitions of table from HDFS files.
> -
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of being looked up in the metastore (where partitions are 
> created using 'alter table ... add partition'). This automatic inference was 
> removed in favor of the latter approach when the metastore checker feature was 
> checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create 
> the HDFS directory and let Hive infer the partition rather than add it 
> explicitly. But doing that raises the following issues:
> 1) External partitions: we would have to mix both approaches, so the partition 
> list would be a merged list of inferred and registered partitions, and 
> duplicates would have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen 
> for inferred partitions: the table schema at the time the inferred partition 
> was created, or the latest table schema? How do we know the table schema at 
> the time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without 
> actually deleting the data. This feature is not supported and may not be that 
> useful, but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added. If partitions are not registered, then indexes 
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of 
> Hive. If inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?


[jira] Commented: (HIVE-493) automatically infer existing partitions of table from HDFS files.

2009-09-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760293#action_12760293
 ] 

Prasad Chakka commented on HIVE-493:


1) You can run 'alter table ... add partition ...' at the end of 
the map-reduce job that creates the partition. You don't really need 'automatic 
inference' unless you have no control over the partition creation 
process.

2) Adding new files to existing partitions and tables should not be a problem 
now. But you may want to run the above add partition command only after the 
last of the partition files has been added; otherwise the dependent data 
processes might see incomplete data. With automatic inference of partitions, 
the dependent processes can see incomplete data.

3) The Netflix guys wrote a custom map/reduce job for the compaction/merging 
process. But you may want to coordinate the partition readers and compactors 
so that the latter do not clobber the directory.

> automatically infer existing partitions of table from HDFS files.
> -
>
> Key: HIVE-493
> URL: https://issues.apache.org/jira/browse/HIVE-493
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>
> Initially, the partition list for a table was inferred from the HDFS directory 
> structure instead of being looked up in the metastore (where partitions are 
> created using 'alter table ... add partition'). This automatic inference was 
> removed in favor of the latter approach when the metastore checker feature was 
> checked in, and also to facilitate external partitions.
> Joydeep and Frederick mentioned that it would be simpler for users to create 
> the HDFS directory and let Hive infer the partition rather than add it 
> explicitly. But doing that raises the following issues:
> 1) External partitions: we would have to mix both approaches, so the partition 
> list would be a merged list of inferred and registered partitions, and 
> duplicates would have to be resolved.
> 2) Partition-level schemas can't be supported. Which schema should be chosen 
> for inferred partitions: the table schema at the time the inferred partition 
> was created, or the latest table schema? How do we know the table schema at 
> the time the inferred partition was created?
> 3) If partitions have to be registered, they can be disabled without 
> actually deleting the data. This feature is not supported and may not be that 
> useful, but nevertheless it can't be supported with inferred partitions.
> 4) Indexes are being added. If partitions are not registered, then indexes 
> for such partitions cannot be maintained automatically.
> I would like to know what the general thinking about this is among users of 
> Hive. If inferred partitions are preferred, can we live with the restricted 
> functionality that this imposes?


[jira] Commented: (HIVE-417) Implement Indexing in Hive

2009-09-21 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758139#action_12758139
 ] 

Prasad Chakka commented on HIVE-417:


Yes, they do, but they don't use it for table scans, which are done if the 
query selectivity is greater than 10% (or some such). They use the index for 
index scans and in joins. I wrote the table scan code :)

> Implement Indexing in Hive
> --
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-417.proto.patch, hive-417－2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.


[jira] Commented: (HIVE-417) Implement Indexing in Hive

2009-09-21 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758127#action_12758127
 ] 

Prasad Chakka commented on HIVE-417:


I don't think it makes much sense unless there is some clustering or sorting 
property. If there is clustering and sorting, and the selectivity of a query is 
much higher than 10%, then storing this metadata along with the data makes 
sense, instead of in a separate block. The 10% threshold may be larger for Hive, 
but the point still stands. In the OLAP case data seldom changes, and the size 
of this kind of metadata is much smaller than the data itself, so the overhead 
of storing it is negligible.

Something similar is done in DB2 Multi-Dimensional Clustering, where 
whole blocks (disk blocks) are skipped if the key value doesn't fit the query.

> Implement Indexing in Hive
> --
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-417.proto.patch, hive-417－2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.


[jira] Commented: (HIVE-417) Implement Indexing in Hive

2009-09-21 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758097#action_12758097
 ] 

Prasad Chakka commented on HIVE-417:


There can be a summary index here as well (every SequenceFile block would have 
min and max column values in the index). I thought you were hinting at that.
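
A rough sketch of such a summary index, in Python with made-up names (`BlockSummary`, `summarize`, `blocks_to_scan`) and plain lists standing in for SequenceFile blocks. Each block contributes one (min, max) pair, and a range query only needs to read the blocks whose range overlaps it:

```python
from dataclasses import dataclass


@dataclass
class BlockSummary:
    block_id: int
    min_value: int
    max_value: int


def summarize(blocks):
    """Build one (min, max) entry per block; blocks is a list of non-empty value lists."""
    return [BlockSummary(i, min(values), max(values))
            for i, values in enumerate(blocks)]


def blocks_to_scan(index, low, high):
    """Return ids of blocks whose [min, max] range overlaps the query range [low, high]."""
    return [s.block_id for s in index
            if s.max_value >= low and s.min_value <= high]
```

With `blocks = [[1, 4, 7], [8, 9, 15], [20, 22, 31]]`, a query for values between 9 and 12 only has to read block 1. The pruning pays off only when the column is clustered or sorted; with randomly distributed values most block ranges overlap most queries.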

> Implement Indexing in Hive
> --
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-417.proto.patch, hive-417－2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.


[jira] Commented: (HIVE-417) Implement Indexing in Hive

2009-09-21 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758084#action_12758084
 ] 

Prasad Chakka commented on HIVE-417:


@jeff, I think this is more suitable for storing along with the data, where 
blocks of data can be skipped while scanning rows. I think columnar storage 
might already be doing this. 

> Implement Indexing in Hive
> --
>
> Key: HIVE-417
> URL: https://issues.apache.org/jira/browse/HIVE-417
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-417.proto.patch, hive-417－2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-20 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757767#action_12757767
 ] 

Prasad Chakka commented on HIVE-675:


What is the reason for not using a default path (/dbname) if the 
location argument is null? It is possible that tools other than the Hive CLI 
will create databases, and they don't necessarily know what the default path 
should be.




> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-7.patch, hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) so that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.


[jira] Commented: (HIVE-837) virtual column support (filename) in hive

2009-09-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756658#action_12756658
 ] 

Prasad Chakka commented on HIVE-837:


Buckets have other semantic meaning, which is not the case for files, so we 
should not lump buckets with meta/virtual columns. We could possibly add a 
virtual column/UDF called bucket() for that.

MySQL exposes a lot of virtual data as UDFs (curtime(), database(), 
current_user(), default(column), etc.) instead of virtual columns. I think it 
makes sense to make them UDFs just in case some virtual columns need arguments.

> virtual column support (filename) in hive
> -
>
> Key: HIVE-837
> URL: https://issues.apache.org/jira/browse/HIVE-837
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>
> Copying from some mails:
> I am dumping files into a hive partition at five-minute intervals. I am using 
> LOAD DATA into a partition.
> weblogs
> web1.00
> web1.05
> web1.10
> ...
> web2.00
> web2.05
> web1.10
> 
> Things that would be useful..
> Select files from the folder with a regex or exact name
> select * FROM logs where FILENAME LIKE(WEB1*)
> select * FROM LOGS WHERE FILENAME=web2.00
> Also it would be nice to be able to select offsets in a file, this would make 
> sense with appends
> select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]
> select  
> substr(filename, 4, 7) as  class_A, 
> substr(filename,  8, 10) as class_B,
> count( x ) as cnt
> from FOO
> group by
> substr(filename, 4, 7), 
> substr(filename,  8, 10) ;
> Hive should support virtual columns


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-15 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755888#action_12755888
 ] 

Prasad Chakka commented on HIVE-675:


Yeah, I never meant to fix this in this JIRA. I was just saying that we might 
need it in the future.

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) so that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore, but the Hive query 
> parser should have this feature as well.


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-12 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754651#action_12754651
 ] 

Prasad Chakka commented on HIVE-718:


+1 to Todd's suggestion. Let's fix the regression by just creating the 
directory in copyFiles(), and open a separate JIRA for the concurrency and 
atomicity issues.



> Load data inpath into a new partition without overwrite does not move the file
> --
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a   2009-08-01
> b   2009-08-01
> d   2009-08-01
> {code}


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754461#action_12754461
 ] 

Prasad Chakka commented on HIVE-718:


The only change I made was not to create the partition/table until 
Hive.copyFiles() returns, i.e. in 0.3 the partition/table directory was created 
(if it did not exist) before copyFiles() was called. That could be the reason 
for the discrepancy between 0.3 and 0.4, but I am not sure.

We can't create the directory up front if we want to support correct semantics 
(i.e. the partition directory does not exist until the data has been copied 
completely). This is needed for both COPY and REPLACE; without it, downstream 
consumers get corrupted or incomplete data.

But if you want to keep the 0.3 semantics (which we might want to, since COPY 
is otherwise quite unusable), we just need to create the destf directory in 
Hive.copyFiles(). 
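
For comparison, the "correct semantics" option can be sketched like this. It is a local-filesystem illustration in Python, not Hive's copyFiles(); `publish_partition` and the `.staging-` prefix are invented for the example. Files are copied into a staging directory first, and the partition directory only appears, through a single rename, once the copy has finished:

```python
import os
import shutil
import tempfile


def publish_partition(src_files, dest_dir):
    """Copy files into a staging directory, then rename it to dest_dir.

    Readers never see dest_dir until every file has been copied.
    Raises FileExistsError if dest_dir is already present.
    """
    if os.path.exists(dest_dir):
        raise FileExistsError(dest_dir)
    parent = os.path.dirname(os.path.abspath(dest_dir))
    os.makedirs(parent, exist_ok=True)
    # Stage next to the destination so the rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for src in src_files:
            shutil.copy(src, staging)
        os.rename(staging, dest_dir)
    except BaseException:
        # Leave nothing half-written behind if the copy or rename fails.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

The sketch covers only the case of a new partition; loading more files into an existing directory (the 0.3 behaviour discussed above) would need a different approach, and the existence check is not safe against concurrent writers.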


> Load data inpath into a new partition without overwrite does not move the file
> --
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a   2009-08-01
> b   2009-08-01
> d   2009-08-01
> {code}


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754402#action_12754402
 ] 

Prasad Chakka commented on HIVE-718:


Users can currently add new files to the table/partition directory 
without going through Hive to load them, and Hive will pick these up for 
future queries. The directory structure is easily inferred today, so 
external processes can write and read data independently of Hive. If we start 
having versions, then they have to read the correct version from Hive to 
interact with the data. Doesn't this go against the philosophy of openness? :)

> Load data inpath into a new partition without overwrite does not move the file
> --
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a   2009-08-01
> b   2009-08-01
> d   2009-08-01
> {code}


[jira] Resolved: (HIVE-341) Specifying partition column without table alias causes unknown exception

2009-09-11 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka resolved HIVE-341.


Resolution: Fixed

Fixed by the new partition pruner code.

> Specifying partition column without table alias causes unknown exception
> 
>
> Key: HIVE-341
> URL: https://issues.apache.org/jira/browse/HIVE-341
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Raghotham Murthy
>
> Created two tables - tmp_rsm_abc and tmp_rsm_abc1. The latter is partitioned 
> on ds. Query on first table succeeds, but query on second fails. See the 
> session below.
> hive> describe tmp_rsm_abc;   
>   
> a string
> b int
> Time taken: 0.116 seconds
> hive> select a, b from tmp_rsm_abc where b > 5;  
> <- this query succeeds
> Unknown   19
> Unknown   29
> Unknown   29
> Unknown   29
> Unknown   30
> Unknown   25
> Unknown   15
> Unknown   17
> Unknown   28
> Unknown   17
> Time taken: 8.198 seconds
> hive> create table tmp_rsm_abc1(a string, b int) partitioned by (ds string);
> OK
> Time taken: 0.118 seconds
> hive> insert overwrite table tmp_rsm_abc1 partition (ds = '10') select a, b 
> from tmp_rsm_abc where b > 5;
> 10 Rows loaded to tmp_rsm_abc1
> OK
> Time taken: 9.319 seconds
> hive> select a, b from tmp_rsm_abc1 where ds = '10'; 
> <- this query fails
> FAILED: Unknown exception : null
> Time taken: 0.053 seconds
> hive> 


[jira] Resolved: (HIVE-19) Support of change the metadata of a hive table

2009-09-11 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka resolved HIVE-19.
---

Resolution: Fixed

This was fixed some time ago by Joydeep.

> Support of change the metadata of a hive table
> --
>
> Key: HIVE-19
> URL: https://issues.apache.org/jira/browse/HIVE-19
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Zheng Shao
>
> From Joey Pan [j...@rocketfuelinc.com]
> The issue occurs when try to query table when restarting ec2 cluster (will 
> get diff server ip), currently the warehouse dir is hardcoded as some 
> internal ip. 
> It failed after retrying the old location: 
> 08/11/02 14:41:51 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 0 time(s).
> 08/11/02 14:41:52 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 1 time(s).
> 08/11/02 14:41:53 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 2 time(s).
> 08/11/02 14:41:54 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 3 time(s).
> 08/11/02 14:41:55 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 4 time(s).
> 08/11/02 14:41:56 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 5 time(s).
> 08/11/02 14:41:57 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 6 time(s).
> 08/11/02 14:41:58 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 7 time(s).
> 08/11/02 14:41:59 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 8 time(s).
> 08/11/02 14:42:00 INFO ipc.Client: Retrying connect to server: 
> ip-10-250-75-160.ec2.internal/10.250.75.160:50001. Already tried 9 time(s).
> Is there a way to set the warehouse.dir manually for the already existent db? 
> Otherwise all tables have to be created again... 
> Thanks, 
> joey


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754372#action_12754372
 ] 

Prasad Chakka commented on HIVE-675:


There is no way to refer to SessionState from inside hive.ql.metadata or 
hive.metadata. 

I think this problem is going to bite us. We may need to access the Session 
context from all over the package, so we need to put all this information in 
Common, which is accessible to all.
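
One way this could look is a small per-thread holder that lives in a commonly 
visible module. This is only a sketch under assumptions: the class and method 
names are hypothetical, and "default" is taken from the single default 
namespace mentioned in the issue description.

```java
final class SessionContextSketch {
    // One value per thread, so concurrent sessions do not overwrite each other.
    private static final ThreadLocal<String> CURRENT_DB =
        ThreadLocal.withInitial(() -> "default");

    static String currentDatabase() {
        return CURRENT_DB.get();
    }

    static void useDatabase(String name) {
        CURRENT_DB.set(name);
    }

    // Helper for demonstration: what does a freshly started thread see?
    static String currentDatabaseSeenByNewThread() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = currentDatabase());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        useDatabase("db_a");
        System.out.println("this thread: " + currentDatabase());
        System.out.println("new thread:  " + currentDatabaseSeenByNewThread());
    }
}
```

Because the holder has no dependency on the query layer, metadata code could 
read it without referring to SessionState directly.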

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-7.patch, hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-109) 'location' clause for table creation should only be allowed for external tables

2009-09-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754371#action_12754371
 ] 

Prasad Chakka commented on HIVE-109:


I think there are quite a few people who use this feature, and those use cases 
will break if we fix this. But I don't see any problem here: if users don't 
want their data to be deleted with the table, they can use the EXTERNAL flag; 
otherwise they will not use it. It does not matter whether 'location' has been 
specified or not.

> 'location' clause for table creation should only be allowed for external 
> tables
> ---
>
> Key: HIVE-109
> URL: https://issues.apache.org/jira/browse/HIVE-109
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>
> currently - the code does not by and large distinguish between external and 
> internal tables. one clear distinction though is that storage for external 
> tables is managed outside hive. this leads to consequences like HIVE-86 - so 
> that hive does not mess around with tables whose storage is managed 
> externally. however - currently - we allow users to specify location for 
> internal tables - which is confusing and could lead to data being deleted in 
> external folders. we should not allow this.


[jira] Resolved: (HIVE-310) alter table is not using correct serde while replacing columns

2009-09-11 Thread Prasad Chakka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Chakka resolved HIVE-310.


Resolution: Won't Fix

Not sure this is required any more. Going to close this as 'will not fix'. Will 
reopen it if anyone still has a valid case for this.

> alter table is not using correct serde while replacing columns
> --
>
> Key: HIVE-310
> URL: https://issues.apache.org/jira/browse/HIVE-310
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.2.0
>Reporter: Prasad Chakka
>Assignee: Prasad Chakka
>Priority: Minor
> Attachments: hive-310.patch
>
>
> Alter table should set correct serde while replacing columns depending on the 
> column types instead of setting the serde to MetadataTypedColumnsetSerDe 
> always.


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754347#action_12754347
 ] 

Prasad Chakka commented on HIVE-718:


From HADOOP-6240, it appears that atomic rename() might not be supported on all 
FileSystems. Given this, I think we may need to avoid depending on HDFS for 
atomicity or locking.

There was another discussion on hive-users@ about providing some kind of 
locking so that two different jobs cannot do conflicting things to a directory 
(for example, when there is one writer and one reader). If we go down that 
route, this problem can be solved by acquiring write locks (or leases). Of 
course, this won't work if one of the processes is a non-Hive process. Even 
Ashish's solution will not work, since the external process needs to handle 
the version numbers correctly.
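
A minimal in-memory sketch of the lease idea, assuming one write lease per 
directory that expires after a time-to-live. All names here are hypothetical; 
a real implementation would have to persist leases somewhere every Hive 
client can see, and the clock is passed in only to keep the example 
deterministic.

```java
import java.util.HashMap;
import java.util.Map;

final class LeaseTableSketch {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();

    // Grants the write lease on dir if it is free, expired, or already held by owner.
    synchronized boolean tryAcquire(String dir, String owner, long nowMillis, long ttlMillis) {
        Lease current = leases.get(dir);
        boolean heldByOther = current != null
            && current.expiresAtMillis > nowMillis
            && !current.owner.equals(owner);
        if (heldByOther) {
            return false;
        }
        leases.put(dir, new Lease(owner, nowMillis + ttlMillis));
        return true;
    }

    // Releases the lease only if the caller is the current owner.
    synchronized void release(String dir, String owner) {
        Lease current = leases.get(dir);
        if (current != null && current.owner.equals(owner)) {
            leases.remove(dir);
        }
    }
}
```

A lease differs from a plain lock in that a crashed writer cannot block the 
directory forever: once the time-to-live passes, another job can take over.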


> Load data inpath into a new partition without overwrite does not move the file
> --
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as following. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a   2009-08-01
> b   2009-08-01
> d   2009-08-01
> {code}


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-10 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753675#action_12753675
 ] 

Prasad Chakka commented on HIVE-675:


Wait, if approach A for HIVE-584 is chosen, then shouldn't the current patch 
work without any changes? 

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-7.patch, hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-09 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753166#action_12753166
 ] 

Prasad Chakka commented on HIVE-675:


Yes, by default currentDatabase() can be used and since only Hive.java holds 
the current database name, it should be fine.

Otherwise the patch looks good.

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-7.patch, hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-08 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752884#action_12752884
 ] 

Prasad Chakka commented on HIVE-675:


MetaStoreUtils.DEFAULT_DATABASE_NAME is sometimes replaced with currentDatabase 
and sometimes left empty (in DDLTask.java). Why?
I thought all Hive.java functions that do not take a database name as an 
argument are deprecated. No?

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
>Assignee: He Yongqiang
> Attachments: hive-675-2009-9-7.patch, hive-675-2009-9-8.patch
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.


[jira] Commented: (HIVE-675) add database/scheme support Hive QL

2009-09-07 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752204#action_12752204
 ] 

Prasad Chakka commented on HIVE-675:


1) add more clientpositive & clientnegative tests
2) db location should take a URI rather than a path, since the location can 
point to a different HDFS instance
3) this path does not support ... If you are not 
planning to add that support then create a new JIRA.
4) there should be a way to easily move a tbl from one db to another db. Maybe 
open a JIRA for that as well?
5) HiveMetaStore.java: if the given db location is not a proper URI then it is 
silently ignored and the default path is used. HiveMetaStore should throw an 
error instead if the path is not absolute or a proper URI
6) remove ql/junit233767264.properties from patch
7) remove ql/junitvmwatcher1237844024.properties from patch
8) there are a lot of other locations where MetaStoreUtils.DEFAULT_DATABASE_NAME 
is used. It should be replaced by Hive.currentDatabase
9) is Hive.currentDatabase thread-safe? Does it work in HiveServer, where 
multiple threads can each have separate currentDatabases?
10) Hive.createDatabase() & dropDatabase() should throw 
AlreadyExistsException() instead of throwing HiveException
11) rename isDatabaseExist() to databaseExists()
12) currentDatabase should be either in SessionState or in Hive. There is no 
need for it to be stored in both objects.
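
For item 5, a minimal sketch of the kind of check meant, using only 
java.net.URI. The class and method names are hypothetical, and the rule "the 
path component must be absolute, with or without a scheme" is one 
interpretation of "absolute path or proper URI", not what the patch does.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DbLocationSketch {
    // Parses a database location and fails loudly instead of silently
    // falling back to a default path.
    static URI parseLocation(String location) {
        final URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + location, e);
        }
        String path = uri.getPath();
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Location must be an absolute path or URI: " + location);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Hypothetical locations, for illustration only.
        System.out.println(parseLocation("hdfs://namenode:9000/warehouse/db1"));
        System.out.println(parseLocation("/warehouse/db1"));
    }
}
```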

> add database/scheme support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Prasad Chakka
> Attachments: hive-675-2009-9-7.patch
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.


[jira] Commented: (HIVE-750) new partitionpruner does not work with test mode

2009-09-04 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751561#action_12751561
 ] 

Prasad Chakka commented on HIVE-750:


Talked with Namit offline. The current patch looks good; the optimization can 
be done in a different JIRA.

> new partitionpruner does not work with test mode
> 
>
> Key: HIVE-750
> URL: https://issues.apache.org/jira/browse/HIVE-750
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Attachments: hive.750.1.patch
>
>
> set hive.test.mode=true;
> the new partition pruner does not work


[jira] Commented: (HIVE-750) new partitionpruner does not work with test mode

2009-09-03 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751128#action_12751128
 ] 

Prasad Chakka commented on HIVE-750:


You are correct but that can be fixed by changing step4 to F3 ppds to F2 
instead.

> new partitionpruner does not work with test mode
> 
>
> Key: HIVE-750
> URL: https://issues.apache.org/jira/browse/HIVE-750
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Attachments: hive.750.1.patch
>
>
> set hive.test.mode=true;
> the new partition pruner does not work


[jira] Commented: (HIVE-750) new partitionpruner does not work with test mode

2009-09-03 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751112#action_12751112
 ] 

Prasad Chakka commented on HIVE-750:


Of course this is not a complete fix: if only some preds get pushed down, 
this algo can't determine the rest of the preds to create a new filter op. But 
it is a good enough improvement and will fix the problem of creating multiple 
identical filter ops for each of the non-deterministic preds in a filter op.

> new partitionpruner does not work with test mode
> 
>
> Key: HIVE-750
> URL: https://issues.apache.org/jira/browse/HIVE-750
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Attachments: hive.750.1.patch
>
>
> set hive.test.mode=true;
> the new partition pruner does not work


[jira] Commented: (HIVE-750) new partitionpruner does not work with test mode

2009-09-03 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751110#action_12751110
 ] 

Prasad Chakka commented on HIVE-750:


Predicate pushdown was an optimization before the new pruner came along, so 
this is not a bug but a lack of a feature :)

The patch looks good, but since a new child filter operator is created every 
time there is a non-deterministic predicate in an expression, there will be two 
identical filter operators if two non-deterministic predicates are encountered 
while walking down the tree. 

A better option would be:

1) merge the children's pushdown preds with the current filter op's preds
2) run the ppd algorithm to extract pushdown predicates
3) if step 2 does not return an empty set, skip to step 6 (new)
4) if the current op's child is a filter operator, skip to step 6 (new)
5) create a new filter op with the child's pushdown preds (new)
6) pass the results of step 2 to the current op's parent

So the above algorithm would create a new filter op whenever a predicate can't 
be pushed down. This can help in the case of multiple sub-queries, where some 
of the predicates can be pushed down past one join but not past all the joins 
in the query.
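
The six steps can be sketched roughly as follows. This is not the patch's 
code: Pred, Op, extractPushdown and process are hypothetical names, and the 
ppd pass is replaced by a stand-in that refuses to push anything once a 
non-deterministic predicate is present. The check for an empty child set in 
step 5 is an extra guard added in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterPushdownSketch {
    static final class Pred {
        final String expr;
        final boolean deterministic;

        Pred(String expr, boolean deterministic) {
            this.expr = expr;
            this.deterministic = deterministic;
        }
    }

    static final class Op {
        final boolean isFilter;
        final List<Pred> preds;
        Op child;

        Op(boolean isFilter, List<Pred> preds, Op child) {
            this.isFilter = isFilter;
            this.preds = preds;
            this.child = child;
        }
    }

    // Stand-in for the real ppd pass (assumption): nothing can be pushed
    // if any predicate is non-deterministic.
    static List<Pred> extractPushdown(List<Pred> preds) {
        for (Pred p : preds) {
            if (!p.deterministic) {
                return new ArrayList<>();
            }
        }
        return new ArrayList<>(preds);
    }

    // Returns the predicates handed to op's parent; may insert a filter below op.
    static List<Pred> process(Op op, List<Pred> childPushdown) {
        List<Pred> merged = new ArrayList<>(childPushdown);      // step 1
        merged.addAll(op.preds);
        List<Pred> pushdown = extractPushdown(merged);           // step 2
        boolean skip = !pushdown.isEmpty()                       // step 3
            || (op.child != null && op.child.isFilter);          // step 4
        if (!skip && !childPushdown.isEmpty()) {                 // guard added here
            op.child = new Op(true, new ArrayList<>(childPushdown), op.child); // step 5
        }
        return pushdown;                                         // step 6
    }
}
```

Step 4 is what prevents a second identical filter op from being created when 
the same operator is visited again.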

> new partitionpruner does not work with test mode
> 
>
> Key: HIVE-750
> URL: https://issues.apache.org/jira/browse/HIVE-750
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Attachments: hive.750.1.patch
>
>
> set hive.test.mode=true;
> the new partition pruner does not work


[jira] Commented: (HIVE-804) Support deletion of partitions based on a prefix partition spefication

2009-08-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749037#action_12749037
 ] 

Prasad Chakka commented on HIVE-804:


I think you can give a location to 'alter table add partition' even if the 
table is not external.

> Support deletion of partitions based on a prefix partition spefication
> --
>
> Key: HIVE-804
> URL: https://issues.apache.org/jira/browse/HIVE-804
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>
> Sometimes users create partitions like (date='...', time='...'). It is useful 
> if user can delete all the partitions of the same day (and different time) 
> with a single command:
> {code}
> ALTER TABLE test DROP PARTITION (date='2009-08-26');
> {code}


[jira] Commented: (HIVE-804) Support deletion of partitions based on a prefix partition spefication

2009-08-28 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749032#action_12749032
 ] 

Prasad Chakka commented on HIVE-804:


* Do we ever want to support arbitrary partition key/vals instead of just a 
prefix? What about range queries?

# for dropping partitions that match a given prefix,
## 2A is good enough
# for dropping partitions whose keys match the given key/vals
## partition key/values are stored in the model.Partition.values attribute, 
which is a list. So you can possibly generate the query based on which 
partition key is needed.
# for dropping ranges
## if the range is applied on a prefix, then 2A can be used; otherwise we need 
to bring in all partitions

IMO, the trickier problem is to make sure the partition data directory(ies) 
get deleted atomically along with the partitions. Even for a prefix, the 
partition data directories can differ based on how the partition was 
created. We need to get those semantics right.
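
To make the difference between the first two cases concrete, here is a small 
sketch that works on the positional value list of a partition. It runs in 
memory only and does not generate any metastore query; the class and method 
names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class PartitionMatchSketch {
    // Prefix case: the first prefix.size() values must be equal.
    static boolean matchesPrefix(List<String> values, List<String> prefix) {
        return prefix.size() <= values.size()
            && values.subList(0, prefix.size()).equals(prefix);
    }

    // Arbitrary key/vals case: look up each key's position in the
    // table's partition key order, then compare the value at that position.
    static boolean matchesSpec(List<String> keys, List<String> values, Map<String, String> spec) {
        for (Map.Entry<String, String> e : spec.entrySet()) {
            int i = keys.indexOf(e.getKey());
            if (i < 0 || i >= values.size() || !values.get(i).equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

For a table partitioned by (date, time), a spec of date='2009-08-26' is a 
prefix, while a spec that names only time is not, and needs the second form.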




> Support deletion of partitions based on a prefix partition spefication
> --
>
> Key: HIVE-804
> URL: https://issues.apache.org/jira/browse/HIVE-804
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>
> Sometimes users create partitions like (date='...', time='...'). It is useful 
> if user can delete all the partitions of the same day (and different time) 
> with a single command:
> {code}
> ALTER TABLE test DROP PARTITION (date='2009-08-26');
> {code}


[jira] Commented: (HIVE-805) Session level metastore

2009-08-27 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748631#action_12748631
 ] 

Prasad Chakka commented on HIVE-805:


# Can you rename 'test' mode to 'temporary' mode or something like that? 'test' 
here should mean either dry-run or temporary.
# This patch tries to allow creation of a partition of a regular table in the 
temporary store. I am sure that it fails. I don't think there is a good 
solution at all, since the metastore requires the table to exist before 
creating a partition. Should we allow this at all? If we need it then we may 
have to redesign this.
# Once a session table is created, a table parameter should identify it as 
such. This can be done by adding that parameter before creating the table in 
the session metastore. alter_table etc. that take in a table object should 
depend on this parameter instead of trying to alter both metastores.
# If ignoreUnknownTab=true then a NoSuchObjectException will not be thrown, so 
the code below will be incorrect.
{code}
boolean tableDropped = false;
if (this.conf.getBoolVar(HiveConf.ConfVars.HIVESESSIONTEST)) {
  try {
    getSessionMSC().dropTable(dbName, tableName, deleteData, ignoreUnknownTab);
    tableDropped = true;
  }
  catch (NoSuchObjectException e) {
    // Ignore if the table is not found
  }
}

if (!tableDropped)
  getMSC().dropTable(dbName, tableName, deleteData, ignoreUnknownTab);
{code}

this pattern can be rewritten as

{code}
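if (this.conf.getBoolVar(HiveConf.ConfVars.HIVESESSIONTEST)) {
  try {
    getSessionMSC().dropTable(dbName, tableName, deleteData, ignoreUnknownTab);
  }
  catch (NoSuchObjectException e) {
    getMSC().dropTable(dbName, tableName, deleteData, ignoreUnknownTab);
  }
}
else {
  // Outside session test mode, drop from the regular metastore as before
  getMSC().dropTable(dbName, tableName, deleteData, ignoreUnknownTab);
}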
{code}
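
A self-contained sketch of the same fallback, written so that it still works 
when the caller passes ignoreUnknownTab=true. TableStore and the local 
NoSuchObjectException are hypothetical stand-ins for the two metastore 
clients and the metastore exception; nothing here is taken from the patch.

```java
final class DropTableSketch {
    // Local stand-in for the metastore's exception type.
    static class NoSuchObjectException extends Exception {}

    // Stand-in for getSessionMSC() / getMSC().
    interface TableStore {
        void dropTable(String db, String table, boolean deleteData, boolean ignoreUnknownTab)
            throws NoSuchObjectException;
    }

    static void dropTable(boolean sessionMode, TableStore session, TableStore regular,
                          String db, String table, boolean deleteData, boolean ignoreUnknownTab)
            throws NoSuchObjectException {
        if (sessionMode) {
            try {
                // Always pass false here so that a missing session table is
                // reported, whatever the caller asked for.
                session.dropTable(db, table, deleteData, false);
                return;
            } catch (NoSuchObjectException e) {
                // Not a session table: fall through to the regular metastore.
            }
        }
        regular.dropTable(db, table, deleteData, ignoreUnknownTab);
    }
}
```

The caller's ignoreUnknownTab flag is only honoured on the final call, which 
is the one whose outcome the user actually cares about.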


> Session level metastore
> ---
>
> Key: HIVE-805
> URL: https://issues.apache.org/jira/browse/HIVE-805
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Ashish Thusoo
>Assignee: Ashish Thusoo
> Fix For: 0.5.0
>
> Attachments: HIVE-805.patch
>
>
> Implement a shadow metastore that is in memory and runs for a session. This 
> can contain definitions for session specific views that can be used to 
> implement data flow variables in Hive. It can also be used for testing 
> scripts. First we will support the later use case where in all the DDL 
> statements in the session create objects in the session metastore and all the 
> queries are converted to explain internal. Any thoughts on load commands?
> This feature is enabled when
> set hive.session.test = true
> is done in the session.


[jira] Commented: (HIVE-805) Session level metastore

2009-08-27 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748565#action_12748565
 ] 

Prasad Chakka commented on HIVE-805:


We need to do this now, since Metastore.createTable() will create the directory 
for you; when the session-level metastore closes, these directories will be 
left hanging unnecessarily.
I haven't looked into the code yet.

> Session level metastore
> ---
>
> Key: HIVE-805
> URL: https://issues.apache.org/jira/browse/HIVE-805
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Ashish Thusoo
>Assignee: Ashish Thusoo
> Fix For: 0.5.0
>
> Attachments: HIVE-805.patch
>
>
> Implement a shadow metastore that is in memory and runs for a session. This 
> can contain definitions for session specific views that can be used to 
> implement data flow variables in Hive. It can also be used for testing 
> scripts. First we will support the later use case where in all the DDL 
> statements in the session create objects in the session metastore and all the 
> queries are converted to explain internal. Any thoughts on load commands?
> This feature is enabled when
> set hive.session.test = true
> is done in the session.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-805) Session level metastore

2009-08-26 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748233#action_12748233
 ] 

Prasad Chakka commented on HIVE-805:


What is the HDFS location for tables in the session-level metastore? Is it the 
same location as for a regular table?

> Session level metastore
> ---
>
> Key: HIVE-805
> URL: https://issues.apache.org/jira/browse/HIVE-805
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Ashish Thusoo
>Assignee: Ashish Thusoo
> Fix For: 0.5.0
>
> Attachments: HIVE-805.patch
>
>
> Implement a shadow metastore that is in memory and runs for a session. This 
> can contain definitions for session specific views that can be used to 
> implement data flow variables in Hive. It can also be used for testing 
> scripts. First we will support the later use case where in all the DDL 
> statements in the session create objects in the session metastore and all the 
> queries are converted to explain internal. Any thoughts on load commands?
> This feature is enabled when
> set hive.session.test = true
> is done in the session.


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-26 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748104#action_12748104
 ] 

Prasad Chakka commented on HIVE-718:


@todd
dhruba said he would be willing to provide a 'mkdir' method that mimics the 
POSIX mkdir() function. In the meantime, I think the algorithm with atomic 
directory renames would be sufficient. We can generate a pseudo-random number 
and check whether a directory with that name exists before creating it. Since 
the numbers are pseudo-random, the probability that two different loads into 
the same directory generate the same number and check for existence at the 
same time is pretty low.
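
A minimal sketch of the interim algorithm described above, using Python and the 
local file system as a stand-in for HDFS (rename and mkdir semantics on HDFS 
may differ). The function names, the "_tmp." prefix and the 64-bit random 
suffix are assumptions made for illustration; they are not taken from any 
HIVE-718 patch.

```python
import os
import random


def make_staging_dir(parent):
    """Pick a pseudo-random directory name, check that it is free, create it."""
    while True:
        candidate = os.path.join(parent, "_tmp.%d" % random.getrandbits(64))
        if not os.path.exists(candidate):
            # exist_ok=True mimics a mkdir that does not fail on an existing
            # directory, which is why the existence check above is needed.
            os.makedirs(candidate, exist_ok=True)
            return candidate


def load_into_new_partition(table_dir, partition, files):
    """Copy files into a staging directory, then publish it with one rename."""
    staging = make_staging_dir(table_dir)
    for name, data in files.items():
        with open(os.path.join(staging, name), "w") as f:
            f.write(data)
    final = os.path.join(table_dir, partition)
    # A single rename either exposes all the files or none of them. On POSIX
    # it raises OSError if the partition directory exists and is not empty.
    os.rename(staging, final)
    return final
```

The check-then-create step is still not atomic, so two loads could in principle 
pick the same name in the same instant. A mkdir that fails on an existing 
directory would close that window.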

> Load data inpath into a new partition without overwrite does not move the file
> --
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> a   2009-08-01
> b   2009-08-01
> c   2009-08-01
> {code}


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-20 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745672#action_12745672
 ] 

Prasad Chakka commented on HIVE-718:


Continuing with the ugly part: you can implement some sort of leases instead 
of locks using files. Create a file uniquely named with the current hour, so 
the lease is held for at most an hour. There might be a lot of files hanging 
around, but they can be created in /tmp, which gets cleaned up.

Another option: ashish is implementing session-level temporary tables in the 
metastore. You can use them as locks without worrying about them persisting 
after the session ends.
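
The file-based lease idea could look roughly like the sketch below, again in 
Python against the local file system. The function names, the ".lease" suffix 
and the use of UTC clock hours are assumptions for illustration. Exclusive 
creation (O_CREAT | O_EXCL) is what makes only one caller win here; whether the 
target file system offers an equivalent create call is not something this 
sketch establishes.

```python
import os
import time


def lease_path(lock_dir, resource, now=None):
    """Lease file name for a resource, unique per clock hour (UTC)."""
    now = time.time() if now is None else now
    hour = time.strftime("%Y%m%d%H", time.gmtime(now))
    return os.path.join(lock_dir, "%s.%s.lease" % (resource, hour))


def try_acquire(lock_dir, resource, now=None):
    """Return the lease path if we created it, or None if it is already held."""
    path = lease_path(lock_dir, resource, now)
    try:
        # Creation fails if the file already exists, so at most one caller
        # per resource and hour gets the lease.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return path


def release(path):
    """Give the lease up early; otherwise it lapses when the hour rolls over."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

Because the name changes on the hour, a lease taken at 10:59 lapses one minute 
later. "At most an hour" is an upper bound, not a guaranteed duration, so a 
long-running load would have to cope with losing the lease midway.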


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-20 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745569#action_12745569
 ] 

Prasad Chakka commented on HIVE-718:


Suppose two statements issue a load into the same partition at around the same 
time. dirExisted will be true for one and false for the other. Suppose the 
'false' stmt copies a file named 'a1' first; the 'true' stmt will then fail if 
it copies the same file, so it will try to undo its previous copies and then 
delete the dir. Meanwhile the 'false' stmt blissfully keeps copying files and 
succeeds. The file 'a1' and any others copied before the 'true' stmt deleted 
the directory will be gone, yet the 'false' stmt reports no error.

hoping my writing is understandable enough :)
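
To make the interleaving concrete, here is a small deterministic replay of the 
scenario in Python on the local file system. It follows the description above 
literally and is not derived from the patch itself. The helper names are 
hypothetical, and it assumes that a copy implicitly recreates a missing parent 
directory, which is what lets the 'false' stmt carry on after the directory has 
been deleted.

```python
import os
import shutil


def copy_into(partition_dir, name, data):
    """Copy one file; fail if the name is already taken in the partition."""
    # Assumption: missing parent directories are created implicitly.
    os.makedirs(partition_dir, exist_ok=True)
    with open(os.path.join(partition_dir, name), "x") as f:
        f.write(data)


def replay_race(root):
    """Replay the two loads step by step; return the final files and a log."""
    part = os.path.join(root, "ds=2009-08-01")
    log = []

    # 'false' stmt: the directory did not exist, so it creates it.
    os.makedirs(part)
    # 'true' stmt: checks a moment later and sees the directory.

    copy_into(part, "a1", "from the 'false' stmt")
    try:
        copy_into(part, "a1", "from the 'true' stmt")
    except FileExistsError:
        # 'true' stmt undoes its work and deletes the directory, taking the
        # other statement's 'a1' with it.
        shutil.rmtree(part)
        log.append("'true' stmt failed and cleaned up")

    # 'false' stmt keeps copying and never sees an error.
    copy_into(part, "a2", "from the 'false' stmt")
    log.append("'false' stmt finished without error")
    return sorted(os.listdir(part)), log
```

In this replay the partition ends up holding only 'a2', even though the 'false' 
stmt copied both 'a1' and 'a2' and reported success.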


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-20 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745548#action_12745548
 ] 

Prasad Chakka commented on HIVE-718:


The 'overwrite' path has less of an issue, in the sense that only one of the 
two competing statements will win out: the resulting directory will not 
contain some files from the first statement and some from the second. (This 
assumes the probability of two statements creating the same random tmp 
directory is very low.)

My concern in this case is that it is possible to corrupt the existing 
partition by adding only part of the new files and overwriting some of the old 
files. The user has no way of knowing that this has happened, and it may not 
be possible to recover the data.

But if you guys think the current patch is no worse than the existing 
solution, I do not have a problem with it.


[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-17 Thread Prasad Chakka (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744138#action_12744138
 ] 

Prasad Chakka commented on HIVE-718:


A lack of atomicity can lead to silent bugs, which are unacceptable in 
production systems.

