[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-24 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283145#comment-13283145
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Thanks for filing a jira Kate.

To fix the issue, I suggest that we exclude the column that is specified by 
partition key from the column definitions if any partition key is specified.

i.e.
{code}
CREATE TABLE IF NOT EXISTS `FOO` (`I` STRING, `J` STRING) PARTITIONED BY (I 
STRING)
{code}

=>

{code}
CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)
{code}

I also thought about adding the --columns option to hive-create-table, but this 
seems less preferable because this will force the user to specify all the 
columns except partition key in the command to import table. 

Please let me know if anyone has a better suggestion.

Another note that I'd like to add is that in my experiment, the current query 
sometimes works in Hive while it sometimes doesn't. So you might find it work 
for you. Nevertheless, I confirmed with a Hive developer that this should be a 
syntax error, so we should not rely on this Hive behavior.

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-24 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283148#comment-13283148
 ] 

Jarek Jarcec Cecho commented on SQOOP-489:
--

I believe that adjusting generated "Create table" statement is only half of the 
job. We also need to stop importing those columns, because otherwise we would 
end up with different number of columns in imported file and in hive which 
might lead to a huge confusion on user side.

Jarcec

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-24 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283156#comment-13283156
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Hello Jarcec,

Thank you for your comment. I am sorry, but I seems a bit confused about your 
concern. Even excluding the partition key from the create table statement, hive 
table will still have that column, doesn't it? So this change only applies to 
the statement but not applied to the table itself.

Also, unless getColumnNames() is modified, it shouldn't effect the number of 
columns in import job. Am I missing something here? Please correct me if I am 
wrong.

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-24 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283168#comment-13283168
 ] 

Jarek Jarcec Cecho commented on SQOOP-489:
--

I wasn't checking sources so I might be wrong here. But let me explain what I 
was afraid of on example:

Imagine table with three columns A, B, C with one row - 1, 2, 3. User specify 
that A should be used as partitioned column. Then after applying suggested fix 
we will create following create table statement:

CREATE TABLE IF NOT EXISTS `FOO` (`B` STRING, `C` STRING) PARTITIONED BY (A 
STRING)

But sqoop import will by default import all columns and thus resulting in file 
"1, 2, 3". Now when hive will read the file column B will contain value 1 (e.g. 
value of A), column C will contain value 2 (value of column B) and value 3 for 
column C will be missing entirely from hive perspective. Again I did no 
research nor try it in practice. It's solely based on my experience and I 
wanted to express my worries prior any coding. 

Is it more clear now?

Jarcec

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-25 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283627#comment-13283627
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Hi Jarcec, that is a good point. The order of columns could be a problem. I 
will verify this and get back to you.

Thank you for the explanation!

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-25 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283859#comment-13283859
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Hi Jarcec, you're right. My patch screws up Hive view if I do Hive import.

Here is the result of my experiment:

1. Table foo in Oracle
{code}
SQL> select * from foo;

 I  J  K
-- -- --
 1  2  3
{code}

2. Sqoop command
{code}
sqoop import --verbose ... --table SQOOPTEST.FOO -m 1 --hive-import 
--hive-table FOO --hive-partition-key I --hive-partition-value 1.0
{code}

3. Output file in Hdfs
{code}
1,2,3
{code}

4. Hive view
{code}
hive> select * from foo;
OK
1.0 2.0 1.0
{code}

5. Hive table definition
{code}
hive> SHOW TABLE EXTENDED  LIKE foo;
OK
tableName:foo
owner:cheolsoo
location:hdfs://localhost/user/hive/warehouse/foo
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { double j, double k}
partitioned:true
partitionColumns:struct partition_columns { string i}
totalNumberFiles:2
totalFileSize:6
maxFileSize:6
minFileSize:0
lastAccessTime:1337993171274
lastUpdateTime:1337993176838
{code}

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-25 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283912#comment-13283912
 ] 

Jarek Jarcec Cecho commented on SQOOP-489:
--

I also do have one concern with automatic fix. This time, let me start with 
example :-)

Let's consider table with three columns A, B and C with two rows - "1, 2, 3" 
and "4, 5, 6". Also let say that we applied suggested fix and also solve the 
issue with importing all columns. Sqoop command would contain --hive-partition 
A --hive-value 666. Then file on HDFS  would probably contain two rows - "2, 3" 
and "5, 6". View on hive would be:

B  C  A
2  3  666
5  6  666

I do not see an issue with changing column ordering, however I'm concerned that 
we silently ignored one column in the table and replaced it with constant value 
from command line -- both values "1" and "4" from original table were lost.

I do not have easy solution to address this problem and therefore I would 
suggest to fix this JIRA by gracefully exiting in case that user will specify 
partition column as one of the exported columns.

Jarcec

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284777#comment-13284777
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Hi Jarcec, thank you for your kind explanation. Examples always help. :-)

{quote}
I'm concerned that we silently ignored one column in the table and replaced it 
with constant value from command line – both values "1" and "4" from original 
table were lost.
{quote}

I have a quick question. The case that you're describing can be reproduced even 
now with '--columns B,C --hive-partition A --hive-partition-value 666', right? 
In this case, we're also replacing the column A with a constant value. Are you 
assuming that this is OK because the user _explicitly_ specified what he/she is 
doing?

After all, what I am debating is whether or not the Hive table always has to 
mirror the source table that it is importing from.

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-29 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284783#comment-13284783
 ] 

Jarek Jarcec Cecho commented on SQOOP-489:
--

Yeah, I figured that it would be better to explain that on example with my 
English level :-)

To answer your question - yes exactly. If user will do '--columns B,C 
--hive-partition A --hive-partition-value 666' than it's his personal 
responsibility to know what he is doing. I'm afraid that do something similar 
behind his back might create confusion to some users.

Jarcec

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285029#comment-13285029
 ] 

Cheolsoo Park commented on SQOOP-489:
-

OK, thanks for clarification, Jarcec. I agree with you.

I think that there are two cases that we have to deal with:

1) sqoop create-hive-table --table FOO --hive-partition-key X
2) sqoop import --hive-import --table FOO --hive-partition-key X

Following Jarcec's suggestion, the fix would be summarized as follows:

For 1),
 --hive-partition-key X is invalid if X is a column of table FOO. (Note that as 
of now, --columns is not available for create-hive-table.)

For 2), 
 i) if --columns is not specified, --hive-partition-key X is invalid if X is a 
column of table FOO.
 ii) if --columns is specified, --hive-partition-key X is invalid if X is 
specified in --columns.

Currently, Sqoop generates ill-formed Hive queries for these cases; however, 
Sqoop should rather print a proper warning message.

Please let me know if anyone has other opinions.

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-29 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285121#comment-13285121
 ] 

Jarek Jarcec Cecho commented on SQOOP-489:
--

Cheolsoo: +1

Jarcec

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285353#comment-13285353
 ] 

Cheolsoo Park commented on SQOOP-489:
-

Review board:
https://reviews.apache.org/r/5291/

> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SQOOP-489) Cannot define partition keys for Hive tables created through Sqoop

2012-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/SQOOP-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285971#comment-13285971
 ] 

Hudson commented on SQOOP-489:
--

Integrated in Sqoop-ant-jdk-1.6 #116 (See 
[https://builds.apache.org/job/Sqoop-ant-jdk-1.6/116/])
SQOOP-489. Cannot define partition keys for Hive tables created through 
Sqoop.

(Cheolsoo Park via Jarek Jarcec Cecho) (Revision 1344429)

 Result = SUCCESS
jarcec : 
Files : 
* /sqoop/trunk/src/java/org/apache/sqoop/hive/TableDefWriter.java
* /sqoop/trunk/src/test/com/cloudera/sqoop/hive/TestHiveImport.java


> Cannot define partition keys for Hive tables created through Sqoop
> --
>
> Key: SQOOP-489
> URL: https://issues.apache.org/jira/browse/SQOOP-489
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.1-incubating
>Reporter: Kathleen Ting
>Assignee: Cheolsoo Park
> Attachments: SQOOP-489.patch, SQOOP-489.patch, SQOOP-489.patch
>
>
> By enabling the table option, Sqoop includes every column in the table in the 
> create table query, and by enabling the hive-partition-key option, Sqoop 
> blindly appends the "partitioned by" clause. Now if you specify one of 
> columns in the table in the hive-partition-key, this will cause a syntax 
> error in Hive.
> For example, if we have a table 'FOO' that has columns 'I' and 'J':
> sqoop create-hive-table --table FOO ...
> will generate the following Hive query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING)
> Now if we add "--hive-partition-key I" to the command, Sqoop generates the 
> following query:
> CREATE TABLE IF NOT EXISTS `FOO` ( `I` STRING, `J` STRING) PARTITIONED BY (I 
> STRING)
> The problem is that since 'I' is defined twice (once in CRATE TABLE and once 
> in PARTITIONED BY), this is a syntax error in Hive.
> This correct query would be something like:
> CREATE TABLE IF NOT EXISTS `FOO` (`J` STRING) PARTITIONED BY (I STRING)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira