[jira] [Commented] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-20 Thread Aleksei S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964706#comment-14964706
 ] 

Aleksei S commented on HIVE-11721:
--

Looks good now. TestCustomAuthentication failed because HS2 failed to start.
The other two tests also fail with other patches.

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.1.0, 1.2.1, 2.0.0
> Reporter: Jun Yin
> Assignee: Aleksei S
> Attachments: HIVE-11721.1.patch, HIVE-11721.patch
>
>
> Hive: 1.1.0
> hive> create table char_255_noascii as select cast("Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어" as char(255));
> hive> select * from char_255_noascii;
> OK
> Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어
> The output is correct, and it also works with "LOAD DATA".
> But when I insert the data another way, as below:
> hive> create table nonascii(t1 char(255));
> OK
> Time taken: 0.125 seconds
> hive> insert into nonascii values("Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어");
> hive> select * from nonascii;
> OK
> Gar�u "" K�kaku B�LhFTVD~Ykid�tai\m� 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12210) Fix a few failing tests: testCliDriver_udf_explode and testCliDriver_udtf_explode

2015-10-19 Thread Aleksei S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964415#comment-14964415
 ] 

Aleksei S commented on HIVE-12210:
--

Thanks [~pxiong]

> Fix a few failing tests: testCliDriver_udf_explode and 
> testCliDriver_udtf_explode
> -
>
> Key: HIVE-12210
> URL: https://issues.apache.org/jira/browse/HIVE-12210
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Aleksei S
> Assignee: Aleksei S
> Fix For: 2.0.0
>
> Attachments: HIVE-12210.patch
>
>
> The following tests fail after HIVE-11785 because of missing 
> "serialization.escape.crlf true" property in the output.
> {code}
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
> {code}





[jira] [Commented] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-19 Thread Aleksei S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964413#comment-14964413
 ] 

Aleksei S commented on HIVE-11721:
--

The tests seem to have failed because of an error initializing the Spark
cluster:
{code}
java.lang.IllegalStateException: Timed out waiting for Spark cluster to init
{code}
Is it possible to rerun the tests?

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.1.0, 1.2.1, 2.0.0
> Reporter: Jun Yin
> Assignee: Aleksei S
> Attachments: HIVE-11721.patch





[jira] [Updated] (HIVE-12213) Investigating the test failure TestHCatClient.testTableSchemaPropagation

2015-10-19 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-12213:
-
Attachment: HIVE-12213.patch

I investigated the test and found that the failure is related to the fast
table stats update; it started failing after HIVE-10631. After more digging I
found that it happens because more stats are put into the table during its
creation in the target metastore, which in turn is caused by the fact that the
directory for the table already existed during the second table creation (see
the test source).

As far as I understand, the "fast stats update" at table level should be done
only for non-partitioned tables. That is clear from the original patch in
HIVE-3959, where the method is even called "updateUnpartitionedTableStatsFast".

So the fix I made is to do the fast stats update only for non-partitioned
tables during table creation. I can also add an assertion to
"updateTableStatsFast()" to make sure it is called only for unpartitioned
tables. Let me know if that is needed.

Note that even though this fixes the test, it is possible to create a similar
test with an unpartitioned table that fails for exactly the same reason: the
directory will already exist in the second run and more stats will be put in
(see line 229 in MetaStoreUtils with the !newDir check). I cannot say it's a
bug, because the stats are correct in both cases; it's more a question of
whether zero-valued quick stats should be set for an empty table or not.


> Investigating the test failure TestHCatClient.testTableSchemaPropagation
> 
>
> Key: HIVE-12213
> URL: https://issues.apache.org/jira/browse/HIVE-12213
> Project: Hive
> Issue Type: Test
> Components: Test
> Affects Versions: 2.0.0
> Reporter: Aihua Xu
> Assignee: Aleksei S
> Priority: Minor
> Attachments: HIVE-12213.patch
>
>
> The test has been failing for some time with the following error.
> {noformat}
> Error Message
> Table after deserialization should have been identical to sourceTable. expected:<[TABLE_PROPERTIES]> but was:<[]>
> Stacktrace
> java.lang.AssertionError: Table after deserialization should have been identical to sourceTable. expected:<[TABLE_PROPERTIES]> but was:<[]>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation(TestHCatClient.java:1065)
> {noformat}





[jira] [Updated] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-19 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-11721:
-
Attachment: HIVE-11721.1.patch

Rerunning the tests.

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.1.0, 1.2.1, 2.0.0
> Reporter: Jun Yin
> Assignee: Aleksei S
> Attachments: HIVE-11721.1.patch, HIVE-11721.patch





[jira] [Assigned] (HIVE-12213) Investigating the test failure TestHCatClient.testTableSchemaPropagation

2015-10-19 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S reassigned HIVE-12213:


Assignee: Aleksei S

> Investigating the test failure TestHCatClient.testTableSchemaPropagation
> 
>
> Key: HIVE-12213
> URL: https://issues.apache.org/jira/browse/HIVE-12213
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Aihua Xu
>Assignee: Aleksei S
>Priority: Minor





[jira] [Updated] (HIVE-12210) Fix a few failing tests

2015-10-18 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-12210:
-
Attachment: HIVE-12210.patch

> Fix a few failing tests
> ---
>
> Key: HIVE-12210
> URL: https://issues.apache.org/jira/browse/HIVE-12210
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Aleksei S
> Assignee: Aleksei S
> Attachments: HIVE-12210.patch





[jira] [Updated] (HIVE-12207) Query fails when non-ascii characters are used in string literals

2015-10-18 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-12207:
-
Attachment: HIVE-12207.patch

The reason for the issue is that Calcite uses the Latin-1 encoding
('ISO-8859-1') by default. To pass non-Latin characters, string literals need
to be converted to an NlsString with an explicit character set.
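
To illustrate why the default character set rejects such literals, here is a
minimal sketch that uses only the JDK. It does not call Calcite; the class and
method names (CharsetCheck, canEncode) are hypothetical and exist only for this
example. It checks whether a string can be encoded in a given charset, which is
the condition behind the "Failed to encode ... in character set 'ISO-8859-1'"
error.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Hive or Calcite.
class CharsetCheck {
    // Returns true if every character of the literal is representable in the charset.
    static boolean canEncode(String literal, Charset charset) {
        return charset.newEncoder().canEncode(literal);
    }

    public static void main(String[] args) {
        // Cyrillic literal from the failing query, written with Unicode escapes.
        String cyrillic = "\u0410\u0431\u0432\u0433\u0434\u0435";
        System.out.println(canEncode(cyrillic, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(cyrillic, StandardCharsets.UTF_8));      // true
    }
}
```

A literal that stays within Latin-1, such as one containing only accented
Western European letters, passes the same check, so only characters outside
Latin-1 run into it.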

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: HIVE-12207
> URL: https://issues.apache.org/jira/browse/HIVE-12207
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Aleksei S
> Assignee: Aleksei S
> Attachments: HIVE-12207.patch
>
>
> While debugging HIVE-11721 I found that using non-ASCII characters in string
> literals causes the Calcite planner to throw the following exception:
> {code}
> 2015-10-17T23:07:20,586 ERROR [main]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(292)) - CBO failed, skipping CBO.
> org.apache.calcite.runtime.CalciteException: Failed to encode 'Абвгде' in character set 'ISO-8859-1'
> {code}
> The query is:
> {code}
> select concat("Абвгде", "谢谢") from src limit 1;
> {code}
> Other queries with non-ascii literals fail as well.





[jira] [Updated] (HIVE-11718) JDBC ResultSet.setFetchSize(0) returns no results

2015-10-18 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-11718:
-
Attachment: HIVE-11718.patch

I created a patch for batch size validation and added a unit test for it.
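
For readers who want a concrete picture of such a validation, here is a
minimal sketch. It is not the code from the attached patch: the class name
FetchSizeSketch, the method effectiveFetchSize and the default of 1000 rows
are all assumptions made for this example. It follows the JDBC API contract
for setFetchSize: a negative value is an error, and zero means the hint is
ignored and the driver picks its own fetch size.

```java
import java.sql.SQLException;

// Hypothetical sketch for illustration only; not the Hive JDBC driver code.
class FetchSizeSketch {
    // Assumed driver default; the real default is not stated in this thread.
    static final int DEFAULT_FETCH_SIZE = 1000;

    // Maps the requested fetch size to the size the driver would actually use.
    static int effectiveFetchSize(int requested) throws SQLException {
        if (requested < 0) {
            // The JDBC API requires rows >= 0.
            throw new SQLException("Fetch size must be >= 0, got " + requested);
        }
        // Zero means "no hint": fall back to the driver's own choice.
        return requested == 0 ? DEFAULT_FETCH_SIZE : requested;
    }
}
```

With this mapping, setFetchSize(0) no longer results in fetching batches of
zero rows, which is the behavior reported in the issue.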

> JDBC ResultSet.setFetchSize(0) returns no results
> -
>
> Key: HIVE-11718
> URL: https://issues.apache.org/jira/browse/HIVE-11718
> Project: Hive
> Issue Type: Bug
> Components: JDBC
> Affects Versions: 1.2.1
> Reporter: Son Nguyen
> Attachments: HIVE-11718.patch
>
>
> Hi,
> According to the JDBC documentation, a driver should ignore
> setFetchSize(0), but the Hive JDBC driver returns no results.
> Our product uses setFetchSize to fine-tune performance; sometimes we would
> like to call setFetchSize(0) and leave it to the driver to make its best
> guess of the fetch size.
> Thanks
> Son Nguyen





[jira] [Updated] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-17 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S updated HIVE-11721:
-
Attachment: HIVE-11721.patch

I debugged the issue and found that the contents of a virtual table are
written as bytes while keeping only the lower 8 bits of each character, which
doesn't work with non-ASCII characters.
The fix is to create a Text object (Text is used as the virtual table storage
format) and encode the values with it.
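
The effect can be reproduced outside Hive with a minimal JDK-only sketch. This
is not the Hive code: DataOutputStream.writeBytes is used here only as a
stand-in for "keeping only the lower 8 bits", and the class and method names
(Low8BitDemo, writeLow8Bits, encodeUtf8, decodeUtf8) are hypothetical. The
UTF-8 path stands in for the Text object, which stores strings as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo for illustration only; not part of Hive.
class Low8BitDemo {
    // Lossy path: writeBytes emits one byte per char and drops the high eight bits.
    static byte[] writeLow8Bits(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Lossless path: encode the whole string as UTF-8.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reads bytes back the way a UTF-8 based reader would.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Gar", c-cedilla (U+00E7), "u", space, then U+8C22 twice, as Unicode escapes.
        String value = "Gar\u00e7u \u8c22\u8c22";
        System.out.println(decodeUtf8(writeLow8Bits(value)).equals(value)); // false
        System.out.println(decodeUtf8(encodeUtf8(value)).equals(value));    // true
    }
}
```

In the lossy path the character U+8C22 is reduced to the single byte 0x22,
which is an ASCII double quote; that is consistent with the pair of double
quotes that appears in the garbled output in the issue description.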

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.1.0, 1.2.1, 2.0.0
> Reporter: Jun Yin
> Attachments: HIVE-11721.patch





[jira] [Assigned] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-17 Thread Aleksei S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei S reassigned HIVE-11721:


Assignee: Aleksei S

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.1.0, 1.2.1, 2.0.0
> Reporter: Jun Yin
> Assignee: Aleksei S
> Attachments: HIVE-11721.patch


