[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2023-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742844#comment-17742844
 ] 

ASF subversion and git services commented on IMPALA-8721:
-

Commit fbd8664b6b4d4b5d3df4290dc2309227803e245c in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fbd8664b6 ]

IMPALA-12275: Read files written with DeflateCodec

DeflateCodec is an alias to DefaultCodec. Impala works with
DefaultCodec. Fixes reading files written with DeflateCodec.

DeflateCodec isn't an issue with text files because they don't include a
codec header. Sequence files do, which we check on decompress.

Moves TestTextInterop to an E2E test since it doesn't require any special
startup options and refactors out test running to be format-agnostic.
Updates text file test as IMPALA-8721 is fixed. Removes creating a table
in Impala for Hive to read, as it didn't test anything new. Adds tests
for sequence files; excludes reading zstd due to IMPALA-12276.

Testing:
- manual exhaustive run of updated tests

Change-Id: Id5ec1d0345ae35597f6aade9d8b9eef2257efeba
Reviewed-on: http://gerrit.cloudera.org:8080/20181
Reviewed-by: Joe McDonnell 
Tested-by: Michael Smith 
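
As an illustration of the alias handling described in the commit message
above, here is a minimal Python sketch. It is not Impala's actual code. It
assumes the codec named in a sequence file header is a Hadoop class name such
as org.apache.hadoop.io.compress.DefaultCodec; the helper names
canonical_codec and header_codec_matches are hypothetical.

```python
# Hypothetical sketch: treat DeflateCodec as an alias of DefaultCodec when
# validating the codec class name found in a sequence file header.
DEFAULT_CODEC = "org.apache.hadoop.io.compress.DefaultCodec"
DEFLATE_CODEC = "org.apache.hadoop.io.compress.DeflateCodec"

# Alias table: maps an alias class name to its canonical name.
CODEC_ALIASES = {DEFLATE_CODEC: DEFAULT_CODEC}


def canonical_codec(name: str) -> str:
    """Return the canonical codec name, resolving known aliases."""
    return CODEC_ALIASES.get(name, name)


def header_codec_matches(header_codec: str, expected_codec: str) -> bool:
    """Compare two codec names after alias resolution."""
    return canonical_codec(header_codec) == canonical_codec(expected_codec)
```

With this kind of check, a header naming DeflateCodec is accepted wherever
DefaultCodec is expected, while unrelated codecs still fail the comparison.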


> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Abhishek Rawat
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: Interoperability, correctness, hive, impala, parquet, 
> timestamp
> Fix For: Impala 4.0.0
>
>
>  
> Easy to repro on latest upstream:
> {code:java}
> hive> create table t1_hive(c1 timestamp) stored as parquet;
> hive> insert into t1_hive values('2009-03-09 01:20:03.6');
> hive> select * from t1_hive;
> OK
> 2009-03-09 01:20:03.6
> [localhost:21000] default> invalidate metadata t1_hive;
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 09:55:36 (Coordinator: http://optimus-prime:25000)
> Query progress can be monitored at: http://optimus-prime:25000/query_plan?query_id=b34f85cb5da29c26:d4dfcb24
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 09:20:03.6 |
> +-----------------------+
> bin/start-impala-cluster.py --impalad_args='-convert_legacy_hive_parquet_utc_timestamps=true'
> [localhost:21000] default> select * from t1_hive;
> Query: select * from t1_hive
> Query submitted at: 2019-06-24 10:00:22 (Coordinator: http://optimus-prime:25000)
> Query progress can be monitored at: http://optimus-prime:25000/query_plan?query_id=d5428bb21fb259b9:7b107034
> +-----------------------+
> | c1                    |
> +-----------------------+
> | 2009-03-09 02:20:03.6 |
> +-----------------------+
>  
> {code}
>  
> This issue is causing the test case test_hive_impala_interop to fail. Until
> this issue is fixed, the test case will be updated to exclude the timestamp
> column. The test case should be updated to include a timestamp column once
> this issue is fixed.
>  
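
The arithmetic behind the three values in the repro above can be checked with
a short Python sketch (Python 3.9+ with time zone data available). The report
does not state the cluster's time zone; America/Los_Angeles is an assumption
chosen only because it is consistent with the numbers shown. The sketch
treats the value Impala returns without the flag as UTC, converts it to the
assumed local zone, and separately computes the UTC value that a DST-aware
conversion of the inserted timestamp would give. Treat this as one
interpretation of the output, not a confirmed diagnosis.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# ASSUMPTION: the time zone is not given in the report; this one is used
# only because it reproduces the values in the repro.
LOCAL_TZ = ZoneInfo("America/Los_Angeles")

written = datetime(2009, 3, 9, 1, 20, 3, 600000)         # inserted via Hive
read_raw = datetime(2009, 3, 9, 9, 20, 3, 600000)        # Impala, flag off
read_converted = datetime(2009, 3, 9, 2, 20, 3, 600000)  # Impala, flag on

# Interpret the raw value as UTC and convert it to the assumed local zone.
as_local = (
    read_raw.replace(tzinfo=timezone.utc)
    .astimezone(LOCAL_TZ)
    .replace(tzinfo=None)
)

# UTC value a DST-aware conversion of the inserted local time would produce.
expected_utc = (
    written.replace(tzinfo=LOCAL_TZ)
    .astimezone(timezone.utc)
    .replace(tzinfo=None)
)

# Gap between what was actually read back and the DST-aware expectation.
offset_error = read_raw - expected_utc

print("raw value converted to local:", as_local)
print("DST-aware UTC for written value:", expected_utc)
print("difference:", offset_error)
```

Under the stated assumption, as_local matches the result returned with
-convert_legacy_hive_parquet_utc_timestamps=true, and the value read back
differs from the DST-aware expectation by one hour.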



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2021-02-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282137#comment-17282137
 ] 

ASF subversion and git services commented on IMPALA-8721:
-

Commit 1f7b413d11321bd74aaa1a9ea9ed30e4d80d in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1f7b413 ]

IMPALA-8721: re-enable test_hive_impala_interop

The test now passes because HIVE-21290 was fixed.

Revert "IMPALA-8689: test_hive_impala_interop failing with "Timeout >7200s""

This reverts commit 5d8c99ce74c45a7d04f11e1f252b346d654f02bf.

Change-Id: I7e2beabd7082a45a0fc3b60d318cf698079768ff
Reviewed-on: http://gerrit.cloudera.org:8080/17042
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Abhishek Rawat
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: Interoperability, correctness, hive, impala, parquet, 
> timestamp
>






[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2021-02-08 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281518#comment-17281518
 ] 

Tim Armstrong commented on IMPALA-8721:
---

I think this was fixed by HIVE-21290: the test passes now if I revert 
IMPALA-8689.

> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Abhishek Rawat
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: Interoperability, correctness, hive, impala, parquet, 
> timestamp
>






[jira] [Commented] (IMPALA-8721) Wrong result when Impala reads a Hive written parquet TimeStamp column

2019-07-01 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876469#comment-16876469
 ] 

ASF subversion and git services commented on IMPALA-8721:
-

Commit 5d8c99ce74c45a7d04f11e1f252b346d654f02bf in impala's branch 
refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d8c99c ]

IMPALA-8689: test_hive_impala_interop failing with "Timeout >7200s"

The newly added Hive<->Impala interop test fails because Impala returns
wrong results when reading a TimeStamp column value written by Hive.
As a short-term measure, the TimeStamp column is removed from the interop
tests. The original issue will be fixed by IMPALA-8721.

Testing: Ran the test case N times on both the upstream and
downstream code bases.

Change-Id: I148c79a31f9aada1b75614390434462d1e483f28
Reviewed-on: http://gerrit.cloudera.org:8080/13755
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Wrong result when Impala reads a Hive written parquet TimeStamp column
> --
>
> Key: IMPALA-8721
> URL: https://issues.apache.org/jira/browse/IMPALA-8721
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Priority: Major
>  Labels: Interoperability, hive, impala, parquet, timestamp
> Fix For: Impala 3.3.0
>
>


