[jira] [Closed] (IMPALA-11003) Disallow schema evolution for migrated Iceberg tables

2022-07-12 Thread LiPenglin (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiPenglin closed IMPALA-11003.
--
Resolution: Duplicate

> Disallow schema evolution for migrated Iceberg tables
> -
>
> Key: IMPALA-11003
> URL: https://issues.apache.org/jira/browse/IMPALA-11003
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: LiPenglin
>Priority: Major
>  Labels: impala-iceberg
>
> When external tables are converted to Iceberg, the data files remain intact.
> This means that the old data files don't have field id information which is 
> essential for schema evolution.
> Migrated tables are tagged with table property "MIGRATED_TO_ICEBERG". See 
> https://github.com/apache/hive/pull/2744/files
> Hive disallows some schema changing operations, e.g. "replace col" and 
> "change col". See HIVE-25643.
> Impala should likewise disallow dropping or changing columns on migrated 
> tables. ADD COLUMN should be fine.
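
The check proposed above could look roughly like the following. This is only a sketch: the function name, the exception class, the operation strings, and the assumption that the property value is "true" are all hypothetical and not taken from Impala's frontend.

```python
# Hypothetical sketch; none of these names come from Impala's code base.
MIGRATED_PROPERTY = "MIGRATED_TO_ICEBERG"

# Operations that depend on field ids being present in the data files.
BLOCKED_FOR_MIGRATED = {"DROP COLUMN", "CHANGE COLUMN"}


class SchemaEvolutionError(Exception):
    """Raised when an ALTER TABLE operation is rejected."""


def check_alter_allowed(table_properties, operation):
    """Reject drop/change column on tables tagged as migrated.

    Assumes the tag is stored as the string "true" (case-insensitive).
    """
    value = str(table_properties.get(MIGRATED_PROPERTY, "")).lower()
    if value == "true" and operation.upper() in BLOCKED_FOR_MIGRATED:
        raise SchemaEvolutionError(
            "%s is not allowed on a table migrated to Iceberg" % operation)


# ADD COLUMN passes, DROP COLUMN raises SchemaEvolutionError.
props = {"MIGRATED_TO_ICEBERG": "true"}
check_alter_allowed(props, "ADD COLUMN")
```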



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-11332) impala-shell strips trailing whitespace from csv output

2022-07-12 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-11332.

 Fix Version/s: Impala 4.2.0
Target Version: Impala 4.2.0
Resolution: Fixed

> impala-shell strips trailing whitespace from csv output
> ---
>
> Key: IMPALA-11332
> URL: https://issues.apache.org/jira/browse/IMPALA-11332
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.2.0
>
>
> A query like this should have trailing whitespace in the impala-shell output:
>  
> {noformat}
> impala-shell -B -q 'select "TEST NODE: Creating table       "'
> Starting Impala Shell with no authentication using Python 3.6.9
> Warning: live_progress only applies to interactive shell sessions, and is 
> being skipped for now.
> Opened TCP connection to localhost:21050
> Connected to localhost:21050
> Server version: impalad version 4.1.0-SNAPSHOT DEBUG (build 
> 730b7addb84839f2aebf63af501cc9e626f0d1c8)
> Query: select "TEST NODE: Creating table       "
> Query submitted at: 2022-06-01 12:37:58 (Coordinator: 
> http://joemcdonnell:25000)
> Query progress can be monitored at: 
> http://joemcdonnell:25000/query_plan?query_id=3743cf7e04b44e18:5ee28b3b
> TEST NODE: Creating table <- whitespace should exist here
> Fetched 1 row(s) in 0.11s{noformat}
> This is due to this rstrip() call here:
>  
> [https://github.com/apache/impala/blob/master/shell/shell_output.py#L93]
>  
> {noformat}
>     for row in rows:
>       if sys.version_info.major == 2:
>         row = [val.encode('utf-8', 'replace') if isinstance(val, unicode) else val
>             for val in row]
>       writer.writerow(row)
>     rows = temp_buffer.getvalue().rstrip()  # <-- this rstrip() call
>     temp_buffer.close()
>     return rows{noformat}
>  
>  
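
The effect can be reproduced outside impala-shell with a few lines of Python 3. The sketch below is not the shell's code: the function names are made up, and the tab delimiter and "\n" line terminator are assumptions. It contrasts a bare rstrip() with one possible alternative that removes only the final line terminator.

```python
import csv
import io


def format_rows_rstrip(rows):
    """Mimics the reported behaviour: rstrip() also eats trailing spaces."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    out = buf.getvalue().rstrip()
    buf.close()
    return out


def format_rows_keep_whitespace(rows):
    """One possible alternative: drop only the last line terminator."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    out = buf.getvalue()
    buf.close()
    if out.endswith("\n"):
        out = out[:-1]
    return out


rows = [["TEST NODE: Creating table       "]]
print(repr(format_rows_rstrip(rows)))           # 'TEST NODE: Creating table'
print(repr(format_rows_keep_whitespace(rows)))  # 'TEST NODE: Creating table       '
```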





[jira] [Resolved] (IMPALA-10122) Allow view authorization to be deferred until selection time

2022-07-12 Thread Fang-Yu Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao resolved IMPALA-10122.
--
Target Version: Impala 4.2.0
Resolution: Fixed

Resolving this JIRA since the fix has been merged.

> Allow view authorization to be deferred until selection time
> 
>
> Key: IMPALA-10122
> URL: https://issues.apache.org/jira/browse/IMPALA-10122
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> Currently Impala checks with Ranger whether the requesting user is granted 
> the {{SELECT}} privilege on the underlying tables when a view is created, 
> and therefore does not repeat that check when the view is selected.
> On the other hand, a Spark user currently cannot create a view directly in 
> HMS without involving the Impala frontend, because Spark clients are normal 
> users (vs. superusers). To relax this restriction, it would be good to allow 
> a Spark user to create a view directly in HMS. However, the authorization 
> check is skipped for views created in this manner, since HMS currently 
> cannot perform the authorization. As a consequence, for a view created this 
> way the authorization needs to be carried out at selection time, to make 
> sure the requesting user is indeed granted the {{SELECT}} privilege on the 
> underlying tables referenced by the view.
> There is also a corresponding Hive JIRA, HIVE-24026. Refer to it for 
> further details.
>  
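
The deferred check described above can be modelled in a few lines. This is a toy sketch, not Impala's or Ranger's API: the View class, the authorized_at_creation flag, and the set of (user, object) SELECT grants are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class View:
    name: str
    underlying_tables: Tuple[str, ...]
    # Hypothetical flag: False for views created directly in HMS,
    # i.e. without the creation-time check on the underlying tables.
    authorized_at_creation: bool


def authorize_view_select(user, view, select_grants):
    """Raise PermissionError unless `user` may select from `view`.

    `select_grants` is a set of (user, object_name) pairs holding SELECT.
    """
    if (user, view.name) not in select_grants:
        raise PermissionError("no SELECT on view %s" % view.name)
    if view.authorized_at_creation:
        return  # underlying tables were already checked at creation time
    # Deferred authorization: check the underlying tables now.
    for table in view.underlying_tables:
        if (user, table) not in select_grants:
            raise PermissionError("no SELECT on table %s" % table)
```

With this model, a user who holds SELECT on the view but not on one of its tables is rejected only when the view was created without the creation-time check.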





[jira] [Created] (IMPALA-11423) Queries in state "waiting to be closed" still hold ACID locks

2022-07-12 Thread Zoltán Borók-Nagy (Jira)

Zoltán Borók-Nagy created IMPALA-11423:
--

 Summary: Queries in state "waiting to be closed" still hold ACID 
locks
 Key: IMPALA-11423
 URL: https://issues.apache.org/jira/browse/IMPALA-11423
 Project: IMPALA
  Issue Type: Bug
Reporter: Zoltán Borók-Nagy


Impala queries that are in state "Waiting to be closed" still keep the ACID 
transactions open (and hold locks).

From the documentation:
_These queries are no longer executing, either because they encountered an 
error or because they have returned all of their results, but they are still 
active so that their results can be inspected. To free the resources they are 
using, they must be closed._

But ACID resources such as transactions and locks should be freed as soon as 
possible; there is no need to wait for the client to invoke close() on the 
query handle.
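
The proposed behaviour can be illustrated with a toy model. All class and method names below are hypothetical and do not reflect Impala's actual query lifecycle code; the point is only that the release happens when the query stops executing and that a later close() stays safe.

```python
class FakeTransaction:
    """Stand-in for an ACID transaction and the locks it holds."""

    def __init__(self):
        self.open = True

    def release(self):
        self.open = False


class QueryHandle:
    def __init__(self, txn):
        self.txn = txn
        self.state = "RUNNING"

    def _release_acid_resources(self):
        # Idempotent: safe to call from both mark_finished() and close().
        if self.txn is not None:
            self.txn.release()
            self.txn = None

    def mark_finished(self):
        """Query hit an error or returned all rows; results stay readable."""
        self.state = "WAITING_TO_BE_CLOSED"
        self._release_acid_resources()

    def close(self):
        self._release_acid_resources()
        self.state = "CLOSED"
```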





[jira] [Resolved] (IMPALA-11302) Improve error message for CREATE EXTERNAL TABLE iceberg command

2022-07-12 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11302.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Improve error message for CREATE EXTERNAL TABLE iceberg command
> ---
>
> Key: IMPALA-11302
> URL: https://issues.apache.org/jira/browse/IMPALA-11302
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Vivek Sharma
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
> Attachments: impala-icerberg-err.png
>
>
> For the following DDL
> {code:java}
> create external table t4 (id int) stored as iceberg ; {code}
>  
> The error message is misleading and could be refined:
> {code:java}
> ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore: 
> CAUSED BY: TableLoadingException: Failed to load Iceberg table with id: 
> vsh.t4 CAUSED BY: NoSuchTableException: Table does not exist: vsh.t4 {code}
>  





[jira] [Resolved] (IMPALA-11246) TestAcid.test_lock_timings is flaky

2022-07-12 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11246.

Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> TestAcid.test_lock_timings is flaky
> ---
>
> Key: IMPALA-11246
> URL: https://issues.apache.org/jira/browse/IMPALA-11246
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Csaba Ringhofer
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: flaky-test
> Fix For: Impala 4.1.0
>
>
> query_test.test_acid.TestAcid.test_lock_timings[protocol: beeswax | 
> exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]
> stacktrace:
> query_test/test_acid.py:343: in test_lock_timings
> assert elapsed > 20 and elapsed < 25
> E   assert (25.887187957763672 > 20 and 25.887187957763672 < 25)





[jira] [Resolved] (IMPALA-11346) Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

2022-07-12 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11346.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Migrated partitioned Iceberg tables might return ERROR when WHERE condition 
> is used on partition column
> ---
>
> Key: IMPALA-11346
> URL: https://issues.apache.org/jira/browse/IMPALA-11346
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
>
> {noformat}
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_bool=false;
> Fetched 0 row(s) in 0.11s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_bool=true;
> ERROR: Unable to find SchemaNode for path 
> 'functional_parquet.iceberg_alltypes_part.p_bool' in the schema of file 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where i=3;
> Fetched 0 row(s) in 0.12s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where i=1;
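> +---+--------+-------+----------+-----------+----------+-----------+------------+----------+
> | i | p_bool | p_int | p_bigint | p_float   | p_double | p_decimal | p_date     | p_string |
> +---+--------+-------+----------+-----------+----------+-----------+------------+----------+
> | 1 | true   | 1     | 11       | 1.1002384 | 2.222    | 123.321   | 2022-02-22 | impala   |
> +---+--------+-------+----------+-----------+----------+-----------+------------+----------+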
> Fetched 1 row(s) in 0.12s
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_int=1;
> ERROR: Unable to find SchemaNode for path 
> 'functional_parquet.iceberg_alltypes_part.p_int' in the schema of file 
> 'hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_alltypes_part/p_bool=true/p_int=1/p_bigint=11/p_float=1.1/p_double=2.222/p_decimal=123.321/p_date=2022-02-22/p_string=impala/00_0'.
> [localhost:21050] default> select * from 
> functional_parquet.iceberg_alltypes_part where p_int=3;
> Fetched 0 row(s) in 0.11s{noformat}
> So at least we don't get incorrect results, but we do get errors when 
> filtering on partition column values that exist.
> It seems to work well with ORC.





[jira] [Resolved] (IMPALA-11303) Exception is not raised for Iceberg DDL that misses LOCATION clause

2022-07-12 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11303.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Exception is not raised for Iceberg DDL that misses LOCATION clause
> ---
>
> Key: IMPALA-11303
> URL: https://issues.apache.org/jira/browse/IMPALA-11303
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vivek Sharma
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.2.0
>
> Attachments: impala-err-2.png
>
>
> {code:java}
> CREATE EXTERNAL TABLE t7(
>   level STRING
> )
> STORED AS ICEBERG
> TBLPROPERTIES('iceberg.catalog'='hadoop.tables'); {code}
> For the above DDL, the error message asking the user to include LOCATION 
> only shows up in the result summary (see attachment).
> We should raise an exception instead and let the user know upfront.
>  
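
An upfront check of this kind could be sketched as follows. The function and exception names are hypothetical, and the rule encoded here (a table using the 'hadoop.tables' catalog must specify LOCATION) is inferred from the DDL above rather than taken from Impala's analyzer.

```python
class AnalysisError(Exception):
    """Raised before the DDL is executed."""


def validate_iceberg_create(tbl_properties, location):
    """Fail fast when a 'hadoop.tables' Iceberg table lacks LOCATION.

    `location` is None or empty when the DDL has no LOCATION clause.
    """
    catalog = tbl_properties.get("iceberg.catalog")
    if catalog == "hadoop.tables" and not location:
        raise AnalysisError(
            "LOCATION must be specified for Iceberg tables that use "
            "'iceberg.catalog'='hadoop.tables'")


# The DDL from the report would be rejected before any RPC is made:
try:
    validate_iceberg_create({"iceberg.catalog": "hadoop.tables"}, None)
except AnalysisError as e:
    print("AnalysisError:", e)
```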





[jira] [Resolved] (IMPALA-11414) Off-by-one error in Parquet late materialization

2022-07-12 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11414.

Fix Version/s: Impala 4.2.0
   Resolution: Fixed

> Off-by-one error in Parquet late materialization
> 
>
> Key: IMPALA-11414
> URL: https://issues.apache.org/jira/browse/IMPALA-11414
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> PARQUET_LATE_MATERIALIZATION sets the minimum number of consecutive 
> filtered-out rows required before we skip materializing the corresponding 
> rows in the other Parquet columns.
> E.g. if PARQUET_LATE_MATERIALIZATION is 10 and in a filtered column we find 
> at least 10 consecutive rows that don't pass the predicates, we avoid 
> materializing the corresponding rows in the other columns.
> But due to an off-by-one error we actually only need 
> (PARQUET_LATE_MATERIALIZATION - 1) consecutive elements. This means that if 
> we set PARQUET_LATE_MATERIALIZATION to one, we need zero consecutive 
> filtered-out elements, which leads to a crash/DCHECK. The bug is in the 
> GetMicroBatches() algorithm, which produces the micro batches based on the 
> selected rows.
> Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it 
> shouldn't be allowed.
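
The intended threshold semantics can be sketched in Python. This is not the backend's GetMicroBatches() code: the function name, the list-of-booleans input, and the (first, last) batch representation are assumptions made for illustration.

```python
def get_micro_batches(selected, skip_threshold):
    """Group selected rows into (first, last) index ranges.

    `selected[i]` is True when row i passes the predicates. Two selected
    rows land in different batches only if at least `skip_threshold`
    consecutive filtered-out rows lie between them.
    """
    if skip_threshold < 1:
        raise ValueError("skip_threshold must be at least 1")
    batches = []
    start = last = None
    for i, keep in enumerate(selected):
        if not keep:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 >= skip_threshold:
            # Gap is long enough: close the batch and skip the gap.
            # Comparing `i - last` here instead would reproduce the
            # off-by-one (threshold - 1 filtered rows would suffice).
            batches.append((start, last))
            start = last = i
        else:
            last = i
    if start is not None:
        batches.append((start, last))
    return batches


rows = [True, False, False, True, True, False, False, False, True]
print(get_micro_batches(rows, 3))  # [(0, 4), (8, 8)]
print(get_micro_batches(rows, 2))  # [(0, 0), (3, 4), (8, 8)]
```

With a threshold of 1, adjacent selected rows still share a batch because the gap between them is zero, and a threshold of 0 is rejected outright.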



--