[jira] [Created] (IMPALA-12935) Allow function parsing for Impala Calcite planner

2024-03-22 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-12935:
-

 Summary: Allow function parsing for Impala Calcite planner
 Key: IMPALA-12935
 URL: https://issues.apache.org/jira/browse/IMPALA-12935
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


We need the ability to parse and validate Impala functions using the Calcite 
planner.

This commit is not intended to work for all functions, or even most functions.  
It will serve as a base to be reviewed, and at least some functions will work.  
More complicated functions will be added in a later commit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12934) Import parser files from Calcite into Impala

2024-03-22 Thread Steve Carlin (Jira)
Steve Carlin created IMPALA-12934:
-

 Summary: Import parser files from Calcite into Impala
 Key: IMPALA-12934
 URL: https://issues.apache.org/jira/browse/IMPALA-12934
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Steve Carlin


Since the Impala SQL syntax is different from the Calcite SQL syntax, Impala 
needs its own parsing files.






[jira] [Commented] (IMPALA-12564) TSAN test run fails to start Hive Server on Ubuntu 22.04

2024-03-22 Thread Laszlo Gaal (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829995#comment-17829995
 ] 

Laszlo Gaal commented on IMPALA-12564:
--

https://gerrit.cloudera.org/c/21191

> TSAN test run fails to start Hive Server on Ubuntu 22.04
> 
>
> Key: IMPALA-12564
> URL: https://issues.apache.org/jira/browse/IMPALA-12564
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
>
> Hive Server startup hangs during ARM TSAN test runs. Logs were not 
> particularly illuminating.
> hive-server2.out shows
> {code}
> /usr/lib/jvm/java-8-openjdk-amd64/bin/java: symbol lookup error: 
> /home/michael/Impala/be/build/debug/service/libfesupport.so: undefined 
> symbol: __tsan_init
> {code}
> Can work around it locally by commenting out 
> https://github.com/apache/impala/blob/4.3.0/testdata/bin/run-hive-server.sh#L140-L146.






[jira] [Work started] (IMPALA-12564) TSAN test run fails to start Hive Server on Ubuntu 22.04

2024-03-22 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12564 started by Laszlo Gaal.

> TSAN test run fails to start Hive Server on Ubuntu 22.04
> 
>
> Key: IMPALA-12564
> URL: https://issues.apache.org/jira/browse/IMPALA-12564
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
>
> Hive Server startup hangs during ARM TSAN test runs. Logs were not 
> particularly illuminating.
> hive-server2.out shows
> {code}
> /usr/lib/jvm/java-8-openjdk-amd64/bin/java: symbol lookup error: 
> /home/michael/Impala/be/build/debug/service/libfesupport.so: undefined 
> symbol: __tsan_init
> {code}
> Can work around it locally by commenting out 
> https://github.com/apache/impala/blob/4.3.0/testdata/bin/run-hive-server.sh#L140-L146.






[jira] [Assigned] (IMPALA-12564) TSAN test run fails to start Hive Server on Ubuntu 22.04

2024-03-22 Thread Laszlo Gaal (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Gaal reassigned IMPALA-12564:


Assignee: Laszlo Gaal  (was: Michael Smith)

> TSAN test run fails to start Hive Server on Ubuntu 22.04
> 
>
> Key: IMPALA-12564
> URL: https://issues.apache.org/jira/browse/IMPALA-12564
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Laszlo Gaal
>Priority: Major
>
> Hive Server startup hangs during ARM TSAN test runs. Logs were not 
> particularly illuminating.
> hive-server2.out shows
> {code}
> /usr/lib/jvm/java-8-openjdk-amd64/bin/java: symbol lookup error: 
> /home/michael/Impala/be/build/debug/service/libfesupport.so: undefined 
> symbol: __tsan_init
> {code}
> Can work around it locally by commenting out 
> https://github.com/apache/impala/blob/4.3.0/testdata/bin/run-hive-server.sh#L140-L146.






[jira] [Assigned] (IMPALA-12922) Make batch_sizes optional in create_exec_option_dimension

2024-03-22 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto reassigned IMPALA-12922:
-

Assignee: Riza Suminto

> Make batch_sizes optional in create_exec_option_dimension
> -
>
> Key: IMPALA-12922
> URL: https://issues.apache.org/jira/browse/IMPALA-12922
> Project: IMPALA
>  Issue Type: Test
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> BATCH_SIZE option is a required parameter with default value of [0] in 
> create_exec_option_dimension.
> [https://github.com/apache/impala/blob/4477398ae46415d3fb32db2a8fd5e6d2060cbd3f/tests/common/test_dimensions.py#L225]
>  
> However, only a few EE tests actively parameterize the BATCH_SIZE option to 
> non-default values. The batch_sizes parameter in create_exec_option_dimension 
> can be made optional just like sync_ddl. This will reduce the length of test 
> identifiers in many EE and custom cluster tests.
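
A minimal sketch of the idea, using a simplified stand-in rather than the real helper in tests/common/test_dimensions.py. The function name make_exec_option_dimension and its parameter list are hypothetical; the only point is that a None default lets an option be left out of the generated option dictionaries, and therefore out of the test identifiers.

```python
from itertools import product


def make_exec_option_dimension(cluster_sizes=(0,), batch_sizes=None, sync_ddl=None):
    """Hypothetical, simplified stand-in for create_exec_option_dimension.

    Options whose value list is None are omitted entirely, so they do not
    appear in the generated exec_option dictionaries.
    """
    candidates = {
        'num_nodes': cluster_sizes,
        'batch_size': batch_sizes,
        'sync_ddl': sync_ddl,
    }
    # Keep only the options the caller actually asked to parameterize.
    active = {name: values for name, values in candidates.items() if values is not None}
    names = list(active)
    return [dict(zip(names, combo)) for combo in product(*(active[n] for n in names))]


# Default: batch_size is absent from every generated option set.
default_dim = make_exec_option_dimension()
# Opt in explicitly when a test really wants to vary BATCH_SIZE.
batched_dim = make_exec_option_dimension(batch_sizes=[0, 1, 16])
```

Tests that need to vary BATCH_SIZE would pass batch_sizes explicitly; all others would get shorter option sets.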






[jira] [Resolved] (IMPALA-12898) Tidy up test matrix of test_scanner.py

2024-03-22 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-12898.
---
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Tidy up test matrix of test_scanner.py
> --
>
> Key: IMPALA-12898
> URL: https://issues.apache.org/jira/browse/IMPALA-12898
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Minor
> Fix For: Impala 4.4.0
>
>
> Several test classes in test_scanner.py seemingly declare test dimensions 
> that are ignored by their tests.
> For example, TestScannersAllTableFormats
> {code:python}
> class TestScannersAllTableFormats(ImpalaTestSuite):
>   BATCH_SIZES = [0, 1, 16]
>
>   @classmethod
>   def get_workload(cls):
>     return 'functional-query'
>
>   @classmethod
>   def add_test_dimensions(cls):
>     super(TestScannersAllTableFormats, cls).add_test_dimensions()
>     if cls.exploration_strategy() == 'core':
>       # The purpose of this test is to get some base coverage of all the file formats.
>       # Even in 'core', we'll test each format by using the pairwise strategy.
>       cls.ImpalaTestMatrix.add_dimension(cls.create_table_info_dimension('pairwise'))
>     cls.ImpalaTestMatrix.add_dimension(
>         ImpalaTestDimension('batch_size', *TestScannersAllTableFormats.BATCH_SIZES))
>     cls.ImpalaTestMatrix.add_dimension(
>         ImpalaTestDimension('debug_action', *DEBUG_ACTION_DIMS))
>     cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('mt_dop', *MT_DOP_VALUES))
>
>   def test_scanners(self, vector):
>     new_vector = deepcopy(vector)
>     # Copy over test dimensions to the matching query options.
>     new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size')
>     new_vector.get_value('exec_option')['debug_action'] = vector.get_value('debug_action')
>     new_vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')
>     self.run_test_case('QueryTest/scanners', new_vector)
>
>   def test_many_nulls(self, vector):
>     if vector.get_value('table_format').file_format == 'hbase':
>       # manynulls table not loaded for HBase
>       pytest.skip()
>     # Copy over test dimensions to the matching query options.
>     new_vector = deepcopy(vector)
>     new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size')
>     new_vector.get_value('exec_option')['debug_action'] = vector.get_value('debug_action')
>     new_vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')
>     self.run_test_case('QueryTest/scanners-many-nulls', new_vector)
>
>   def test_hdfs_scanner_profile(self, vector):
>     if vector.get_value('table_format').file_format in ('kudu', 'hbase') or \
>        vector.get_value('exec_option')['num_nodes'] != 0:
>       pytest.skip()
>     self.run_test_case('QueryTest/hdfs_scanner_profile', vector)
>
>   def test_string_escaping(self, vector):
>     """Test handling of string escape sequences."""
>     if vector.get_value('table_format').file_format == 'rc':
>       # IMPALA-7778: RCFile scanner incorrectly ignores escapes for now.
>       self.run_test_case('QueryTest/string-escaping-rcfile-bug', vector)
>     else:
>       self.run_test_case('QueryTest/string-escaping', vector)
> {code}
> test_scanners and test_many_nulls correctly copy exec_option values from the 
> test vector. But test_hdfs_scanner_profile and test_string_escaping do not, 
> and they unnecessarily run multiple times even though they do not permute 
> their exec_option. This and other test classes inside test_scanner.py can 
> benefit from refactoring and dimension reduction.
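
To illustrate why unused dimensions are costly, here is a rough sketch that counts test vectors as a cross product. BATCH_SIZES comes from the class quoted above; the values given for DEBUG_ACTION_DIMS and MT_DOP_VALUES are placeholders chosen for this example, not Impala's actual values, and the table format dimension is left out (the counts are per table format).

```python
from itertools import product

BATCH_SIZES = [0, 1, 16]                    # from the class quoted above
DEBUG_ACTION_DIMS = [None, 'HYPOTHETICAL']  # placeholder values, not Impala's
MT_DOP_VALUES = [0, 4]                      # placeholder values, not Impala's


def count_runs(*dimensions):
    """Number of test vectors produced by the full cross product of dimensions."""
    return sum(1 for _ in product(*dimensions))


# A test that ignores all three dimensions still runs once per combination.
runs_with_unused_dims = count_runs(BATCH_SIZES, DEBUG_ACTION_DIMS, MT_DOP_VALUES)
# If the dimensions were attached only to the tests that use them, it runs once.
runs_without_unused_dims = count_runs()
```

With these placeholder values, a test such as test_string_escaping would run 12 times per table format instead of once.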






[jira] [Resolved] (IMPALA-12904) test_type_conversions_hive3 silently passes because of wrongly defined test dimensions

2024-03-22 Thread Riza Suminto (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riza Suminto resolved IMPALA-12904.
---
 Fix Version/s: Impala 4.4.0
Target Version: Impala 4.4.0
Resolution: Fixed

> test_type_conversions_hive3 silently passes because of wrongly defined test 
> dimensions
> --
>
> Key: IMPALA-12904
> URL: https://issues.apache.org/jira/browse/IMPALA-12904
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> test_type_conversions_hive3 silently passes because we are not creating the 
> test dimension for query option orc_schema_resolution correctly.
> Instead of:
>  
> {noformat}
> cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('orc_schema_resolution',
>  0, 1)){noformat}
> We should do the following:
> {noformat}
> add_exec_option_dimension(cls, 'orc_schema_resolution', [0, 1]){noformat}
> We should fix how we add the test dimension, and also fix the underlying bug 
> it reveals.
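
A toy model of why the first form has no effect. It assumes, as the test_scanner.py code quoted elsewhere in this digest suggests, that only the values stored under 'exec_option' are sent as query options; the function query_options_sent and the dictionary layout are hypothetical simplifications of the test framework.

```python
def query_options_sent(vector):
    """Model of a runner that only forwards vector['exec_option'] to the query."""
    return dict(vector['exec_option'])


# Dimension added at the top level of the vector: the option never reaches the query.
top_level_vector = {'exec_option': {'batch_size': 0}, 'orc_schema_resolution': 1}
# Dimension added inside exec_option: the option is actually applied.
exec_option_vector = {'exec_option': {'batch_size': 0, 'orc_schema_resolution': 1}}

ignored = 'orc_schema_resolution' not in query_options_sent(top_level_vector)
applied = query_options_sent(exec_option_vector)['orc_schema_resolution'] == 1
```

In the first case every permutation runs with the same effective options, which is how a test can pass without exercising the option at all.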
>  






[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-22 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829953#comment-17829953
 ] 

Csaba Ringhofer commented on IMPALA-12927:
--

I think the best approach would be to check the table property "json.binary.format":
 * if not set, give a clear error message
 * if base64, do base64 decoding
 * if rawstring, handle it the way Hive does: 
[https://github.com/apache/hive/blame/f216bbb632752f467321869cee03adf9477409cf/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L455]

Note that I don't know exactly how special characters are handled in the 
rawstring case.
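
The three cases above could be sketched roughly as follows. This is only an illustration in Python, not Impala or Hive code: the function name decode_json_binary is hypothetical, and the rawstring branch simply assumes the UTF-8 bytes of the string are taken as-is, which may not match what Hive does for special characters.

```python
import base64


def decode_json_binary(value, binary_format):
    """Decode one JSON string value of a BINARY column.

    binary_format models the 'json.binary.format' table property;
    None means the property is not set.
    """
    if value is None:
        return None
    if binary_format is None:
        raise ValueError(
            "table property 'json.binary.format' must be set to read BINARY columns")
    if binary_format == 'base64':
        return base64.b64decode(value, validate=True)
    if binary_format == 'rawstring':
        # Assumption: take the UTF-8 bytes of the string as-is. How Hive treats
        # special characters in this case is left open in the comment above.
        return value.encode('utf-8')
    raise ValueError("unsupported json.binary.format: %r" % binary_format)
```

For example, decode_json_binary('YmluYXJ5MQ==', 'base64') and decode_json_binary('binary1', 'rawstring') both yield the bytes b'binary1', while an unset property raises an error instead of silently returning NULL.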

> Support reading BINARY columns in JSON tables
> -
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Zihao Ye
>Priority: Major
>
> Currently Impala cannot read BINARY columns in JSON files written by Hive 
> correctly and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> ++--++
> | id | string_col   | binary_col |
> ++--++
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> ++--++
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, 
> type: STRING, data: 'binary1'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'binary2'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'árvíztűrőtükörfúró'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '你好hello'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '��'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '�D3"'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> {code}
> The single file in the table looks like this:
> {code}
>  hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0
> {"id":1,"string_col":"ascii","binary_col":"binary1"}
> {"id":2,"string_col":"ascii","binary_col":"binary2"}
> {"id":3,"string_col":"null","binary_col":null}
> {"id":4,"string_col":"empty","binary_col":""}
> {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
> {"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
> {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
> {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}
> {code}
>  






[jira] [Commented] (IMPALA-12933) Catalogd should set eventTypeSkipList when fetching specifit events for a table

2024-03-22 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829845#comment-17829845
 ] 

Quanlong Huang commented on IMPALA-12933:
-

In IMPALA-12399, we added OPEN_TXN to the eventTypeSkipList. We can also add 
UPDATE_PART_COL_STAT_EVENT which is also unused by Impala.

> Catalogd should set eventTypeSkipList when fetching specifit events for a 
> table
> ---
>
> Key: IMPALA-12933
> URL: https://issues.apache.org/jira/browse/IMPALA-12933
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> There are several places where catalogd fetches all events of a specific 
> type on a table. E.g. in TableLoader#load(), if the table has an old 
> createEventId, catalogd will fetch all CREATE_TABLE events after that 
> createEventId on the table.
> Fetching the list of events is expensive since the filtering is done on the 
> client side, i.e. catalogd fetches all events and filters them locally based 
> on the event type and table name:
> [https://github.com/apache/impala/blob/14e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102]
> [https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336]
> This could take hours if there are lots of events (e.g. 1M) in HMS. In fact, 
> NotificationEventRequest can specify an eventTypeSkipList, so catalogd can 
> do the filtering of event type on the HMS side. On higher Hive versions that 
> have HIVE-27499, catalogd can also specify the table name in the request 
> (IMPALA-12607).
> This Jira focuses on specifying the eventTypeSkipList when fetching events 
> of a particular type on a table.






[jira] [Assigned] (IMPALA-12894) Optimized count(*) for Iceberg gives wrong results after a Spark rewrite_data_files

2024-03-22 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-12894:
--

Assignee: Zoltán Borók-Nagy

> Optimized count(*) for Iceberg gives wrong results after a Spark 
> rewrite_data_files
> ---
>
> Key: IMPALA-12894
> URL: https://issues.apache.org/jira/browse/IMPALA-12894
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Gabor Kaszab
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: correctness, impala-iceberg
> Attachments: count_star_correctness_repro.tar.gz
>
>
> Issue was introduced by https://issues.apache.org/jira/browse/IMPALA-11802 
> that implemented an optimized way to get results for count(*). However, if 
> the table was compacted by Spark this optimization can give incorrect results.
> The reason is that Spark can [skip dropping delete 
> files|https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files]
>  that are pointing to compacted data files. As a result, there might be delete 
> files after compaction that are no longer applied to any data files.
> Repro:
> With Impala
> {code:java}
> create table default.iceberg_testing (id int, j bigint) STORED AS ICEBERG
> TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
>               'iceberg.catalog_location'='/tmp/spark_iceberg_catalog/',
>               'iceberg.table_identifier'='iceberg_testing',
>               'format-version'='2');
> insert into iceberg_testing values
> (1, 1), (2, 4), (3, 9), (4, 16), (5, 25);
> update iceberg_testing set j = -100 where id = 4;
> delete from iceberg_testing where id = 4;{code}
> Count * returns 4 at this point.
> Run compaction in Spark:
> {code:java}
> spark.sql(s"CALL local.system.rewrite_data_files(table => 
> 'default.iceberg_testing', options => map('min-input-files','2') )").show() 
> {code}
> Now count * in Impala returns 8 (might require an INVALIDATE METADATA if in HadoopCatalog). 
> Hive returns correct results. Also a SELECT * returns correct results.






[jira] [Resolved] (IMPALA-12926) Refactor BINARY type handling in the backend

2024-03-22 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-12926.

Resolution: Fixed

> Refactor BINARY type handling in the backend
> 
>
> Key: IMPALA-12926
> URL: https://issues.apache.org/jira/browse/IMPALA-12926
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> Currently the STRING and BINARY types are not distinguished in most of the 
> backend. In contrast to the frontend, PrimitiveType::TYPE_BINARY is not used 
> there at all, TYPE_STRING being used instead. This is to ensure that 
> everything that works for STRING also works for BINARY. So far only file 
> readers and writers have had to handle them differently, and they have access 
> to ColumnDescriptors which contain AuxColumnType fields that differentiate 
> these two types.
> However, only top-level columns have ColumnDescriptors. Adding support for 
> BINARYs within complex types (see IMPALA-11491 and IMPALA-12651) necessitates 
> adding type information about STRING vs BINARY to embedded fields as well.
> Using PrimitiveType::TYPE_BINARY would probably be the cleanest solution but 
> it would affect huge parts of the code as TYPE_BINARY would have to be added 
> to hundreds of switch statements and this would be error prone.
> Instead, we should introduce a new field in ColumnType: 'is_binary', which is 
> true if the type is a BINARY and false otherwise. We keep using TYPE_STRING 
> as the PrimitiveType of the ColumnType for BINARYs. This way full type 
> information is present in ColumnType but code that does not differentiate 
> between STRING and BINARY will continue to work for BINARY.
> With this change, AuxColumnType is no longer needed and should be removed.
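
The proposal can be modelled in a few lines. The real code is C++ in the Impala backend; the Python below is only a sketch of the shape of the idea, and the helper names is_variable_length and must_handle_as_binary are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PrimitiveType(Enum):
    TYPE_INT = 'INT'
    TYPE_STRING = 'STRING'


@dataclass(frozen=True)
class ColumnType:
    type: PrimitiveType
    is_binary: bool = False  # only meaningful when type is TYPE_STRING


STRING = ColumnType(PrimitiveType.TYPE_STRING)
# BINARY keeps TYPE_STRING as its primitive type and only sets the flag.
BINARY = ColumnType(PrimitiveType.TYPE_STRING, is_binary=True)


def is_variable_length(col_type):
    """Code that does not care about STRING vs BINARY keeps checking 'type' only."""
    return col_type.type is PrimitiveType.TYPE_STRING


def must_handle_as_binary(col_type):
    """The few places that must tell BINARY apart consult the extra flag."""
    return col_type.type is PrimitiveType.TYPE_STRING and col_type.is_binary
```

Existing type-based branches treat both values identically, while the flag travels with the type itself and is therefore also available for fields nested inside complex types.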






[jira] [Created] (IMPALA-12933) Catalogd should set eventTypeSkipList when fetching specifit events for a table

2024-03-22 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12933:
---

 Summary: Catalogd should set eventTypeSkipList when fetching 
specifit events for a table
 Key: IMPALA-12933
 URL: https://issues.apache.org/jira/browse/IMPALA-12933
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


There are several places where catalogd fetches all events of a specific type 
on a table. E.g. in TableLoader#load(), if the table has an old createEventId, 
catalogd will fetch all CREATE_TABLE events after that createEventId on the 
table.

Fetching the list of events is expensive since the filtering is done on the 
client side, i.e. catalogd fetches all events and filters them locally based on 
the event type and table name:
[https://github.com/apache/impala/blob/14e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102]
[https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336]

This could take hours if there are lots of events (e.g. 1M) in HMS. In fact, 
NotificationEventRequest can specify an eventTypeSkipList, so catalogd can do 
the filtering of event type on the HMS side. On higher Hive versions that have 
HIVE-27499, catalogd can also specify the table name in the request 
(IMPALA-12607).



This Jira focuses on specifying the eventTypeSkipList when fetching events of a 
particular type on a table.
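
The difference between the two approaches can be sketched with a toy in-memory model. Nothing here is the real HMS client API: fetch_events, skip_list_for and the sample events are made up for illustration, and only the names eventTypeSkipList, CREATE_TABLE and OPEN_TXN come from this thread.

```python
ALL_EVENTS = [  # toy data standing in for the HMS notification log
    {'id': 1, 'type': 'CREATE_TABLE', 'table': 't1'},
    {'id': 2, 'type': 'OPEN_TXN', 'table': None},
    {'id': 3, 'type': 'ALTER_TABLE', 'table': 't1'},
    {'id': 4, 'type': 'CREATE_TABLE', 'table': 't2'},
    {'id': 5, 'type': 'OPEN_TXN', 'table': None},
]
KNOWN_TYPES = {'CREATE_TABLE', 'ALTER_TABLE', 'OPEN_TXN'}


def fetch_events(last_event_id, event_type_skip_list=()):
    """Toy stand-in for the HMS call: skipped types never leave the server."""
    skip = set(event_type_skip_list)
    return [e for e in ALL_EVENTS
            if e['id'] > last_event_id and e['type'] not in skip]


def skip_list_for(wanted_type):
    """Skip every known event type except the one the caller wants."""
    return sorted(KNOWN_TYPES - {wanted_type})


# Client-side filtering: everything is transferred, then filtered locally.
transferred_all = fetch_events(0)
# Server-side filtering of event type; the table name is still checked locally.
transferred_filtered = fetch_events(0, skip_list_for('CREATE_TABLE'))
create_events_t1 = [e for e in transferred_filtered if e['table'] == 't1']
```

In this toy run the skip list cuts the transferred events from 5 to 2; the table name filter stays on the client, matching the scope of this Jira.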






[jira] [Assigned] (IMPALA-12928) Mask dbcp.password table property of JDBC table for 'desc formatted' and 'show create table' commands

2024-03-22 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-12928:


Assignee: Wenzhe Zhou

> Mask dbcp.password table property of JDBC table for 'desc formatted' and 
> 'show create table' commands
> -
>
> Key: IMPALA-12928
> URL: https://issues.apache.org/jira/browse/IMPALA-12928
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> 'desc formatted' and 'show create table' commands show all table 
> properties in clear text. For external JDBC tables, the dbcp.password table 
> property should be masked in the output of the above two commands.
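
A minimal sketch of the masking step, assuming the table properties are available as a simple key/value map before they are rendered. The helper name mask_table_properties, the mask text and the sample property values are all assumptions made for this example; the issue itself only names dbcp.password.

```python
SENSITIVE_KEYS = {'dbcp.password'}
MASK = '******'  # assumed placeholder; the issue does not specify the mask text


def mask_table_properties(props):
    """Return a copy of the table properties with sensitive values masked."""
    return {k: (MASK if k in SENSITIVE_KEYS else v) for k, v in props.items()}


props = {
    'dbcp.username': 'impala_user',  # illustrative key and value only
    'dbcp.password': 's3cret',       # illustrative value only
}
shown = mask_table_properties(props)
```

The original map is left untouched, so only the output path of the two commands would see the masked copy.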






[jira] [Work started] (IMPALA-12928) Mask dbcp.password table property of JDBC table for 'desc formatted' and 'show create table' commands

2024-03-22 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12928 started by Wenzhe Zhou.

> Mask dbcp.password table property of JDBC table for 'desc formatted' and 
> 'show create table' commands
> -
>
> Key: IMPALA-12928
> URL: https://issues.apache.org/jira/browse/IMPALA-12928
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Clients, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> 'desc formatted' and 'show create table' commands show all table 
> properties in clear text. For external JDBC tables, the dbcp.password table 
> property should be masked in the output of the above two commands.


