[jira] [Created] (IMPALA-13047) Support restarting a specified impalad in bin/start-impala-cluster.py

2024-04-29 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13047:
---

 Summary: Support restarting a specified impalad in 
bin/start-impala-cluster.py
 Key: IMPALA-13047
 URL: https://issues.apache.org/jira/browse/IMPALA-13047
 Project: IMPALA
  Issue Type: New Feature
  Components: Infrastructure
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Currently, bin/start-impala-cluster.py can restart catalogd, statestored and 
*all* impalads. It'd be useful to support restarting only a single impalad. We 
need this for debugging IMPALA-13009.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12917) Several tests in TestEventProcessingError fail

2024-04-29 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12917:

Fix Version/s: Impala 4.4.0

> Several tests in TestEventProcessingError fail
> --
>
> Key: IMPALA-12917
> URL: https://issues.apache.org/jira/browse/IMPALA-12917
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Venugopal Reddy K
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.4.0
>
>
> The failing tests are
> TestEventProcessingError.test_event_processor_error_alter_partition
> TestEventProcessingError.test_event_processor_error_alter_partitions
> TestEventProcessingError.test_event_processor_error_commit_compaction_event
> TestEventProcessingError.test_event_processor_error_commit_txn
> TestEventProcessingError.test_event_processor_error_stress_test
> Stacktrace:
> {code:java}
> E   Error: Error while compiling statement: FAILED: Execution Error, return 
> code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
> java.lang.NullPointerException
> E at org.apache.tez.client.TezClient.cleanStagingDir(TezClient.java:424)
> E at org.apache.tez.client.TezClient.start(TezClient.java:413)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:556)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:387)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:302)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.open(TezSessionPoolSession.java:106)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:468)
> E at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:227)
> E at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> E at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> E at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356)
> E at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329)
> E at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> E at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
> E at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
> E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:546)
> E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:540)
> E at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
> E at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> E at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> E at java.lang.Thread.run(Thread.java:748) (state=08S01,code=1)
> {code}
> These tests were introduced by IMPALA-12832, [~VenuReddy] could you take a 
> look?






[jira] [Commented] (IMPALA-12684) Use IMPALA_COMPRESSED_DEBUG_INFO=true by default

2024-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842167#comment-17842167
 ] 

ASF subversion and git services commented on IMPALA-12684:
--

Commit 56f35ad40a7ded4a700222eef8dc99cf7ea44625 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=56f35ad40 ]

IMPALA-12684: Enable IMPALA_COMPRESSED_DEBUG_INFO by default

IMPALA_COMPRESSED_DEBUG_INFO was introduced in IMPALA-11511
and reduces Impala binary sizes by >50%. Debug tools like
gdb and our minidump processing scripts handle compressed
debug information properly. There are slightly higher link
times and additional overhead when doing debugging.

Overall, the reduction in binary sizes seems worth it given
the modest overhead. Compressing the debug information also
avoids concerns that adding debug information to toolchain
components would increase binary sizes.

This changes the default for IMPALA_COMPRESSED_DEBUG_INFO to
true.

Testing:
 - Ran pstack on a Centos 7 machine running tests with
   IMPALA_COMPRESSED_DEBUG_INFO=true and verified that
   the symbols work properly
 - Forced the production of minidumps for a job using
   IMPALA_COMPRESSED_DEBUG_INFO=true and verified it is
   processed properly.
 - Used this locally for development for several months

Change-Id: I31640f1453d351b11644bb46af3d2158b22af5b3
Reviewed-on: http://gerrit.cloudera.org:8080/20871
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 


> Use IMPALA_COMPRESSED_DEBUG_INFO=true by default
> 
>
> Key: IMPALA-12684
> URL: https://issues.apache.org/jira/browse/IMPALA-12684
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Priority: Major
>
> A large part of the Impala binary's size is debug information. The 
> IMPALA_COMPRESSED_DEBUG_INFO environment variable was introduced in 
> IMPALA-11511 and reduces the binary size by >50%. For developer environments, 
> this can dramatically reduce disk space usage, especially when building 
> the backend tests. The downside is slower linking and slower processing of 
> the debuginfo (attaching with GDB, dumping symbols for minidumps).
> The disk space difference is very large, and this can help enable us to 
> expand debug information for the toolchain. We should switch the default 
> value to true.
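
With the new default, a developer who still wants uncompressed debug info can
opt out locally. A hedged sketch: the variable name comes from this ticket, but
placing the override in bin/impala-config-local.sh is an assumption about the
usual local-override mechanism, not something the ticket states:

```shell
# Opt back out of compressed debug info for a local build, e.g. in
# bin/impala-config-local.sh (override location is an assumption):
export IMPALA_COMPRESSED_DEBUG_INFO=false
```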






[jira] [Commented] (IMPALA-11511) Provide an option to build with compressed debug info

2024-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842168#comment-17842168
 ] 

ASF subversion and git services commented on IMPALA-11511:
--

Commit 56f35ad40a7ded4a700222eef8dc99cf7ea44625 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=56f35ad40 ]

IMPALA-12684: Enable IMPALA_COMPRESSED_DEBUG_INFO by default

IMPALA_COMPRESSED_DEBUG_INFO was introduced in IMPALA-11511
and reduces Impala binary sizes by >50%. Debug tools like
gdb and our minidump processing scripts handle compressed
debug information properly. There are slightly higher link
times and additional overhead when doing debugging.

Overall, the reduction in binary sizes seems worth it given
the modest overhead. Compressing the debug information also
avoids concerns that adding debug information to toolchain
components would increase binary sizes.

This changes the default for IMPALA_COMPRESSED_DEBUG_INFO to
true.

Testing:
 - Ran pstack on a Centos 7 machine running tests with
   IMPALA_COMPRESSED_DEBUG_INFO=true and verified that
   the symbols work properly
 - Forced the production of minidumps for a job using
   IMPALA_COMPRESSED_DEBUG_INFO=true and verified it is
   processed properly.
 - Used this locally for development for several months

Change-Id: I31640f1453d351b11644bb46af3d2158b22af5b3
Reviewed-on: http://gerrit.cloudera.org:8080/20871
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 


> Provide an option to build with compressed debug info
> -
>
> Key: IMPALA-11511
> URL: https://issues.apache.org/jira/browse/IMPALA-11511
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> For builds with debug information, the debug information is often a large 
> portion of the binary size. There is a feature that compresses the debug info 
> using ZLIB via the "-gz" compilation flag. It makes a very large difference 
> to the size of our binaries:
> {noformat}
> GCC 7.5:
> debug: 726767520
> debug with -gz: 325970776
> release: 707911496
> release with -gz: 301671026
> GCC 10.4:
> debug: 870378832
> debug with -gz: 351442253
> release: 974600536
> release with -gz: 367938487{noformat}
> The size reduction would be useful for developers, but support in other tools 
> is mixed. gdb has support and seems to work fine. breakpad does not have 
> support. However, it is easy to convert a binary with compressed debug 
> symbols to one with normal debug symbols using objcopy:
> {noformat}
> objcopy --decompress-debug-sections [in_binary] [out_binary]{noformat}
> Given a minidump, it is possible to run objcopy to decompress the debug 
> symbols for the original binary, dump the breakpad symbols, and then process 
> the minidump successfully. So, it should be possible to modify 
> bin/dump_breakpad_symbols.py to do this automatically.






[jira] [Updated] (IMPALA-13046) Rework Iceberg mixed format delete test for Hive optimization

2024-04-29 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13046:
---
Description: 
A Hive improvement (HIVE-27731) to Iceberg support changed Hive's behavior 
around handling deletes. It used to always add a delete file, but now if a 
delete would negate all the contents of a data file it instead removes the data 
file. This breaks iceberg-mixed-format-position-deletes.test
{code}
query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
vector, unique_database)
common/impala_test_suite.py:820: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:627: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row 
row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
 in actual rows:
E   
'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
{code}
as the only resulting data file is a .orc file containing the only valid row.

  was:
A Hive improvement (probably HIVE-28069) to Iceberg support changed Hive's 
behavior around handling deletes. It used to always add a delete file, but now 
if a delete would negate all the contents of a data file it instead removes the 
data file. This breaks iceberg-mixed-format-position-deletes.test
{code}
query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
vector, unique_database)
common/impala_test_suite.py:820: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:627: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row 
row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
 in actual rows:
E   
'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
{code}
as the only resulting data file is a .orc file containing the only valid row.


> Rework Iceberg mixed format delete test for Hive optimization
> -
>
> Key: IMPALA-13046
> URL: https://issues.apache.org/jira/browse/IMPALA-13046
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> A Hive improvement (HIVE-27731) to Iceberg support changed Hive's behavior 
> around handling deletes. It used to always add a delete file, but now if a 
> delete would negate all the contents of a data file it instead removes the 
> data file. This breaks iceberg-mixed-format-position-deletes.test
> {code}
> query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
> vector, unique_database)
> common/impala_test_suite.py:820: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:290: in verify_query_result_is_subset
> unicode(expected_row), unicode(actual_results))
> E   AssertionError: Could not find expected row 
> row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
>  in actual rows:
> E   
> 'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
> {code}
> as the only resulting data file is a .orc file containing the only valid row.





[jira] [Updated] (IMPALA-13046) Rework Iceberg mixed format delete test for Hive optimization

2024-04-29 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-13046:
---
Description: 
A Hive improvement (probably HIVE-28069) to Iceberg support changed Hive's 
behavior around handling deletes. It used to always add a delete file, but now 
if a delete would negate all the contents of a data file it instead removes the 
data file. This breaks iceberg-mixed-format-position-deletes.test
{code}
query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
vector, unique_database)
common/impala_test_suite.py:820: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:627: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row 
row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
 in actual rows:
E   
'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
{code}
as the only resulting data file is a .orc file containing the only valid row.

  was:
A Hive improvement (TBD) to Iceberg support changed Hive's behavior around 
handling deletes. It used to always add a delete file, but now if a delete 
would negate all the contents of a data file it instead removes the data file. 
This breaks iceberg-mixed-format-position-deletes.test
{code}
query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
vector, unique_database)
common/impala_test_suite.py:820: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:627: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row 
row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
 in actual rows:
E   
'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
{code}
as the only resulting data file is a .orc file containing the only valid row.


> Rework Iceberg mixed format delete test for Hive optimization
> -
>
> Key: IMPALA-13046
> URL: https://issues.apache.org/jira/browse/IMPALA-13046
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> A Hive improvement (probably HIVE-28069) to Iceberg support changed Hive's 
> behavior around handling deletes. It used to always add a delete file, but 
> now if a delete would negate all the contents of a data file it instead 
> removes the data file. This breaks iceberg-mixed-format-position-deletes.test
> {code}
> query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
> vector, unique_database)
> common/impala_test_suite.py:820: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:290: in verify_query_result_is_subset
> unicode(expected_row), unicode(actual_results))
> E   AssertionError: Could not find expected row 
> row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
>  in actual rows:
> E   
> 'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
> {code}
> as the only resulting data file is a .orc file containing the only valid row.





[jira] [Work started] (IMPALA-13046) Rework Iceberg mixed format delete test for Hive optimization

2024-04-29 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-13046 started by Michael Smith.
--
> Rework Iceberg mixed format delete test for Hive optimization
> -
>
> Key: IMPALA-13046
> URL: https://issues.apache.org/jira/browse/IMPALA-13046
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> A Hive improvement (TBD) to Iceberg support changed Hive's behavior around 
> handling deletes. It used to always add a delete file, but now if a delete 
> would negate all the contents of a data file it instead removes the data 
> file. This breaks iceberg-mixed-format-position-deletes.test
> {code}
> query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
> vector, unique_database)
> common/impala_test_suite.py:820: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:290: in verify_query_result_is_subset
> unicode(expected_row), unicode(actual_results))
> E   AssertionError: Could not find expected row 
> row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
>  in actual rows:
> E   
> 'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
> {code}
> as the only resulting data file is a .orc file containing the only valid row.






[jira] [Assigned] (IMPALA-13046) Rework Iceberg mixed format delete test for Hive optimization

2024-04-29 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-13046:
--

Assignee: Michael Smith

> Rework Iceberg mixed format delete test for Hive optimization
> -
>
> Key: IMPALA-13046
> URL: https://issues.apache.org/jira/browse/IMPALA-13046
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> A Hive improvement (TBD) to Iceberg support changed Hive's behavior around 
> handling deletes. It used to always add a delete file, but now if a delete 
> would negate all the contents of a data file it instead removes the data 
> file. This breaks iceberg-mixed-format-position-deletes.test
> {code}
> query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
> vector, unique_database)
> common/impala_test_suite.py:820: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:290: in verify_query_result_is_subset
> unicode(expected_row), unicode(actual_results))
> E   AssertionError: Could not find expected row 
> row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
>  in actual rows:
> E   
> 'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
> {code}
> as the only resulting data file is a .orc file containing the only valid row.






[jira] [Created] (IMPALA-13046) Rework Iceberg mixed format delete test for Hive optimization

2024-04-29 Thread Michael Smith (Jira)
Michael Smith created IMPALA-13046:
--

 Summary: Rework Iceberg mixed format delete test for Hive 
optimization
 Key: IMPALA-13046
 URL: https://issues.apache.org/jira/browse/IMPALA-13046
 Project: IMPALA
  Issue Type: Task
Reporter: Michael Smith


A Hive improvement (TBD) to Iceberg support changed Hive's behavior around 
handling deletes. It used to always add a delete file, but now if a delete 
would negate all the contents of a data file it instead removes the data file. 
This breaks iceberg-mixed-format-position-deletes.test
{code}
query_test/test_iceberg.py:1472: in test_read_mixed_format_position_deletes
vector, unique_database)
common/impala_test_suite.py:820: in run_test_case
self.__verify_results_and_errors(vector, test_section, result, use_db)
common/impala_test_suite.py:627: in __verify_results_and_errors
replace_filenames_with_placeholder)
common/test_result_verifier.py:520: in verify_raw_results
VERIFIER_MAP[verifier](expected, actual)
common/test_result_verifier.py:290: in verify_query_result_is_subset
unicode(expected_row), unicode(actual_results))
E   AssertionError: Could not find expected row 
row_regex:'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/.*-data-.*.parquet','.*B','','.*'
 in actual rows:
E   
'hdfs://localhost:20500/test-warehouse/test_read_mixed_format_position_deletes_6fb8ae98.db/ice_mixed_formats/data/0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-4551626f60ee-job_17142740646310_0071-4-1.orc','306B','','NONE'
{code}
as the only resulting data file is a .orc file containing the only valid row.
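
The mismatch can be reproduced in isolation. A minimal sketch: the regex is the
data-file portion of the test's row_regex, and the filename is the ORC file
from the actual-rows output above:

```python
import re

# The test's row_regex expects the remaining data file to be Parquet:
expected = r".*-data-.*\.parquet"
# But after Hive's optimization the only data file left is the ORC file,
# so the regex cannot match and verify_query_result_is_subset fails.
actual = ("0-0-data-jenkins_20240427231939_2457cdfb-2e04-471a-9661-"
          "4551626f60ee-job_17142740646310_0071-4-1.orc")
print(bool(re.match(expected, actual)))  # prints False
```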






[jira] [Created] (IMPALA-13044) Upgrade bouncycastle to 1.78

2024-04-29 Thread Peter Rozsa (Jira)
Peter Rozsa created IMPALA-13044:


 Summary: Upgrade bouncycastle to 1.78
 Key: IMPALA-13044
 URL: https://issues.apache.org/jira/browse/IMPALA-13044
 Project: IMPALA
  Issue Type: Task
  Components: Frontend
Affects Versions: Impala 4.3.0
Reporter: Peter Rozsa
Assignee: Peter Rozsa


Impala uses bouncycastle 1.68, which contains various CVEs. Upgrading to 1.78 
resolves these security concerns.


