[jira] [Updated] (IMPALA-10985) always_true hint is not needed if all predicates are on partitioning columns

2024-06-27 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-10985:
-
Description: IMPALA-10314 added the always_true hint, which leads to assuming 
that a file in a table will return at least one row even if there is a WHERE 
clause. Currently we need to add it even if all columns used in the WHERE 
clause are partitioning columns. This is not needed, as these predicates can't 
drop any more rows after partition pruning.  (was: IMPALA-10314 added 
always_true hint that leads to assuming that a file in a table will return at 
least on row even if there is a WHERE clause. Currently we need to add it even 
if all columns used in the WHERE are partitioning columns. This is not needed, 
as these predicates can't drop any more rows after partition pruning.)

> always_true hint is not needed if all predicates are on partitioning columns
> 
>
> Key: IMPALA-10985
> URL: https://issues.apache.org/jira/browse/IMPALA-10985
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Minor
>
> IMPALA-10314 added the always_true hint, which leads to assuming that a file 
> in a table will return at least one row even if there is a WHERE clause. 
> Currently we need to add it even if all columns used in the WHERE clause are 
> partitioning columns. This is not needed, as these predicates can't drop any 
> more rows after partition pruning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5078) Break up expr-test.cc

2024-06-27 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860452#comment-17860452
 ] 

Csaba Ringhofer commented on IMPALA-5078:
-

[~sy117] Were you able to make some progress with this / do you plan to?
This ticket is not urgent, but in the long run it would be really nice to break 
up expr-test.cc

> Break up expr-test.cc
> -
>
> Key: IMPALA-5078
> URL: https://issues.apache.org/jira/browse/IMPALA-5078
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Henry Robinson
>Assignee: Csaba Ringhofer
>Priority: Minor
>  Labels: newbie, ramp-up
> Attachments: Screen Shot 2020-06-30 at 12.19.16 PM.png, Screen Shot 
> 2020-07-10 at 1.01.43 PM.png, Screen Shot 2020-07-10 at 11.16.36 AM.png, 
> Screen Shot 2020-07-10 at 11.27.57 AM.png, image-2020-07-10-13-22-48-230.png
>
>
> {{expr-test.cc}} clocks in at 7129 lines, which is about enough for my emacs 
> to start slowing down a bit. Let's see if we can refactor it enough to have a 
> couple of test files. Maybe moving all the string tests into a 
> separate {{expr-string-test.cc}}, and having a common header will be enough 
> to make it a bit more manageable. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets

2024-06-26 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13183:


 Summary: Add default timeout for hs2/beeswax server sockets
 Key: IMPALA-13183
 URL: https://issues.apache.org/jira/browse/IMPALA-13183
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Csaba Ringhofer


Currently Impala only sets timeouts for specific operations, for example during 
the SASL handshake and when checking whether a connection can be closed due to 
an idle session.
https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/rpc/TAcceptQueueServer.cpp#L153
https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/transport/TSaslServerTransport.cpp#L145

There are several cases where an inactive client could keep the connection open 
indefinitely, for example if it hasn't opened a session yet.
I think that there should be a general, longer timeout set for both send/recv, 
e.g. a flag client_default_timeout_s=3600.
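
A minimal sketch of the idea, using Thrift's Python bindings purely for 
illustration (the real change would live in Impala's C++ server code, and 
client_default_timeout_s is only the flag name proposed above):
{code}
# Sketch only: apply a default send/recv timeout to every accepted client
# socket, so an inactive client cannot hold a connection open forever.
from thrift.transport import TSocket

CLIENT_DEFAULT_TIMEOUT_S = 3600  # hypothetical default, per the proposal

def configure_client_socket(sock):
    # Thrift's Python TSocket uses a single timeout (in milliseconds) that
    # applies to both send and recv, so one call covers both directions.
    sock.setTimeout(CLIENT_DEFAULT_TIMEOUT_S * 1000)
    return sock

client = configure_client_socket(TSocket.TSocket("localhost", 21050))
{code}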



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12370) Add an option to customize timezone when working with UNIXTIME_MICROS columns of Kudu tables

2024-06-20 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12370.
--
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Add an option to customize timezone when working with UNIXTIME_MICROS columns 
> of Kudu tables
> 
>
> Key: IMPALA-12370
> URL: https://issues.apache.org/jira/browse/IMPALA-12370
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Alexey Serbin
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: timezone
> Fix For: Impala 4.5.0
>
>
> Impala uses the timezone of its server when converting Unix epoch time stored 
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name 
> TIMESTAMP) into a timestamp.  As one can see, the former (a value stored in 
> a column of the UNIXTIME_MICROS type) does not contain information about the 
> timezone, but the latter (the result timestamp returned by Impala) does, and 
> Impala's convention makes sense and works totally fine if the data is being 
> written and read by Impala or by other applications that use the same 
> convention.
> However, Spark uses a different convention.  Spark applications convert 
> timestamps to the UTC timezone before representing the result as Unix epoch 
> time.  So, when a Spark application stores timestamp data in a Kudu table, 
> there is a difference in the result timestamps upon reading the stored data 
> via Impala if the Impala servers are running in a timezone other than UTC.
> As of now, the workaround is to run Impala servers in the UTC timezone, so 
> the convention used by Spark produces the same result as the convention used 
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be great to make it possible to customize the 
> timezone that's used by Impala when working with UNIXTIME_MICROS/TIMESTAMP 
> values stored in Kudu tables.  That will free users from the inconvenience 
> of running their clusters in the UTC timezone if they use a mix of 
> Spark/Impala applications to work with the same data stored in Kudu tables.  
> Ideally, the setting should be per Kudu table, but a system-wide flag is 
> also an option.
> This is similar to IMPALA-1658.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone

2024-06-07 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853203#comment-17853203
 ] 

Csaba Ringhofer commented on IMPALA-12322:
--

Thanks for the feedback, [~eyizoha]. I have uploaded a patch that adds a new 
query option: https://gerrit.cloudera.org/#/c/21492/

> return wrong timestamp when scan kudu timestamp with timezone
> -
>
> Key: IMPALA-12322
> URL: https://issues.apache.org/jira/browse/IMPALA-12322
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
> Environment: impala 4.1.1
>Reporter: daicheng
>Assignee: Zihao Ye
>Priority: Major
> Attachments: image-2022-04-24-00-01-05-746-1.png, 
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, 
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, 
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, 
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, 
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, 
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, 
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, 
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, 
> image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png
>
>
> The Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *step 1*
> I have a Parquet file with two columns like below, and read it with 
> impala-shell and Spark (timezone=Shanghai):
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test__test__test_time2 (id BIGINT, t TIMESTAMP, 
> PRIMARY KEY (id)) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *step 3*
> Read it with Spark (timezone=Shanghai); Spark reads the Kudu table with the 
> kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But reading it with impala-shell:
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> the result is 8 hours late.
> *conclusion*
> It seems like the Impala timezone setting doesn't work when the Kudu column 
> type is TIMESTAMP, although it works fine for Parquet files. I don't know why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone

2024-05-31 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851147#comment-17851147
 ] 

Csaba Ringhofer commented on IMPALA-12322:
--

[~eyizoha] convert_kudu_utc_timestamps only affects reading, so if Impala 
writes a Kudu table, it will read back a different timestamp than what it 
wrote.

In IMPALA-12370 there is some discussion about how to configure the writing 
behavior. Do you think that convert_kudu_utc_timestamps should also govern 
writing, or should that get a separate query option?

> return wrong timestamp when scan kudu timestamp with timezone
> -
>
> Key: IMPALA-12322
> URL: https://issues.apache.org/jira/browse/IMPALA-12322
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.1
> Environment: impala 4.1.1
>Reporter: daicheng
>Assignee: Zihao Ye
>Priority: Major
> Attachments: image-2022-04-24-00-01-05-746-1.png, 
> image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, 
> image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, 
> image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, 
> image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, 
> image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, 
> image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, 
> image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, 
> image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, 
> image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png
>
>
> The Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *step 1*
> I have a Parquet file with two columns like below, and read it with 
> impala-shell and Spark (timezone=Shanghai):
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test__test__test_time2 (id BIGINT, t TIMESTAMP, 
> PRIMARY KEY (id)) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *step 3*
> Read it with Spark (timezone=Shanghai); Spark reads the Kudu table with the 
> kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But reading it with impala-shell:
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> the result is 8 hours late.
> *conclusion*
> It seems like the Impala timezone setting doesn't work when the Kudu column 
> type is TIMESTAMP, although it works fine for Parquet files. I don't know why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12370) Add an option to customize timezone when working with UNIXTIME_MICROS columns of Kudu tables

2024-05-31 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851132#comment-17851132
 ] 

Csaba Ringhofer commented on IMPALA-12370:
--

>That will free the users from the inconvenience of running their clusters in 
>the UTC timezone
The timezone doesn't need to be set at the server level in Impala; it can be 
set per query using the query option "timezone", e.g. set timezone=CET;

> Ideally, the setting should be per Kudu table, but a system-wide flag is also 
> an option.
The query option convert_kudu_utc_timestamps only affects reading, so there 
could be a writing-related one too, e.g. write_kudu_utc_timestamps (or 
convert_kudu_utc_timestamps could be changed to also affect writing).

I agree that the ideal would be to be able to override this per table, for 
example with a table property like "impala.use_kudu_utc_timestamps" which would 
override both convert_kudu_utc_timestamps / write_kudu_utc_timestamps.
It would be even better if other components also respected this property, so 
that if it is false, they would write in the timezone-agnostic "Impala" way.
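
To make the difference between the conventions concrete, here is a small 
Python sketch (illustrative values only, not Impala code) of how a writer 
that converts wall-clock time to UTC epoch microseconds and a reader that 
skips the reverse conversion disagree by the UTC offset:
{code}
# Sketch: a UTC-convention writer (like Spark) vs. a reader that treats the
# stored value as timezone-agnostic. Timezone and values are illustrative.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

wall_clock = datetime(2022, 4, 24, 0, 0, 0)  # what the user wrote

# UTC convention: interpret the wall clock in Asia/Shanghai (UTC+8) and
# store UTC-based epoch microseconds.
stored_micros = int(
    wall_clock.replace(tzinfo=ZoneInfo("Asia/Shanghai")).timestamp() * 1_000_000)

# A reader that does not convert back to Asia/Shanghai recovers a wall clock
# shifted by the 8-hour offset.
read_back = datetime.fromtimestamp(stored_micros / 1_000_000, tz=timezone.utc)
print(wall_clock)  # 2022-04-24 00:00:00
print(read_back)   # 2022-04-23 16:00:00+00:00
{code}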

> Add an option to customize timezone when working with UNIXTIME_MICROS columns 
> of Kudu tables
> 
>
> Key: IMPALA-12370
> URL: https://issues.apache.org/jira/browse/IMPALA-12370
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: timezone
>
> Impala uses the timezone of its server when converting Unix epoch time stored 
> in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name 
> TIMESTAMP) into a timestamp.  As one can see, the former (a values stored in 
> a column of the UNIXTIME_MICROS type) does not contain information about 
> timezone, but the latter (the result timestamp returned by Impala) does, and 
> Impala's convention does make sense and works totally fine if the data is 
> being written and read by Impala or by other application that uses the same 
> convention.
> However, Spark uses a different convention.  Spark applications convert 
> timestamps to the UTC timezone before representing the result as Unix epoch 
> time.  So, when a Spark application stores timestamp data in a Kudu table, 
> there is a difference in the result timestamps upon reading the stored data 
> via Impala if Impala servers are running in other than the UTC timezone.
> As of now, the workaround is to run Impala servers in the UTC timezone, so 
> the convention used by Spark produces the same result as the convention used 
> by Impala when converting between timestamps and Unix epoch times.
> In this context, it would be great to make it possible customizing the 
> timezone that's used by Impala when working with UNIXTIME_MICROS/TIMESTAMP 
> values stored in Kudu tables.  That will free the users from the 
> inconvenience of running their clusters in the UTC timezone if they use a mix 
> of Spark/Impala applications to work with the same data stored in Kudu 
> tables.  Ideally, the setting should be per Kudu table, but a system-wide 
> flag is also an option.
> This is similar to IMPALA-1658.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12656) impala-shell cannot be installed on Python 3.11

2024-05-29 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850457#comment-17850457
 ] 

Csaba Ringhofer commented on IMPALA-12656:
--

I also bumped into this and tried building the python-sasl PRs.
https://github.com/cloudera/python-sasl/pull/32 worked with 3.11 and 3.12 but 
broke 2.7 (at least in my environment). The other PR only fixed 3.11 but had 
other build failures with 3.12.

I think that this is a good reason to drop Python 2.7 support.

> impala-shell cannot be installed on Python 3.11
> ---
>
> Key: IMPALA-12656
> URL: https://issues.apache.org/jira/browse/IMPALA-12656
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Priority: Major
>  Labels: python3
>
> Trying to {{pip install impala-shell}} fails with
> {code:java}
>       clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG 
> -g -fwrapv -O3 -Wall -isysroot 
> /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Isasl 
> -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11
>  -c sasl/saslwrapper.cpp -o 
> build/temp.macosx-14-arm64-cpython-311/sasl/saslwrapper.o
>       sasl/saslwrapper.cpp:196:12: fatal error: 'longintrepr.h' file not found
>         #include "longintrepr.h"
>                  ^~~
>       1 error generated. {code}
> Python 3.11 moved this file to a subdirectory in 
> [https://github.com/python/cpython/commit/8e5de40f90476249e9a2e5ef135143b5c6a0b512.]
> Adopting [https://github.com/cloudera/python-sasl/pull/31] or 
> [https://github.com/cloudera/python-sasl/pull/32] might fix it. But they need 
> to be included in a new release of sasl on pypi.org.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12656) impala-shell cannot be installed on Python 3.11

2024-05-29 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12656:
-
Priority: Critical  (was: Major)

> impala-shell cannot be installed on Python 3.11
> ---
>
> Key: IMPALA-12656
> URL: https://issues.apache.org/jira/browse/IMPALA-12656
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Michael Smith
>Priority: Critical
>  Labels: python3
>
> Trying to {{pip install impala-shell}} fails with
> {code:java}
>       clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG 
> -g -fwrapv -O3 -Wall -isysroot 
> /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Isasl 
> -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11
>  -c sasl/saslwrapper.cpp -o 
> build/temp.macosx-14-arm64-cpython-311/sasl/saslwrapper.o
>       sasl/saslwrapper.cpp:196:12: fatal error: 'longintrepr.h' file not found
>         #include "longintrepr.h"
>                  ^~~
>       1 error generated. {code}
> Python 3.11 moved this file to a subdirectory in 
> [https://github.com/python/cpython/commit/8e5de40f90476249e9a2e5ef135143b5c6a0b512.]
> Adopting [https://github.com/cloudera/python-sasl/pull/31] or 
> [https://github.com/cloudera/python-sasl/pull/32] might fix it. But they need 
> to be included in a new release of sasl on pypi.org.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11512) BINARY support in Iceberg

2024-05-23 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848937#comment-17848937
 ] 

Csaba Ringhofer commented on IMPALA-11512:
--

BINARY columns seem to be working with Iceberg, but testing seems very limited. 
I didn't find any test with partition spec on BINARY columns.

> BINARY support in Iceberg
> -
>
> Key: IMPALA-11512
> URL: https://issues.apache.org/jira/browse/IMPALA-11512
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: impala-iceberg
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows

2024-05-17 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12990.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> impala-shell broken if Iceberg delete deletes 0 rows
> 
>
> Key: IMPALA-12990
> URL: https://issues.apache.org/jira/browse/IMPALA-12990
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: iceberg
> Fix For: Impala 4.4.0
>
>
> Happens only with Python 3
> {code}
> impala-python3 shell/impala_shell.py
> create table icebergupdatet (i int, s string) stored as iceberg;
> alter table icebergupdatet set tblproperties("format-version"="2");
> delete from icebergupdatet where i=0;
> Unknown Exception : '>' not supported between instances of 'NoneType' and 
> 'int'
> Traceback (most recent call last):
>   File "shell/impala_shell.py", line 1428, in _execute_stmt
> if is_dml and num_rows == 0 and num_deleted_rows > 0:
> TypeError: '>' not supported between instances of 'NoneType' and 'int'
> {code}
> The same error should also happen when the delete removes > 0 rows but the 
> Impala server runs an older version that doesn't set TDmlResult.rows_deleted.
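
A minimal sketch of the kind of guard that avoids the error (variable names 
mirror the traceback above; this is not the actual patch):
{code}
# Sketch: treat a missing rows_deleted (None from older servers, or from a
# delete that removed 0 rows) as 0, so the comparison never sees None > int.
is_dml = True
num_rows = 0
num_deleted_rows = None  # older servers don't set TDmlResult.rows_deleted

if is_dml and num_rows == 0 and (num_deleted_rows or 0) > 0:
    print("Deleted %d row(s)" % num_deleted_rows)
else:
    print("no deleted-rows summary")  # safe on Python 3 even when None
{code}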



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12978) IMPALA-12544 made impala-shell incompatible with old impala servers

2024-05-17 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12978.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> IMPALA-12544 made impala-shell incompatible with old impala servers
> ---
>
> Key: IMPALA-12978
> URL: https://issues.apache.org/jira/browse/IMPALA-12978
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Critical
> Fix For: Impala 4.4.0
>
>
> IMPALA-12544 uses "progress.total_fragment_instances > 0:", but 
> total_fragment_instances is None if the server is older and does not know 
> this Thrift member yet (added in IMPALA-12048). 
> [https://github.com/apache/impala/blob/fb3c379f395635f9f6927b40694bc3dd95a2866f/shell/impala_shell.py#L1320]
> This leads to error messages in interactive shell sessions when progress 
> reporting is enabled.
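
A minimal sketch of a backward-compatible check (Progress is a stand-in 
class, not the real shell type; only the attribute name comes from the 
description above):
{code}
# Sketch: an optional Thrift field deserializes to None on old servers, so
# test it for truthiness before comparing it to an int.
class Progress(object):
    def __init__(self, total_fragment_instances=None):
        self.total_fragment_instances = total_fragment_instances

def has_instances(progress):
    # None (old server) and 0 are both treated as "nothing to report".
    return bool(progress.total_fragment_instances) and \
        progress.total_fragment_instances > 0

print(has_instances(Progress()))   # False: old server, field not set
print(has_instances(Progress(8)))  # True
{code}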



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IMPALA-12392) Fix describe statements once HIVE-24509 arrives as dependency

2024-05-16 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12392.
--
Resolution: Done

[~hemanth619] Sure, closing this.

> Fix describe statements once HIVE-24509 arrives as dependency
> -
>
> Key: IMPALA-12392
> URL: https://issues.apache.org/jira/browse/IMPALA-12392
> Project: IMPALA
>  Issue Type: Task
>  Components: Catalog
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>
> HIVE-24509 will break test_describe_materialized_view, as ShowUtils is not 
> included in our shaded jar.
> It would also be nice to switch to ShowUtils.TextMetaDataTable here:
> https://github.com/apache/impala/blob/a34f7ce63299c72ef45a99b01bb4e80210befbff/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L88
> AFAIK the old function is kept in Hive only because of Impala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13056) HBaseTableScanner's timeout handling looks broken

2024-05-03 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13056:


 Summary: HBaseTableScanner's timeout handling looks broken
 Key: IMPALA-13056
 URL: https://issues.apache.org/jira/browse/IMPALA-13056
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Csaba Ringhofer


https://gerrit.cloudera.org/#/c/12660/ rewrote some JNI exception handling code 
and accidentally eliminated the timeout handling in 
https://github.com/apache/impala/blob/7ad94006563b88d9221b4ac978dbf5b4fc0a3ca1/be/src/exec/hbase/hbase-table-scanner.cc#L518



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13052) Sampling aggregate result sizes are underestimated

2024-05-02 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-13052:
-
Description: 
Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions; I haven't checked those 
yet.

This can lead to highly underestimating the memory needs of grouping 
aggregators:
select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 
limit 1
04:AGGREGATE  FINALIZE Peak Mem:  2.19 GB   Est. Peak Mem:  18.00 MB
01:AGGREGATE STREAMING  Peak Mem:   2.37 GB   Est. Peak Mem:  45.79 MB

Enforcing PREAGG_BYTES_LIMIT also doesn't seem to work well - setting a 40MB 
limit decreased peak mem to 1.5 GB. My guess is that the pre-aggregation logic 
is not prepared for aggregation states that grow during the execution, so it 
can decide to not add another group to the hash table, but can't deny 
increasing an existing one's state.


  was:
Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions; I haven't checked those 
yet.

This can lead to highly underestimating the memory needs of grouping 
aggregators:
select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 
limit 1
04:AGGREGATE  FINALIZE Peak Mem:  2.19 GB   Est. Peak Mem:  18.00 MB
01:AGGREGATE STREAMING  Peak Mem:   2.37 GB   Est. Peak Mem:  45.79 MB


> Sampling aggregate result sizes are underestimated
> --
>
> Key: IMPALA-13052
> URL: https://issues.apache.org/jira/browse/IMPALA-13052
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Sampling aggregates (sample, appx_median, histogram) return a string that can 
> be quite large, but the planner assumes it to have a fixed small size.
> Examples:
> select sample(l_orderkey) from tpch.lineitem;
> according to plan: row-size=12B
> in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)
> select appx_median(l_orderkey) from tpch.lineitem;
> according to plan: row-size=8B
> in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)
> select histogram(l_orderkey) from tpch.lineitem;
> according to plan: row-size=12B
> in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)
> This may also be relevant for datasketches functions; I haven't checked those 
> yet.
> This can lead to highly underestimating the memory needs of grouping 
> aggregators:
> select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 
> limit 1
> 04:AGGREGATE  FINALIZE Peak Mem:  2.19 GB   Est. Peak Mem:  18.00 MB
> 01:AGGREGATE STREAMING  Peak Mem:   2.37 GB   Est. Peak Mem:  45.79 MB
> Enforcing PREAGG_BYTES_LIMIT also doesn't seem to work well - setting a 40MB 
> limit decreased peak mem to 1.5 GB. My guess is that the pre-aggregation 
> logic is not prepared for aggregation states that grow during the execution, 
> so it can decide to not add another group to the hash table, but can't deny 
> increasing an existing one's state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13052) Sampling aggregate result sizes are underestimated

2024-05-02 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-13052:
-
Description: 
Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions; I haven't checked those 
yet.

This can lead to highly underestimating the memory needs of grouping 
aggregators:
select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 
limit 1
04:AGGREGATE  FINALIZE Peak Mem:  2.19 GB   Est. Peak Mem:  18.00 MB
01:AGGREGATE STREAMING  Peak Mem:   2.37 GB   Est. Peak Mem:  45.79 MB

  was:
Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions.



> Sampling aggregate result sizes are underestimated
> --
>
> Key: IMPALA-13052
> URL: https://issues.apache.org/jira/browse/IMPALA-13052
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Sampling aggregates (sample, appx_median, histogram) return a string that can 
> be quite large, but the planner assumes it to have a fixed small size.
> Examples:
> select sample(l_orderkey) from tpch.lineitem;
> according to plan: row-size=12B
> in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)
> select appx_median(l_orderkey) from tpch.lineitem;
> according to plan: row-size=8B
> in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)
> select histogram(l_orderkey) from tpch.lineitem;
> according to plan: row-size=12B
> in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)
> This may also be relevant for datasketches functions; I haven't checked those 
> yet.
> This can lead to highly underestimating the memory needs of grouping 
> aggregators:
> select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 
> limit 1
> 04:AGGREGATE  FINALIZE Peak Mem:  2.19 GB   Est. Peak Mem:  18.00 MB
> 01:AGGREGATE STREAMING  Peak Mem:   2.37 GB   Est. Peak Mem:  45.79 MB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13052) Sampling aggregate result sizes are underestimated

2024-05-02 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13052:


 Summary: Sampling aggregate result sizes are underestimated
 Key: IMPALA-13052
 URL: https://issues.apache.org/jira/browse/IMPALA-13052
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


Sampling aggregates (sample, appx_median, histogram) return a string that can 
be quite large, but the planner assumes it to have a fixed small size.

Examples:
select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-13048) Shuffle hint on joins is ignored in some cases

2024-04-30 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-13048:
-
Description: 
I noticed that the shuffle hint is ignored without any warning in some cases.

The shuffle hint is not applied in this query:

{code}
explain select  * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}
Resulting plan:
{code}
PLAN-ROOT SINK
|
07:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: a3.tinyint_col = a2.tinyint_col
|  runtime filters: RF000 <- a2.tinyint_col
|  row-size=267B cardinality=80
|
|--06:EXCHANGE [BROADCAST]
|  |
|  03:HASH JOIN [INNER JOIN, BROADCAST]
|  |  hash predicates: a1.id = a2.id
|  |  runtime filters: RF002 <- a2.id
|  |  row-size=178B cardinality=8
|  |
|  |--05:EXCHANGE [BROADCAST]
|  |  |
|  |  00:SCAN HDFS [functional.alltypestiny a2]
|  | HDFS partitions=4/4 files=4 size=460B
|  | row-size=89B cardinality=8
|  |
|  01:SCAN HDFS [functional.alltypes a1]
| HDFS partitions=24/24 files=24 size=478.45KB
| runtime filters: RF002 -> a1.id
| row-size=89B cardinality=7.30K
|
02:SCAN HDFS [functional.alltypessmall a3]
   HDFS partitions=4/4 files=4 size=6.32KB
   runtime filters: RF000 -> a3.tinyint_col
   row-size=89B cardinality=100
{code}

If the first two tables' positions are swapped, then it is applied:
{code}
explain select  * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}

  was:
I noticed that shuffle hint is ignore without any warning in some cases

shuffle hint is not applied in this query:

{code}
explain select  * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}
result plan
{code}
PLAN-ROOT SINK
|
07:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: a3.tinyint_col = a2.tinyint_col
|  runtime filters: RF000 <- a2.tinyint_col
|  row-size=267B cardinality=80
|
|--06:EXCHANGE [BROADCAST]
|  |
|  03:HASH JOIN [INNER JOIN, BROADCAST]
|  |  hash predicates: a1.id = a2.id
|  |  runtime filters: RF002 <- a2.id
|  |  row-size=178B cardinality=8
|  |
|  |--05:EXCHANGE [BROADCAST]
|  |  |
|  |  00:SCAN HDFS [functional.alltypestiny a2]
|  | HDFS partitions=4/4 files=4 size=460B
|  | row-size=89B cardinality=8
|  |
|  01:SCAN HDFS [functional.alltypes a1]
| HDFS partitions=24/24 files=24 size=478.45KB
| runtime filters: RF002 -> a1.id
| row-size=89B cardinality=7.30K
|
02:SCAN HDFS [functional.alltypessmall a3]
   HDFS partitions=4/4 files=4 size=6.32KB
   runtime filters: RF000 -> a3.tinyint_col
   row-size=89B cardinality=100
{code}

if the first two tables' position is swapped, then it is applied:
{code}
explain select  * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}


> Shuffle hint on joins is ignored in some cases
> --
>
> Key: IMPALA-13048
> URL: https://issues.apache.org/jira/browse/IMPALA-13048
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>
> I noticed that the shuffle hint is ignored without any warning in some cases.
> The shuffle hint is not applied in this query:
> {code}
> explain select  * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on 
> a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
> {code}
> Resulting plan:
> {code}
> PLAN-ROOT SINK
> |
> 07:EXCHANGE [UNPARTITIONED]
> |
> 04:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: a3.tinyint_col = a2.tinyint_col
> |  runtime filters: RF000 <- a2.tinyint_col
> |  row-size=267B cardinality=80
> |
> |--06:EXCHANGE [BROADCAST]
> |  |
> |  03:HASH JOIN [INNER JOIN, BROADCAST]
> |  |  hash predicates: a1.id = a2.id
> |  |  runtime filters: RF002 <- a2.id
> |  |  row-size=178B cardinality=8
> |  |
> |  |--05:EXCHANGE [BROADCAST]
> |  |  |
> |  |  00:SCAN HDFS [functional.alltypestiny a2]
> |  | HDFS partitions=4/4 files=4 size=460B
> |  | row-size=89B cardinality=8
> |  |
> |  01:SCAN HDFS [functional.alltypes a1]
> | HDFS partitions=24/24 files=24 size=478.45KB
> | runtime filters: RF002 -> a1.id
> | row-size=89B cardinality=7.30K
> |
> 02:SCAN HDFS [functional.alltypessmall a3]
>HDFS partitions=4/4 files=4 size=6.32KB
>runtime filters: RF000 -> a3.tinyint_col
>row-size=89B cardinality=100
> {code}
> If the first two tables' positions are swapped, then it is applied:
> {code}
> explain select  * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on 
> a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
> {code}



--
This message was sent by Atlassian Jira

[jira] [Created] (IMPALA-13048) Shuffle hint on joins is ignored in some cases

2024-04-30 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13048:


 Summary: Shuffle hint on joins is ignored in some cases
 Key: IMPALA-13048
 URL: https://issues.apache.org/jira/browse/IMPALA-13048
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


I noticed that the shuffle hint is ignored without any warning in some cases.

The shuffle hint is not applied in this query:

{code}
explain select  * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}
Resulting plan:
{code}
PLAN-ROOT SINK
|
07:EXCHANGE [UNPARTITIONED]
|
04:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: a3.tinyint_col = a2.tinyint_col
|  runtime filters: RF000 <- a2.tinyint_col
|  row-size=267B cardinality=80
|
|--06:EXCHANGE [BROADCAST]
|  |
|  03:HASH JOIN [INNER JOIN, BROADCAST]
|  |  hash predicates: a1.id = a2.id
|  |  runtime filters: RF002 <- a2.id
|  |  row-size=178B cardinality=8
|  |
|  |--05:EXCHANGE [BROADCAST]
|  |  |
|  |  00:SCAN HDFS [functional.alltypestiny a2]
|  | HDFS partitions=4/4 files=4 size=460B
|  | row-size=89B cardinality=8
|  |
|  01:SCAN HDFS [functional.alltypes a1]
| HDFS partitions=24/24 files=24 size=478.45KB
| runtime filters: RF002 -> a1.id
| row-size=89B cardinality=7.30K
|
02:SCAN HDFS [functional.alltypessmall a3]
   HDFS partitions=4/4 files=4 size=6.32KB
   runtime filters: RF000 -> a3.tinyint_col
   row-size=89B cardinality=100
{code}

If the first two tables' positions are swapped, then it is applied:
{code}
explain select  * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on 
a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col;
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13040) SIGSEGV in QueryState::UpdateFilterFromRemote

2024-04-26 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13040:


 Summary: SIGSEGV in  QueryState::UpdateFilterFromRemote
 Key: IMPALA-13040
 URL: https://issues.apache.org/jira/browse/IMPALA-13040
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Csaba Ringhofer


{code}
Crash reason:  SIGSEGV /SEGV_MAPERR
Crash address: 0x48
Process uptime: not available

Thread 114 (crashed)
 0  libpthread.so.0 + 0x9d00
rax = 0x00019e57ad00   rdx = 0x2a656720
rcx = 0x059a9860   rbx = 0x
rsi = 0x00019e57ad00   rdi = 0x0038
rbp = 0x7f6233d544e0   rsp = 0x7f6233d544a8
 r8 = 0x06a53540r9 = 0x0039
r10 = 0x   r11 = 0x000a
r12 = 0x00019e57ad00   r13 = 0x7f62a2f997d0
r14 = 0x7f6233d544f8   r15 = 0x1632c0f0
rip = 0x7f62a2f96d00
Found by: given as instruction pointer in context
 1  
impalad!impala::QueryState::UpdateFilterFromRemote(impala::UpdateFilterParamsPB 
const&, kudu::rpc::RpcContext*) [query-state.cc : 1033 + 0x5]
rbp = 0x7f6233d54520   rsp = 0x7f6233d544f0
rip = 0x015c0837
Found by: previous frame's frame pointer
 2  
impalad!impala::DataStreamService::UpdateFilterFromRemote(impala::UpdateFilterParamsPB
 const*, impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) 
[data-stream-service.cc : 134 + 0xb]
rbp = 0x7f6233d54640   rsp = 0x7f6233d54530
rip = 0x017c05de
Found by: previous frame's frame pointer
{code}

The line that crashes is 
https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/runtime/query-state.cc#L1033
My guess is that the actual segfault is inside WaitForPrepare(), but it was 
inlined. I am not sure whether a remote filter can arrive even before 
QueryState::Init has finished - that would explain the issue, as 
instances_prepared_barrier_ is not yet created at that point.
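
A Python analogy of the suspected race (illustrative only; the real code is 
C++ in query-state.cc, and the class below is a stand-in, not Impala's API):
{code}
# Sketch: an RPC handler dereferences a barrier that init() has not created
# yet. In Python this is an AttributeError; in the C++ code it would be a
# null-pointer dereference, i.e. the SIGSEGV above.
import threading
import time

class QueryState(object):
    def __init__(self):
        self.instances_prepared_barrier = None  # created later by init()

    def init(self):
        time.sleep(0.1)  # query setup work
        self.instances_prepared_barrier = threading.Barrier(1)

    def update_filter_from_remote(self):
        self.instances_prepared_barrier.wait()  # WaitForPrepare() analogue

qs = QueryState()
threading.Thread(target=qs.init).start()
try:
    qs.update_filter_from_remote()  # usually loses the race with init()
except AttributeError as e:
    print("crashed:", e)
{code}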



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12320) test_topic_updates_unblock fails in ASAN build

2024-04-26 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12320:
-
Priority: Critical  (was: Major)

> test_topic_updates_unblock fails in ASAN build
> --
>
> Key: IMPALA-12320
> URL: https://issues.apache.org/jira/browse/IMPALA-12320
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build
>
> h3. Error Message
> AssertionError: alter table tpcds.store_sales recover partitions query took 
> less time than 1 msec
> assert 9622 > 1, where 9622 = slow_query_future.get()
> h3. Stacktrace
> {noformat}
> custom_cluster/test_topic_update_frequency.py:82: in 
> test_topic_updates_unblock
> non_blocking_query_options=non_blocking_query_options)
> custom_cluster/test_topic_update_frequency.py:132: in __run_topic_update_test
> assert slow_query_future.get() > blocking_query_min_time, \
> E   AssertionError: alter table tpcds.store_sales recover partitions query 
> took less time than 1 msec
> E   assert 9622 > 1
> E+  where 9622 = <bound method ApplyResult.get of <multiprocessing.pool.ApplyResult object at 0x7f1ab45b6d10>>()
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-04-24 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840486#comment-17840486
 ] 

Csaba Ringhofer commented on IMPALA-12266:
--

Saw this test failing again:
select * from special_chars;
Could not resolve table reference: 'special_chars'

Looked into the coordinator log:
{code}
I0422 03:48:38.383420 19888 Frontend.java:2127] 
1f4e0654b999662f:b6f1b015] Analyzing query: select * from special_chars 
db: test_convert_table_cdba7383
...
I0422 03:48:42.862898  1012 ImpaladCatalog.java:232] Deleting: 
TABLE:test_convert_table_cdba7383.special_chars version: 7785 size: 77
I0422 03:48:42.862920  1012 ImpaladCatalog.java:232] Deleting: 
TABLE:test_convert_table_cdba7383.special_chars_tmp_5eb06c80 version: 7786 
size: 714
I0422 03:48:42.862967  1012 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID 
version: 7786 size: 60
...
I0422 03:48:42.863464 19888 jni-util.cc:302] 1f4e0654b999662f:b6f1b015] 
org.apache.impala.common.AnalysisException: Could not resolve table reference: 
'special_chars'
at org.apache.impala.analysis.Analyzer.resolvePath(Analyzer.java:1458)
...
I0422 03:48:46.893426  1012 ImpaladCatalog.java:232] Adding: 
TABLE:test_convert_table_cdba7383.special_chars version: 7794 size: 84
{code}
I am not familiar with how converting a table to Iceberg works, but based on 
the logs:
1. special_chars_tmp_5eb06c80 is created,
2. special_chars is deleted,
3. special_chars is recreated.

If the table is queried between 2 and 3, then the coordinator will think that 
it doesn't exist.

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: 
> E   MESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   

[jira] [Created] (IMPALA-13037) EventsProcessorStressTest can hang

2024-04-24 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13037:


 Summary: EventsProcessorStressTest can hang
 Key: IMPALA-13037
 URL: https://issues.apache.org/jira/browse/IMPALA-13037
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog, Infrastructure
Reporter: Csaba Ringhofer


The test failed with a timeout.

From mvn.log the last line is:
20:17:53 [INFO] Running 
org.apache.impala.catalog.events.EventsProcessorStressTest

Things seem to be hanging from 2024.04.22 20:17:53 to 2024.04.23.
The test seems to be waiting for a Hive query.

From FeSupport.INFO:
{code}
I0422 20:17:55.478875  7949 RandomHiveQueryRunner.java:1102] Client 0 running 
hive query set 2: 
insert into table events_stress_db_0.stress_test_tbl_0_alltypes_part partition 
(year,month) select * from functional.alltypes limit 100
   create database if not exists events_stress_db_0
   drop table if exists events_stress_db_0.stress_test_tbl_0_alltypes_part 
   create table if not exists 
events_stress_db_0.stress_test_tbl_0_alltypes_part  like  functional.alltypes 
   set hive.exec.dynamic.partition.mode = nonstrict
   set hive.exec.max.dynamic.partitions = 1
   set hive.exec.max.dynamic.partitions.pernode = 1
   set tez.session.am.dag.submit.timeout.secs = 2
I0422 20:17:55.478940  7949 HiveJdbcClientPool.java:102] Executing sql : create 
database if not exists events_stress_db_0
I0422 20:17:55.493497  7768 MetastoreShim.java:843] EventId: 33414 EventType: 
COMMIT_TXN transaction id: 2075
I0422 20:17:55.493682  7768 MetastoreEvents.java:302] Total number of events 
received: 6 Total number of events filtered out: 0
I0422 20:17:55.494762  7768 MetastoreEvents.java:825] EventId: 33407 EventType: 
CREATE_DATABASE Successfully added database events_stress_db_0
I0422 20:17:55.508478  7949 HiveJdbcClientPool.java:102] Executing sql : drop 
table if exists events_stress_db_0.stress_test_tbl_0_alltypes_part 
I0422 20:17:55.516858  7768 MetastoreEvents.java:825] EventId: 33410 EventType: 
CREATE_TABLE Successfully added table events_stress_db_0.stress_test_tbl_0_part
I0422 20:17:55.518288  7768 CatalogOpExecutor.java:4713] EventId: 33413 Table 
events_stress_db_0.stress_test_tbl_0_part is not loaded. Skipping add partitions
I0422 20:17:55.519479  7768 MetastoreEventsProcessor.java:1340] Time elapsed in 
processing event batch: 178.895ms
I0422 20:17:55.521183  7768 MetastoreEventsProcessor.java:1120] Latest event in 
HMS: id=33420, time=1713842275. Last synced event: id=33414, time=1713842275.
I0422 20:17:55.533375  7949 HiveJdbcClientPool.java:102] Executing sql : create 
table if not exists events_stress_db_0.stress_test_tbl_0_alltypes_part  like  
functional.alltypes 
I0422 20:17:55.611153  7949 HiveJdbcClientPool.java:102] Executing sql : set 
hive.exec.dynamic.partition.mode = nonstrict
I0422 20:17:55.616571  7949 HiveJdbcClientPool.java:102] Executing sql : set 
hive.exec.max.dynamic.partitions = 1
I0422 20:17:55.619197  7949 HiveJdbcClientPool.java:102] Executing sql : set 
hive.exec.max.dynamic.partitions.pernode = 1
I0422 20:17:55.621069  7949 HiveJdbcClientPool.java:102] Executing sql : set 
tez.session.am.dag.submit.timeout.secs = 2
I0422 20:17:55.622972  7949 HiveJdbcClientPool.java:102] Executing sql : insert 
into table events_stress_db_0.stress_test_tbl_0_alltypes_part partition 
(year,month) select * from functional.alltypes limit 100
I0422 20:17:57.163591  7950 CatalogServiceCatalog.java:2747] Refreshing table 
metadata: events_stress_db_0.stress_test_tbl_0_part
I0422 20:17:57.829802  7768 MetastoreEventsProcessor.java:982] Received 6 
events. First event id: 33416.
I0422 20:17:57.833026  7768 MetastoreShim.java:843] EventId: 33417 EventType: 
COMMIT_TXN transaction id: 2076
I0422 20:17:57.833222  7768 MetastoreShim.java:843] EventId: 33419 EventType: 
COMMIT_TXN transaction id: 2077
I0422 20:17:57.84  7768 MetastoreShim.java:843] EventId: 33421 EventType: 
COMMIT_TXN transaction id: 2078
I0422 20:17:57.834242  7768 MetastoreShim.java:843] EventId: 33424 EventType: 
COMMIT_TXN transaction id: 2079
I0422 20:17:57.834323  7768 MetastoreEvents.java:302] Total number of events 
received: 6 Total number of events filtered out: 0
I0422 20:17:57.834570  7768 CatalogOpExecutor.java:4862] EventId: 33416 Table 
events_stress_db_0.stress_test_tbl_0_part is not loaded. Not processing the 
event.
I0422 20:17:57.837756  7768 MetastoreEvents.java:825] EventId: 33423 EventType: 
CREATE_TABLE Successfully added table 
events_stress_db_0.stress_test_tbl_0_alltypes_part
I0422 20:17:57.838668  7768 MetastoreEventsProcessor.java:1340] Time elapsed in 
processing event batch: 8.625ms
I0422 20:17:57.840027  7768 MetastoreEventsProcessor.java:1120] Latest event in 
HMS: id=33425, time=1713842275. Last synced event: id=33424, time=1713842275.
I0422 20:18:03.143219  7768 MetastoreEventsProcessor.java:982] Received 0 
events. First event id: 


[jira] [Created] (IMPALA-13026) Creating openai-api-key-secret fails sporadically

2024-04-22 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13026:


 Summary: Creating openai-api-key-secret fails sporadically
 Key: IMPALA-13026
 URL: https://issues.apache.org/jira/browse/IMPALA-13026
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Csaba Ringhofer


Data load fails from time to time with the following error:
{code}
00:27:17.680 Error loading data. The end of the log file is:
00:27:17.680 04:15:15 
/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/load-data.py 
--workloads functional-query -e core --table_formats kudu/none/none --force 
--impalad localhost --hive_hs2_hostport localhost:11050 --hdfs_namenode 
localhost:20500
00:27:17.680 04:15:15 Executing Hadoop command: ... hadoop credential create 
openai-api-key-secret -value secret -provider 
localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks
...

00:27:17.680 java.io.IOException: Credential openai-api-key-secret already 
exists in 
localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks
00:27:17.680at 
org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.createCredentialEntry(AbstractJavaKeyStoreProvider.java:234)
00:27:17.680at 
org.apache.hadoop.security.alias.CredentialShell$CreateCommand.execute(CredentialShell.java:354)
00:27:17.680at 
org.apache.hadoop.tools.CommandShell.run(CommandShell.java:72)
00:27:17.680at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
00:27:17.680at 
org.apache.hadoop.security.alias.CredentialShell.main(CredentialShell.java:437)
00:27:17.680 04:15:15 Error executing Hadoop command, exiting
{code}

My guess is that this happens when "hadoop credential create" is called
concurrently by different data loader processes.
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/bin/load-data.py#L323
Ideally this would be called in the serial phase of data load.
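Until then, one possible mitigation is to serialize the call across loader
processes with a host-wide file lock. A sketch (the helper name and lock path
are made up; the real call sits in bin/load-data.py):
{code}
import fcntl
import subprocess

def create_secret_serialized(jceks_provider, lock_path="/tmp/impala_jceks.lock"):
    # Take an exclusive flock so only one data loader process runs the
    # create at a time; the other processes block here instead of racing.
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        rc = subprocess.call(
            ["hadoop", "credential", "create", "openai-api-key-secret",
             "-value", "secret", "-provider", jceks_provider])
        # Under the lock, an "already exists" failure can be treated as
        # benign; other failures would still deserve a closer look.
        if rc != 0:
            print("credential create returned %d, assuming it already exists" % rc)
{code}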




--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Comment Edited] (IMPALA-13024) Several tests timeout waiting for admission

2024-04-21 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839337#comment-17839337
 ] 

Csaba Ringhofer edited comment on IMPALA-13024 at 4/21/24 8:15 AM:
---

>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:

Run one query with more fragment instances than the core count in one impala-shell:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; -- 

Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 6ms in pool default-pool. 
Queued reason: Not enough admission control slots available on host 
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable

UPDATE:
I understand now what is happening: the limit is only enforced on
coordinator-only queries.
While "select * from alltypestiny" failed, the much larger "select * from 
alltypes" could be run without issues. The reason is that the former query runs 
on a single node.

From impalad.INFO:
"0421 10:10:57.505287 1586078 admission-controller.cc:1962] Trying to admit 
id=91442a9fa1d2512d:db5337c2 in pool_name=default-pool 
executor_group_name=empty group (using coordinator only) 
per_host_mem_estimate=20.00 MB dedicated_coord_mem_estimate=120.00 MB 
max_requests=-1 max_queued=200 max_mem=-1.00 B is_trivial_query=false
I0421 10:10:57.505345 1586078 admission-controller.cc:1971] Stats: 
agg_num_running=1, agg_num_queued=1, agg_mem_reserved=4.02 MB,  
local_host(local_mem_admitted=516.57 MB, local_trivial_running=0, 
num_admitted_running=1, num_queued=1, backend_mem_reserved=4.02 MB, 
topN_query_stats: queries=[d84f2a7efee0998a:45ac1206], 
total_mem_consumed=4.02 MB, fraction_of_pool_total_mem=1; pool_level_stats: 
num_running=1, min=4.02 MB, max=4.02 MB, pool_total_mem=4.02 MB, 
average_per_query=4.02 MB)
I0421 10:10:57.505407 1586078 admission-controller.cc:2227] Could not dequeue 
query id=91442a9fa1d2512d:db5337c2 reason: Not enough admission control 
slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 
are already in use."
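For context, the slot arithmetic behind that message, assuming the simple
model of one admission slot per fragment instance and the host slot count
defaulting to the number of cores (my reading of the behavior, not a spec):
{code}
host_slots = 24   # cores on the coordinator host
in_use = 32       # the mt_dop=32 query admitted 32 fragment instances
needed = 1        # coordinator-only "select * from functional.alltypestiny"
print(in_use + needed > host_slots)  # True -> "Needed 1 slots but 32/24"
{code}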



was (Author: csringhofer):
>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:

Run one query with more fragment instance than core count in one impala-shell:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; -- 

Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 6ms in pool default-pool. 
Queued reason: Not enough admission control slots available on host 
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable


> Several tests timeout waiting for admission
> ---
>
> Key: IMPALA-13024
> URL: https://issues.apache.org/jira/browse/IMPALA-13024
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>
> A bunch of seemingly unrelated tests failed with the following message:
> Example: 
> query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol:
>  beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> {code}
> ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded 
> timeout 6ms in pool default-pool. Queued reason: Not enough admission 
> control slots available on host ... . Needed 1 slots but 18/16 are already in 
> use. Additional Details: Not Applicable
> {code}
> This happened in an ASAN build. Another test also failed which may be related 
> to the cause:
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>  
> {code}
> Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the 
> expected states [4], last known state 5
> {code}
> test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13024) Several tests timeout waiting for admission

2024-04-21 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839337#comment-17839337
 ] 

Csaba Ringhofer commented on IMPALA-13024:
--

>Slot based admission is not enabled when using default groups
This was also my assumption, but it seems that it is enforced by default.
Reproduced slot starvation locally:

Run one query with more fragment instances than the core count in one impala-shell:
set mt_dop=32;
select sleep(1000*60) from tpcds.store_sales limit 200; -- 

Run a query in another impala-shell:
select * from functional.alltypestiny;
ERROR: Admission for query exceeded timeout 6ms in pool default-pool. 
Queued reason: Not enough admission control slots available on host 
csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable


> Several tests timeout waiting for admission
> ---
>
> Key: IMPALA-13024
> URL: https://issues.apache.org/jira/browse/IMPALA-13024
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>
> A bunch of seemingly unrelated tests failed with the following message:
> Example: 
> query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol:
>  beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> {code}
> ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded 
> timeout 6ms in pool default-pool. Queued reason: Not enough admission 
> control slots available on host ... . Needed 1 slots but 18/16 are already in 
> use. Additional Details: Not Applicable
> {code}
> This happened in an ASAN build. Another test also failed which may be related 
> to the cause:
> custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
>  
> {code}
> Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the 
> expected states [4], last known state 5
> {code}
> test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-13024) Several tests timeout waiting for admission

2024-04-20 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13024:


 Summary: Several tests timeout waiting for admission
 Key: IMPALA-13024
 URL: https://issues.apache.org/jira/browse/IMPALA-13024
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


A bunch of seemingly unrelated tests failed with the following message:
Example: 
query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol:
 beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, 
'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
{code}
ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded timeout 
6ms in pool default-pool. Queued reason: Not enough admission control slots 
available on host ... . Needed 1 slots but 18/16 are already in use. Additional 
Details: Not Applicable
{code}

This happened in an ASAN build. Another test also failed which may be related 
to the cause:
custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots
 
{code}
Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the 
expected states [4], last known state 5
{code}
test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Created] (IMPALA-13021) Failed test: test_iceberg_deletes_and_updates_and_optimize

2024-04-19 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-13021:


 Summary: Failed test: test_iceberg_deletes_and_updates_and_optimize
 Key: IMPALA-13021
 URL: https://issues.apache.org/jira/browse/IMPALA-13021
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


{code}
test_iceberg_deletes_and_updates_and_optimize
run_tasks([deleter, updater, optimizer, checker])
stress/stress_util.py:46: in run_tasks
pool.map_async(Task.run, tasks).get(timeout_seconds)
Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:568:
 in get
raise TimeoutError
E   TimeoutError
{code}
This happened in an exhaustive test run with data cache.
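Note that AsyncResult.get(timeout) only reports that the deadline passed; it
does not identify which task hung, which is why the failure above carries no
detail. A standalone illustration (not the actual stress framework):
{code}
from multiprocessing import Pool, TimeoutError
import time

def task(_):
    time.sleep(60)  # stand-in for a stuck deleter/updater/optimizer/checker

if __name__ == "__main__":
    pool = Pool(4)
    try:
        pool.map_async(task, range(4)).get(1)  # raises after 1 second
    except TimeoutError:
        # The exception says nothing about which worker is stuck, matching
        # the bare TimeoutError in the pytest output above.
        print("timed out")
{code}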



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-5323) Support Kudu BINARY

2024-04-10 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-5323.
-
Resolution: Fixed

> Support Kudu BINARY
> ---
>
> Key: IMPALA-5323
> URL: https://issues.apache.org/jira/browse/IMPALA-5323
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Pavel Martynov
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: kudu
> Fix For: Impala 4.4.0
>
>
> I was trying to 'CREATE EXTERNAL TABLE STORED AS KUDU' on a table with a 
> BINARY Kudu column data type and got an error: Kudu type 'binary' is not 
> supported in Impala.
> This limitation is not documented, checked:
> https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html
> https://kudu.apache.org/docs/kudu_impala_integration.html#_known_issues_and_limitations
> There are some thoughts that Kudu BINARY data type may be supported by 
> Impala's STRING data type:
> https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Does-impala-support-binary-data-type/td-p/24366
> https://groups.google.com/a/cloudera.org/forum/#!msg/impala-user/muguKJU3c3I/_oArmoxSlDMJ



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IMPALA-5323) Support Kudu BINARY

2024-04-10 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-5323:

Fix Version/s: Impala 4.4.0

> Support Kudu BINARY
> ---
>
> Key: IMPALA-5323
> URL: https://issues.apache.org/jira/browse/IMPALA-5323
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Pavel Martynov
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: kudu
> Fix For: Impala 4.4.0
>
>
> I was trying to 'CREATE EXTERNAL TABLE STORED AS KUDU' on a table with a 
> BINARY Kudu column data type and got an error: Kudu type 'binary' is not 
> supported in Impala.
> This limitation is not documented, checked:
> https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html
> https://kudu.apache.org/docs/kudu_impala_integration.html#_known_issues_and_limitations
> There are some thoughts that Kudu BINARY data type may be supported by 
> Impala's STRING data type:
> https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Does-impala-support-binary-data-type/td-p/24366
> https://groups.google.com/a/cloudera.org/forum/#!msg/impala-user/muguKJU3c3I/_oArmoxSlDMJ



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows

2024-04-10 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12990 started by Csaba Ringhofer.

> impala-shell broken if Iceberg delete deletes 0 rows
> 
>
> Key: IMPALA-12990
> URL: https://issues.apache.org/jira/browse/IMPALA-12990
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: iceberg
>
> Happens only with Python 3
> {code}
> impala-python3 shell/impala_shell.py
> create table icebergupdatet (i int, s string) stored as iceberg;
> alter table icebergupdatet set tblproperties("format-version"="2");
> delete from icebergupdatet where i=0;
> Unknown Exception : '>' not supported between instances of 'NoneType' and 
> 'int'
> Traceback (most recent call last):
>   File "shell/impala_shell.py", line 1428, in _execute_stmt
> if is_dml and num_rows == 0 and num_deleted_rows > 0:
> TypeError: '>' not supported between instances of 'NoneType' and 'int'
> {code}
> The same error should also happen when the delete removes > 0 rows, but the 
> impala server has an older version that doesn't set TDmlResult.rows_deleted



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows

2024-04-10 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835793#comment-17835793
 ] 

Csaba Ringhofer commented on IMPALA-12990:
--

https://gerrit.cloudera.org/#/c/21284

> impala-shell broken if Iceberg delete deletes 0 rows
> 
>
> Key: IMPALA-12990
> URL: https://issues.apache.org/jira/browse/IMPALA-12990
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: iceberg
>
> Happens only with Python 3
> {code}
> impala-python3 shell/impala_shell.py
> create table icebergupdatet (i int, s string) stored as iceberg;
> alter table icebergupdatet set tblproperties("format-version"="2");
> delete from icebergupdatet where i=0;
> Unknown Exception : '>' not supported between instances of 'NoneType' and 
> 'int'
> Traceback (most recent call last):
>   File "shell/impala_shell.py", line 1428, in _execute_stmt
> if is_dml and num_rows == 0 and num_deleted_rows > 0:
> TypeError: '>' not supported between instances of 'NoneType' and 'int'
> {code}
> The same error should also happen when the delete removes > 0 rows, but the 
> impala server has an older version that doesn't set TDmlResult.rows_deleted



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows

2024-04-10 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12990:


 Summary: impala-shell broken if Iceberg delete deletes 0 rows
 Key: IMPALA-12990
 URL: https://issues.apache.org/jira/browse/IMPALA-12990
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Reporter: Csaba Ringhofer


Happens only with Python 3
{code}
impala-python3 shell/impala_shell.py

create table icebergupdatet (i int, s string) stored as iceberg;
alter table icebergupdatet set tblproperties("format-version"="2");
delete from icebergupdatet where i=0;
Unknown Exception : '>' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
  File "shell/impala_shell.py", line 1428, in _execute_stmt
if is_dml and num_rows == 0 and num_deleted_rows > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
{code}

The same error should also happen when the delete removes > 0 rows, but the 
impala server has an older version that doesn't set TDmlResult.rows_deleted
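For illustration, the crash and one possible guard (a sketch only, not
necessarily the committed fix; treating a missing rows_deleted as 0 is an
assumption):
{code}
is_dml, num_rows = True, 0
num_deleted_rows = None  # unset when the server predates TDmlResult.rows_deleted

# The original check crashes on Python 3 because None > 0 is a TypeError:
#   if is_dml and num_rows == 0 and num_deleted_rows > 0:
if is_dml and num_rows == 0 and (num_deleted_rows or 0) > 0:
    print("Deleted %d row(s)" % num_deleted_rows)
{code}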



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's.

Note that Java handles  \0 characters in unicode in a special way, which may be 
related: 
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's

Note Java handles  \0 characters in unicode in a special way, which may be 
related: 
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables.
> The issue is more severe in Iceberg tables as from this point the table can't 
> be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> The partition directory created above seems truncated:
> hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to IMPALA-11499's.
> Note that Java handles  \0 characters in unicode in a special way, which may 
> be related: 
> https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542
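A small demonstration of the two suspected failure paths (my reading of the
likely mechanism, not a confirmed root cause): a NUL-terminated C string
silently truncates at \0, and Java's modified UTF-8 encodes U+0000 differently
from standard UTF-8:
{code}
import ctypes

# c_char_p reads the bytes as a NUL-terminated C string, so everything
# after the first \0 is dropped -- matching the truncated ".../p=a" path.
print(ctypes.c_char_p(b"a\x00a").value)  # b'a'

# Standard UTF-8 keeps the raw 0x00 byte; Java's modified UTF-8 instead
# encodes U+0000 as the two bytes C0 80, which strict UTF-8 decoders reject.
print("a\u0000a".encode("utf-8"))  # b'a\x00a'
{code}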



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

The partition directory created above seems truncated:
hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's.

Note Java handles  \0 characters in unicode in a special way, which may be 
related: 
https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables.
> The issue is more severe in Iceberg tables as from this point the table can't 
> be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> The partition directory created above seems truncated:
> hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to IMPALA-11499's.
> Note Java handles  \0 characters in unicode in a special way, which may be 
> related: 
> https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12987:


 Summary: Errors with \0 character in partition values
 Key: IMPALA-12987
 URL: https://issues.apache.org/jira/browse/IMPALA-12987
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables, as from this point the table 
can't be read in Impala or Hive:
{code}
 create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values

2024-04-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12987:
-
Description: 
Inserting strings with "\0" values into partition columns leads to errors in 
both Iceberg and Hive tables.

The issue is more severe in Iceberg tables as from this point the table can't 
be read in Impala or Hive:
{code}
create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partitioned Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similar to 
IMPALA-11499's.


  was:
Inserting strings with "\0" values to partition columns leads errors both in 
Iceberg and Hive tables. 

The issue issue more severe in Iceberg tables as from this point the table 
can't be read in Impala or Hive:
{code}
 create table iceberg_unicode (s string, p string) partitioned by spec 
(identity(p)) stored as iceberg;
insert into iceberg_unicode select "a", "a\0a";
ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
hdfs://localhost:20500/test-warehouse/iceberg_unicode
CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
catalog server log for more details.
{code}

In partition Hive tables the insert also returns an error, but the new 
partition is not created and the table remains usable. The error is similare to 
IMPALA-11499's



> Errors with \0 character in partition values
> 
>
> Key: IMPALA-12987
> URL: https://issues.apache.org/jira/browse/IMPALA-12987
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: iceberg
>
> Inserting strings with "\0" values into partition columns leads to errors in 
> both Iceberg and Hive tables.
> The issue is more severe in Iceberg tables as from this point the table can't 
> be read in Impala or Hive:
> {code}
> create table iceberg_unicode (s string, p string) partitioned by spec 
> (identity(p)) stored as iceberg;
> insert into iceberg_unicode select "a", "a\0a";
> ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table 
> hdfs://localhost:20500/test-warehouse/iceberg_unicode
> CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 
> paths for table default.iceberg_unicode: failed to load 1 paths. Check the 
> catalog server log for more details.
> {code}
> In partitioned Hive tables the insert also returns an error, but the new 
> partition is not created and the table remains usable. The error is similar 
> to IMPALA-11499's.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources

2024-04-08 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12969:
-
Priority: Critical  (was: Major)

> DeserializeThriftMsg may leak JNI resources
> ---
>
> Key: IMPALA-12969
> URL: https://issues.apache.org/jira/browse/IMPALA-12969
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Critical
> Fix For: Impala 4.4.0
>
>
> JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements 
> call, but this is not done in case there is an error during deserialization:
> [https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources

2024-04-08 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12969.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> DeserializeThriftMsg may leak JNI resources
> ---
>
> Key: IMPALA-12969
> URL: https://issues.apache.org/jira/browse/IMPALA-12969
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements 
> call, but this is not done in case there is an error during deserialization:
> [https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12978) IMPALA-12544 made impala-shell incompatible with old impala servers

2024-04-08 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12978:


 Summary: IMPALA-12544 made impala-shell incompatible with old 
impala servers
 Key: IMPALA-12978
 URL: https://issues.apache.org/jira/browse/IMPALA-12978
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Reporter: Csaba Ringhofer


IMPALA-12544 uses "progress.total_fragment_instances > 0:", but 
total_fragment_instances is None if the server is older and does not know this 
Thrift member yet (added in IMPALA-12048).
[https://github.com/apache/impala/blob/fb3c379f395635f9f6927b40694bc3dd95a2866f/shell/impala_shell.py#L1320]

This leads to error messages in interactive shell sessions when progress 
reporting is enabled.
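A standalone illustration of the crash and a backward-compatible guard (a
sketch, not necessarily the shipped fix):
{code}
class Progress(object):
    total_fragment_instances = None  # old server: Thrift field never set

progress = Progress()

# The original "progress.total_fragment_instances > 0" raises TypeError on
# Python 3 when the field is None; defaulting to 0 keeps old servers working.
if (progress.total_fragment_instances or 0) > 0:
    print("render progress bar")
else:
    print("skip progress reporting")
{code}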



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Created] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources

2024-04-03 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12969:


 Summary: DeserializeThriftMsg may leak JNI resources
 Key: IMPALA-12969
 URL: https://issues.apache.org/jira/browse/IMPALA-12969
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements 
call, but this is not done in case there is an error during deserialization:

[https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
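
For illustration, a hedged sketch of the leak-safe pattern (Deserialize() is a
hypothetical stand-in for the Thrift step; this is not the actual
jni-thrift-util.h code):

{code:cpp}
#include <jni.h>

bool Deserialize(const jbyte* buf, jsize len);  // hypothetical Thrift step

bool DeserializeSafely(JNIEnv* env, jbyteArray serialized_msg) {
  jbyte* buf = env->GetByteArrayElements(serialized_msg, nullptr);
  if (buf == nullptr) return false;  // JVM could not pin/copy the array
  const jsize len = env->GetArrayLength(serialized_msg);
  const bool ok = Deserialize(buf, len);  // may fail mid-deserialization
  // Must run on every exit path, including the error path. JNI_ABORT
  // releases the buffer without copying back changes, which is correct
  // for read-only access.
  env->ReleaseByteArrayElements(serialized_msg, buf, JNI_ABORT);
  return ok;
}
{code}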

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Updated] (IMPALA-12968) Early EndDataStream RPC could be responded earlier

2024-04-03 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12968:
-
Description: 
When a producer fragment sends no rows and finishes before the receiver is 
initialized the EndDataStream rpc is stored as early sender and is responded 
when the receiver is registered.

[https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150]

While it is important to store the information that the EOS has happened to 
unregister the sender from the receiver, the RPC itself could be responded 
right after it was stored in the early sender map.

  was:
When a producer fragment sends no rows and finishes before the receiver is 
initialized te e EndDataStream rpc is stored as early sender and is responded 
when the receiver is registered.

[https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150]

While it is important to store the information that the EOS has happened to 
unregister the sender from the receiver, the RPC itself could be responded 
right after it was stored in the early sender map.


> Early EndDataStream RPC could be responded earlier
> --
>
> Key: IMPALA-12968
> URL: https://issues.apache.org/jira/browse/IMPALA-12968
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Minor
>  Labels: krpc
>
> When a producer fragment sends no rows and finishes before the receiver is 
> initialized the EndDataStream rpc is stored as early sender and is responded 
> when the receiver is registered.
> [https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150]
> While it is important to store the information that the EOS has happened to 
> unregister the sender from the receiver, the RPC itself could be responded 
> right after it was stored in the early sender map.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12968) Early EndDataStream RPC could be responded earlier

2024-04-03 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12968:


 Summary: Early EndDataStream RPC could be responded earlier
 Key: IMPALA-12968
 URL: https://issues.apache.org/jira/browse/IMPALA-12968
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Csaba Ringhofer


When a producer fragment sends no rows and finishes before the receiver is 
initialized the EndDataStream rpc is stored as early sender and is responded 
when the receiver is registered.

[https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150]

While it is important to store the information that the EOS has happened to 
unregister the sender from the receiver, the RPC itself could be responded 
right after it was stored in the early sender map.
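
A hedged sketch of the proposed change (type and member names only approximate
the code around krpc-data-stream-mgr.cc, and locking is elided; this is not the
actual patch):

{code:cpp}
// Sketch: record that the sender already sent EOS so the future receiver
// can unregister it, but answer the RPC immediately instead of parking it.
void KrpcDataStreamMgr::AddEarlyClosedSender(const TUniqueId& finst_id,
                                             int sender_id,
                                             kudu::rpc::RpcContext* context) {
  EarlySendersList& senders = early_senders_map_[finst_id];
  senders.closed_sender_ids.push_back(sender_id);
  // Proposed: the stored sender id carries all the information the
  // receiver will need, so nothing is gained by deferring the response.
  context->RespondSuccess();
}
{code}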



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Comment Edited] (IMPALA-10349) Revisit constant folding on non-ASCII strings

2024-03-25 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830545#comment-17830545
 ] 

Csaba Ringhofer edited comment on IMPALA-10349 at 3/25/24 3:55 PM:
---

Also bumped into this related to pushing down to Kudu:
{code:java}
explain select count(*) from functional_kudu.alltypes where string_col = "á";

-- kudu predicates: string_col = 'á'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("a", "")

-- kudu predicates: string_col = 'a'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("á", "")

-- not pushed down to Kudu:

-- predicates: string_col = concat('á', '') 

{code}
>I think we should allow folding non-ASCII strings if they are legal UTF-8 
>strings.

[~stigahuang]  Do you why is it not possible to fold strings that are not valid 
UTF-8?

Currently BINARY columns also use StringLiterals, a.g cast("a" as binary) will 
be folded to a StringLiteral. It would be useful to also fold expressions like 
cast(unhex("aa")  as binary) to be able to push them down to Kudu.


was (Author: csringhofer):
Also bumped into this related to pushing down to Kudu:

{code}

explain select count(*) from functional_kudu.alltypes where string_col = "á";

-- kudu predicates: string_col = 'á'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("a", "")

-- kudu predicates: string_col = 'a'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("á", "")

-- not pushed down to Kudu:

-- predicates: string_col = concat('á', '') 

{code} 

>I think we should allow folding non-ASCII strings if they are legal UTF-8 
>strings.

[~stigahuang]  Do you why is it not possible to fold strings that are not valid 
UTF-8?

Currently BINARY columns also use StringLiterals, a.g cast("a" as binary) will 
be folded to a StringLiteral. It would be useful to also fold expressions like 
cast(unhex("aa")  as binary) to be able to push them down to Kudu.

 

> Revisit constant folding on non-ASCII strings
> -
>
> Key: IMPALA-10349
> URL: https://issues.apache.org/jira/browse/IMPALA-10349
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Quanlong Huang
>Priority: Critical
>
> Constant folding may produce non-ASCII strings. In such cases, we currently 
> abandon folding the constant. See commit message of IMPALA-1788 or codes 
> here: 
> [https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282]
> I think we should allow folding non-ASCII strings if they are legal UTF-8 
> strings.
> Example of constant folding work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('123', 1, 1)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS [functional.alltypes]  |
> |HDFS partitions=24/24 files=24 size=478.45KB |
> |predicates: string_col = '1' |
> |row-size=89B cardinality=730 |
> +-+
> {code}
> Example of constant folding doesn't work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('引擎', 1, 3)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS 

[jira] [Comment Edited] (IMPALA-10349) Revisit constant folding on non-ASCII strings

2024-03-25 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830545#comment-17830545
 ] 

Csaba Ringhofer edited comment on IMPALA-10349 at 3/25/24 3:55 PM:
---

Also bumped into this related to pushing down to Kudu:
{code:java}
explain select count(*) from functional_kudu.alltypes where string_col = "á";

-- kudu predicates: string_col = 'á'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("a", "")

-- kudu predicates: string_col = 'a'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("á", "")

-- not pushed down to Kudu:

-- predicates: string_col = concat('á', '') 

{code}
>I think we should allow folding non-ASCII strings if they are legal UTF-8 
>strings.

[~stigahuang]  Do you know why it is not possible to fold strings that are not 
valid UTF-8?

Currently BINARY columns also use StringLiterals, e.g. cast("a" as binary) will 
be folded to a StringLiteral. It would be useful to also fold expressions like 
cast(unhex("aa") as binary) to be able to push them down to Kudu.


was (Author: csringhofer):
Also bumped into this related to pushing down to Kudu:
{code:java}
explain select count(*) from functional_kudu.alltypes where string_col = "á";

-- kudu predicates: string_col = 'á'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("a", "")

-- kudu predicates: string_col = 'a'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("á", "")

-- not pushed down to Kudu:

-- predicates: string_col = concat('á', '') 

{code}
>I think we should allow folding non-ASCII strings if they are legal UTF-8 
>strings.

[~stigahuang]  Do you why is it not possible to fold strings that are not valid 
UTF-8?

Currently BINARY columns also use StringLiterals, a.g cast("a" as binary) will 
be folded to a StringLiteral. It would be useful to also fold expressions like 
cast(unhex("aa")  as binary) to be able to push them down to Kudu.

> Revisit constant folding on non-ASCII strings
> -
>
> Key: IMPALA-10349
> URL: https://issues.apache.org/jira/browse/IMPALA-10349
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Quanlong Huang
>Priority: Critical
>
> Constant folding may produce non-ASCII strings. In such cases, we currently 
> abandon folding the constant. See commit message of IMPALA-1788 or codes 
> here: 
> [https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282]
> I think we should allow folding non-ASCII strings if they are legal UTF-8 
> strings.
> Example of constant folding work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('123', 1, 1)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS [functional.alltypes]  |
> |HDFS partitions=24/24 files=24 size=478.45KB |
> |predicates: string_col = '1' |
> |row-size=89B cardinality=730 |
> +-+
> {code}
> Example of constant folding doesn't work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('引擎', 1, 3)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS 

[jira] [Commented] (IMPALA-10349) Revisit constant folding on non-ASCII strings

2024-03-25 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830545#comment-17830545
 ] 

Csaba Ringhofer commented on IMPALA-10349:
--

Also bumped into this related to pushing down to Kudu:

{code}

explain select count(*) from functional_kudu.alltypes where string_col = "á";

-- kudu predicates: string_col = 'á'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("a", "")

-- kudu predicates: string_col = 'a'

explain select count(*) from functional_kudu.alltypes where string_col = 
concat("á", "")

-- not pushed down to Kudu:

-- predicates: string_col = concat('á', '') 

{code} 

>I think we should allow folding non-ASCII strings if they are legal UTF-8 
>strings.

[~stigahuang]  Do you why is it not possible to fold strings that are not valid 
UTF-8?

Currently BINARY columns also use StringLiterals, a.g cast("a" as binary) will 
be folded to a StringLiteral. It would be useful to also fold expressions like 
cast(unhex("aa")  as binary) to be able to push them down to Kudu.

 

> Revisit constant folding on non-ASCII strings
> -
>
> Key: IMPALA-10349
> URL: https://issues.apache.org/jira/browse/IMPALA-10349
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Quanlong Huang
>Priority: Critical
>
> Constant folding may produce non-ASCII strings. In such cases, we currently 
> abandon folding the constant. See commit message of IMPALA-1788 or codes 
> here: 
> [https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282]
> I think we should allow folding non-ASCII strings if they are legal UTF-8 
> strings.
> Example of constant folding work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('123', 1, 1)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS [functional.alltypes]  |
> |HDFS partitions=24/24 files=24 size=478.45KB |
> |predicates: string_col = '1' |
> |row-size=89B cardinality=730 |
> +-+
> {code}
> Example of constant folding doesn't work:
> {code:java}
> Query: explain select * from functional.alltypes where string_col = 
> substr('引擎', 1, 3)
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160MB   |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 01:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 00:SCAN HDFS [functional.alltypes]  |
> |HDFS partitions=24/24 files=24 size=478.45KB |
> |predicates: string_col = substr('引擎', 1, 3)|
> |row-size=89B cardinality=730 |
> +-+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-22 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829953#comment-17829953
 ] 

Csaba Ringhofer commented on IMPALA-12927:
--

I think that the best would be to check tbl property "json.binary.format":
 * if not set, give a clear error message
 * if base64, do base64 decoding
 * if rawstring, handle it the way Hive does: 
[https://github.com/apache/hive/blame/f216bbb632752f467321869cee03adf9477409cf/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L455]

Note that I don't know exactly how special characters are handled in the 
rawstring case.
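
A hedged sketch of that decision logic (the property name comes from the
comment above; the function, the error strings and Base64Decode are
illustrative, not Impala's actual JSON scanner API):

{code:cpp}
#include <string>

bool Base64Decode(const std::string& in, std::string* out);  // assumed helper

// Returns an empty string on success, an error message otherwise.
std::string DecodeJsonBinary(const std::string& fmt,
                             const std::string& json_value,
                             std::string* out) {
  if (fmt.empty()) {
    return "BINARY in JSON requires table property "
           "'json.binary.format' (base64 or rawstring)";
  }
  if (fmt == "base64") {
    return Base64Decode(json_value, out) ? "" : "invalid base64 value";
  }
  if (fmt == "rawstring") {
    *out = json_value;  // keep bytes as-is, mirroring Hive's rawstring mode
    return "";
  }
  return "unsupported json.binary.format: " + fmt;
}
{code}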

> Support reading BINARY columns in JSON tables
> -
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Zihao Ye
>Priority: Major
>
> Currently Impala cannot read BINARY columns in JSON files written by Hive 
> correctly and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> ++--++
> | id | string_col   | binary_col |
> ++--++
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> ++--++
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, 
> type: STRING, data: 'binary1'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'binary2'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'árvíztűrőtükörfúró'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '你好hello'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '��'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '�D3"'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> {code}
> The single file in the table looks like this:
> {code}
>  hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0
> {"id":1,"string_col":"ascii","binary_col":"binary1"}
> {"id":2,"string_col":"ascii","binary_col":"binary2"}
> {"id":3,"string_col":"null","binary_col":null}
> {"id":4,"string_col":"empty","binary_col":""}
> {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
> {"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
> {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
> {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-21 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829614#comment-17829614
 ] 

Csaba Ringhofer edited comment on IMPALA-12927 at 3/21/24 3:47 PM:
---

[~Eyizoha]  About AuxColumnType: fyi there is an ongoing refactor to remove 
that class and make it easier to decide whether a column is STRING or BINARY: 
[https://gerrit.cloudera.org/#/c/21157/]

About encoding of BINARY columns: I looked at the Hive code, but it doesn't 
match with the encoding I see in the files.

[https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135]

Current Apache Hive seems to default to using base64 encoding, while it can be 
altered with tbl property "json.binary.format". In the JSON tables in Impala's 
dataload the files are certainly not base64 encoded and "json.binary.format" is 
also not set, so it doesn't seem to work like the current Hive codebase. It is 
possible that this is related to differences between Apache Impala's Hive 
dependency and current Apache Hive.

Currently Impala base64 decodes the BINARY columns:
{code:java}
Hive:

create table tjsonbinary (string s, binary b) stored as JSONFILE;

insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary)));

Impala:

select * from tjsonbinary;

+--+--+
| s    | b    |
+--+--+
| abcd | abcd |
+--+--+

{code}
What do you think about disabling BINARY column reading in JSON until Hive 
compatibility is clarified? My concern is that besides error messages and 
nulled values this may actually lead to correctness issues as many strings are 
both valid utf8 strings and base64 strings, so Impala may return unintended 
results.


was (Author: csringhofer):
[~Eyizoha]  About AuxColumnType: fyi is there is an ongoing refactor to remove 
that class and make it easier to decided whether a column is STRING or BINARY: 
[https://gerrit.cloudera.org/#/c/21157/]

About encoding of BINARY columns: I looked at the Hive code, but it doesn't 
match with the encoding I see in the files.

[https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135]

Current Apache Hive seems to default to using base64 encoding, while it can be 
altered with tbl property "json.binary.format". In the JSON tables in Impala's 
dataload the files are certainly not base64 encoded and "json.binary.format" is 
also not set, so it doesn't seem to work like the current Hive codebase. It is 
possible that this is related to differences between Apache Impala's Hive 
dependency and current Apache Hive.

Currently Impala base64 decodes the BINARY columns:

{code}

Hive:

create table tjsonbinary (string s, binary b) stored as JSONFILE;

insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary)));

Impala:

select * from tjsonbinary;

+--+--+
| s    | b    |
+--+--+
| abcd | abcd |
+--+--+

{code}

What do you think about disabling BINARY column reading in JSON until Hive 
compatibility is clarified? My concern is that besides error messages and 
nulled values this may actually lead to correctness issues as many strings are 
both valid utf8 strings and base64 strings, so Impala may return unintended 
results.

> Support reading BINARY columns in JSON tables
> -
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Zihao Ye
>Priority: Major
>
> Currently Impala cannot read BINARY columns in JSON files written by Hive 
> correctly and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> ++--++
> | id | string_col   | binary_col |
> ++--++
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> ++--++
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, 
> type: STRING, data: 'binary1'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'binary2'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 

[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-21 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829614#comment-17829614
 ] 

Csaba Ringhofer commented on IMPALA-12927:
--

[~Eyizoha]  About AuxColumnType: fyi is there is an ongoing refactor to remove 
that class and make it easier to decided whether a column is STRING or BINARY: 
[https://gerrit.cloudera.org/#/c/21157/]

About encoding of BINARY columns: I looked at the Hive code, but it doesn't 
match with the encoding I see in the files.

[https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135]

Current Apache Hive seems to default to using base64 encoding, while it can be 
altered with tbl property "json.binary.format". In the JSON tables in Impala's 
dataload the files are certainly not base64 encoded and "json.binary.format" is 
also not set, so it doesn't seem to work like the current Hive codebase. It is 
possible that this is related to differences between Apache Impala's Hive 
dependency and current Apache Hive.

Currently Impala base64 decodes the BINARY columns:

{code}

Hive:

create table tjsonbinary (string s, binary b) stored as JSONFILE;

insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary)));

Impala:

select * from tjsonbinary;

+--+--+
| s    | b    |
+--+--+
| abcd | abcd |
+--+--+

{code}

What do you think about disabling BINARY column reading in JSON until Hive 
compatibility is clarified? My concern is that besides error messages and 
nulled values this may actually lead to correctness issues as many strings are 
both valid utf8 strings and base64 strings, so Impala may return unintended 
results.

> Support reading BINARY columns in JSON tables
> -
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Zihao Ye
>Priority: Major
>
> Currently Impala cannot read BINARY columns in JSON files written by Hive 
> correctly and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> ++--++
> | id | string_col   | binary_col |
> ++--++
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> ++--++
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, 
> type: STRING, data: 'binary1'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'binary2'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'árvíztűrőtükörfúró'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '你好hello'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '��'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '�D3"'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> {code}
> The single file in the table looks like this:
> {code}
>  hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0
> {"id":1,"string_col":"ascii","binary_col":"binary1"}
> {"id":2,"string_col":"ascii","binary_col":"binary2"}
> {"id":3,"string_col":"null","binary_col":null}
> {"id":4,"string_col":"empty","binary_col":""}
> {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
> {"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
> {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
> {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: 

[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-20 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829192#comment-17829192
 ] 

Csaba Ringhofer commented on IMPALA-12927:
--

[~Eyizoha] I see that BINARY tests are explicitly skipped for JSON, but I 
couldn't find any discussion about this in the commit that added the JSON scanner:

[https://gerrit.cloudera.org/#/c/19699/33/tests/query_test/test_scanners.py]

Do you have an idea of what to do with BINARY columns? I am not familiar with 
Hive's JSON files, so I don't know the intended encoding for BINARY 
columns. I know that the JSON format doesn't support binary values, so 
generally some encoding (e.g. base64) is used to convert byte arrays to an 
ASCII representation. 

> Support reading BINARY columns in JSON tables
> -
>
> Key: IMPALA-12927
> URL: https://issues.apache.org/jira/browse/IMPALA-12927
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> Currently Impala cannot read BINARY columns in JSON files written by Hive 
> correctly and returns runtime errors:
> {code}
> select * from functional_json.binary_tbl;
> ++--++
> | id | string_col   | binary_col |
> ++--++
> | 1  | ascii        | NULL       |
> | 2  | ascii        | NULL       |
> | 3  | null         | NULL       |
> | 4  | empty        |            |
> | 5  | valid utf8   | NULL       |
> | 6  | valid utf8   | NULL       |
> | 7  | invalid utf8 | NULL       |
> | 8  | invalid utf8 | NULL       |
> ++--++
> WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, 
> type: STRING, data: 'binary1'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'binary2'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: 'árvíztűrőtükörfúró'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '你好hello'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '��'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
> data: '�D3"'
> Error parsing row: file: 
> hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before 
> offset: 481
> {code}
> The single file in the table looks like this:
> {code}
>  hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0
> {"id":1,"string_col":"ascii","binary_col":"binary1"}
> {"id":2,"string_col":"ascii","binary_col":"binary2"}
> {"id":3,"string_col":"null","binary_col":null}
> {"id":4,"string_col":"empty","binary_col":""}
> {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
> {"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
> {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
> {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12927) Support reading BINARY columns in JSON tables

2024-03-20 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12927:


 Summary: Support reading BINARY columns in JSON tables
 Key: IMPALA-12927
 URL: https://issues.apache.org/jira/browse/IMPALA-12927
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Csaba Ringhofer


Currently Impala cannot read BINARY columns in JSON files written by Hive 
correctly and returns runtime errors:

{code}

select * from functional_json.binary_tbl;
++--++
| id | string_col   | binary_col |
++--++
| 1  | ascii        | NULL       |
| 2  | ascii        | NULL       |
| 3  | null         | NULL       |
| 4  | empty        |            |
| 5  | valid utf8   | NULL       |
| 6  | valid utf8   | NULL       |
| 7  | invalid utf8 | NULL       |
| 8  | invalid utf8 | NULL       |
++--++
WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, type: 
STRING, data: 'binary1'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
data: 'binary2'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
data: 'árvíztűrőtükörfúró'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
data: '你好hello'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
data: '��'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, 
data: '�D3"'
Error parsing row: file: 
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 
481

{code}

The single file in the table looks like this:

{code}

 hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0

{"id":1,"string_col":"ascii","binary_col":"binary1"}
{"id":2,"string_col":"ascii","binary_col":"binary2"}
{"id":3,"string_col":"null","binary_col":null}
{"id":4,"string_col":"empty","binary_col":""}
{"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
{"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
{"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
{"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}

{code}

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Commented] (IMPALA-12899) Temporary workaround for BINARY in complex types

2024-03-19 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828387#comment-17828387
 ] 

Csaba Ringhofer commented on IMPALA-12899:
--

base64 encoding seems a sane and widely used approach to me. I would suggest 
the following:
 # implement it first with base64 encoding
 # if there is demand to handle this differently, add a query option like 
binary_column_encoding_in_json=base64 / skip / hive_style_unquoted_string

I would avoid a "lossy" solution as default, so one where the original binary 
value can't be decoded from the output.
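
A tiny, self-contained illustration of the lossless vs. lossy distinction
(plain Python, nothing Impala-specific):

{code:python}
import base64

raw = b"\xd3\x22\x11"                        # arbitrary bytes, not valid UTF-8
enc = base64.b64encode(raw).decode("ascii")  # '0yIR' -- JSON/ASCII safe
assert base64.b64decode(enc) == raw          # lossless: bytes round-trip

# A lossy rendering, by contrast: replacement characters destroy the
# original bytes, so they cannot be recovered from the output.
lossy = raw.decode("utf-8", errors="replace")
print(enc, repr(lossy))
{code}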

> Temporary workaround for BINARY in complex types
> 
>
> Key: IMPALA-12899
> URL: https://issues.apache.org/jira/browse/IMPALA-12899
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> The BINARY type is currently not supported inside complex types and a 
> cross-component decision is probably needed to support it (see IMPALA-11491). 
> We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables 
> (IMPALA-12612), which requires that queries with BINARY inside complex types 
> don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than 
> IMPALA-11491, so we should come up with a temporary solution, e.g. NULLing 
> BINARY values in complex types and logging a warning, or setting these BINARY 
> values to a warning string.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12902) Event replication is can be broken if hms_event_incremental_refresh_transactional_table=false

2024-03-14 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12902:


 Summary: Event replication is can be broken if 
hms_event_incremental_refresh_transactional_table=false
 Key: IMPALA-12902
 URL: https://issues.apache.org/jira/browse/IMPALA-12902
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Csaba Ringhofer


When setting hms_event_incremental_refresh_transactional_table=false 
metadata.test_event_processing.TestEventProcessing.test_event_based_replication 
fails at the following assert:

[https://github.com/apache/impala/blob/6c0c26146d956ad771cee27283c1371b9c23adce/tests/metadata/test_event_processing_base.py#L234]

 

Based on the logs, catalogd only sees alter_database and transaction events in 
this case, so if the transaction events (COMMIT_TXN) are ignored, then it 
doesn't detect the change in the table.

This seems strange as the commit that added the test is older than the one that 
added hms_event_incremental_refresh_transactional_table

[https://github.com/apache/impala/commit/e53d649f8a88f42a70237fe7c2663baa126fed1a]

vs

[https://github.com/apache/impala/commit/097b10104f23e0927d5b21b43a79f6cc10425f59]

 

So it is not clear to me how the test could have passed originally. One 
possibility is that different events were generated in HMS at that time. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Updated] (IMPALA-12902) Event replication can be broken if hms_event_incremental_refresh_transactional_table=false

2024-03-14 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12902:
-
Summary: Event replication can be broken if 
hms_event_incremental_refresh_transactional_table=false  (was: Event 
replication is can be broken if 
hms_event_incremental_refresh_transactional_table=false)

> Event replication can be broken if 
> hms_event_incremental_refresh_transactional_table=false
> --
>
> Key: IMPALA-12902
> URL: https://issues.apache.org/jira/browse/IMPALA-12902
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Csaba Ringhofer
>Priority: Major
>
> When setting hms_event_incremental_refresh_transactional_table=false 
> metadata.test_event_processing.TestEventProcessing.test_event_based_replication
>  fails at the following assert:
> [https://github.com/apache/impala/blob/6c0c26146d956ad771cee27283c1371b9c23adce/tests/metadata/test_event_processing_base.py#L234]
>  
> Based on the logs, catalogd only sees alter_database and transaction events in 
> this case, so if the transaction events (COMMIT_TXN) are ignored, then it 
> doesn't detect the change in the table.
> This seems strange as the commit that added the test is older than the one 
> that added hms_event_incremental_refresh_transactional_table
> [https://github.com/apache/impala/commit/e53d649f8a88f42a70237fe7c2663baa126fed1a]
> vs
> [https://github.com/apache/impala/commit/097b10104f23e0927d5b21b43a79f6cc10425f59]
>  
> So it is not clear to me how the test could have passed originally. One 
> possibility is that different events were generated in HMS at that time. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12895) REFRESH doesn't detect changes in partition locations in ACID tables

2024-03-12 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12895:


 Summary: REFRESH doesn't detect changes in partition locations in 
ACID tables
 Key: IMPALA-12895
 URL: https://issues.apache.org/jira/browse/IMPALA-12895
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Csaba Ringhofer


This was discovered by running test 
metadata.test_event_processing.TestEventProcessing.test_transact_partition_location_change_from_hive
 when flag hms_event_incremental_refresh_transactional_table  is set to false.

[https://github.com/apache/impala/blob/ab6c9467f6347671b971dbce4c640bea032b6ed9/tests/metadata/test_event_processing.py#L164]

 

When hms_event_incremental_refresh_transactional_table  is true (default), the 
alter partition event is processed correctly and the location change is 
detected. But if it is false or event processing is turned off, the change is 
not detected and running REFRESH on the table also doesn't update the location.

The different handling based on the flag seems intentional:

https://github.com/apache/impala/blob/ab6c9467f6347671b971dbce4c640bea032b6ed9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L2606

 

This seems to be an old issue, while the test was added in a recent commit:

[https://github.com/apache/impala/commit/32b29ff36fb3e05fd620a6714de88805052d0117]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Work started] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled

2024-03-07 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12835 started by Csaba Ringhofer.

> Transactional tables are unsynced when 
> hms_event_incremental_refresh_transactional_table is disabled
> 
>
> Key: IMPALA-12835
> URL: https://issues.apache.org/jira/browse/IMPALA-12835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> There are some test failures when 
> hms_event_incremental_refresh_transactional_table is disabled:
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication
> I can reproduce the issue locally:
> {noformat}
> $ bin/start-impala-cluster.py 
> --catalogd_args=--hms_event_incremental_refresh_transactional_table=false
> impala-shell> create table txn_tbl (id int, val int) stored as parquet 
> tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> impala-shell> describe txn_tbl;  -- make the table loaded in Impala
> hive> insert into txn_tbl values(101, 200);
> impala-shell> select * from txn_tbl; {noformat}
> Impala shows no results until a REFRESH runs on this table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled

2024-03-07 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824490#comment-17824490
 ] 

Csaba Ringhofer commented on IMPALA-12835:
--

https://gerrit.cloudera.org/#/c/21116/

> Transactional tables are unsynced when 
> hms_event_incremental_refresh_transactional_table is disabled
> 
>
> Key: IMPALA-12835
> URL: https://issues.apache.org/jira/browse/IMPALA-12835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> There are some test failures when 
> hms_event_incremental_refresh_transactional_table is disabled:
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication
> I can reproduce the issue locally:
> {noformat}
> $ bin/start-impala-cluster.py 
> --catalogd_args=--hms_event_incremental_refresh_transactional_table=false
> impala-shell> create table txn_tbl (id int, val int) stored as parquet 
> tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> impala-shell> describe txn_tbl;  -- make the table loaded in Impala
> hive> insert into txn_tbl values(101, 200);
> impala-shell> select * from txn_tbl; {noformat}
> Impala shows no results until a REFRESH runs on this table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer closed IMPALA-12812.

Resolution: Invalid

> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow 
> other Impala clusters connecting to the same HMS to also reload their tables. 
> REFRESH is often used for external tables when files are written directly 
> to the filesystem without notifying HMS, so Impala needs to update its cache and 
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS.  -It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too.-  - UPDATE: the previous sentence was not true with 
> current Impala.  It also reloads the table (similarly to other DDLs) and 
> detects new files in existing partitions.
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12812:
-
Description: 
IMPALA-11808 added support for sending reload events after REFRESH to allow 
other Impala clusters connecting to the same HMS to also reload their tables. 
REFRESH is often used for external tables when files are written directly to 
the filesystem without notifying HMS, so Impala needs to update its cache and can't 
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS.  -It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too.-  - UPDATE: the previous sentence was not true with 
current Impala.  It also reloads the table (similarly to other DDLs) and 
detects new files in existing partitions.

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.

  was:
IMPALA-11808 added support for sending reload events after REFRESH to allow 
other Impala cluster connecting to the same HMS to also reload their tables. 
REFRESH is often used when in external tables the files are written directly to 
filesystem without notifying HMS, so Impala needs to update its cache and can't 
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS.  {-}It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too. I{-}t also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala.

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS.  -It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too.-  - UPDATE: the previous sentence was not true with 
> current Impala.  It also reloads the table (similarly to other DDLs) and 
> detects new files in existing partitions.
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer closed IMPALA-12812.

Resolution: Invalid

> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS.  -It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too.-  - UPDATE: the previous sentence was not true with 
> current Impala.  It also reloads the table (similarly to other DDLs) and 
> detects new files in existing partitions.
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12812:
-
Description: 
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS.  {-}It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too. I{-}t also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala.

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.

  was:
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}- It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too.-{-}It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala.

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS.  {-}It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too. I{-}t also reloads the table (similarly to other 
> DDLs) and detects new files in existing partitions. - UPDATE: the previous 
> sentence was not true with current Impala.
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12812:
-
Description: 
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}- It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too.-{-}It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala.

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.

  was:
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too.- It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala. 

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}- It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too.-{-}It also reloads the table (similarly to other 
> DDLs) and detects new files in existing partitions. - UPDATE: the previous 
> sentence was not true with current Impala.
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12812:
-
Description: 
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too.- It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. - UPDATE: the previous sentence 
was not true with current Impala. 

An HMS event is created for the new partitions but there is no event that would 
indicate that there are new files in existing partitions. As ALTER TABLE 
RECOVER PARTITIONS is called when the user expects changes in the filesystem 
(similarly to REFRESH), it could be useful to send a reload event after it is 
finished.

  was:
IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too. It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. An HMS event is created for the 
new partitions but there is no event that would indicate that there are new 
files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called when 
the user expects changes in the filesystem (similarly to REFRESH), it could be 
useful to send a reload event after it is finished.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too.- It also reloads the table (similarly to other DDLs) 
> and detects new files in existing partitions. - UPDATE: the previous sentence 
> was not true with current Impala. 
> An HMS event is created for the new partitions but there is no event that 
> would indicate that there are new files in existing partitions. As ALTER 
> TABLE RECOVER PARTITIONS is called when the user expects changes in the 
> filesystem (similarly to REFRESH), it could be useful to send a reload event 
> after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822631#comment-17822631
 ] 

Csaba Ringhofer commented on IMPALA-12812:
--

I was wrong about this one:
"An HMS event is created for the new partitions but there is no event that 
would indicate that there are new files in existing partitions. "
At the moment no refresh is done on partitions that already exist in HMS.
A valid workaround is to call REFRESH after ALTER TABLE RECOVER PARTITIONS -
REFRESH will both detect new files and send the reload event.
Closing the issue as it wouldn't be that useful.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too. It also reloads the table (similarly to other DDLs) 
> and detects new files in existing partitions. An HMS event is created for the 
> new partitions but there is no event that would indicate that there are new 
> files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called 
> when the user expects changes in the filesystem (similarly to REFRESH), it 
> could be useful to send a reload event after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-03-01 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822631#comment-17822631
 ] 

Csaba Ringhofer edited comment on IMPALA-12812 at 3/1/24 4:14 PM:
--

I was wrong about this one:
" It also reloads the table (similarly to other DDLs) and detects new files in 
existing partitions. "
At the moment no refresh is done on partitions that already exist in HMS.
A valid workaround is to call REFRESH after ALTER TABLE RECOVER PARTITIONS -
REFRESH will both detect new files and send the reload event.
Closing the issue as it wouldn't be that useful.



was (Author: csringhofer):
I was wrong about this one:
"An HMS event is created for the new partitions but there is no event that 
would indicate that there are new files in existing partitions. "
At the moment no refresh is done on partitions that already exist in HMS.
A valid workaround is to call REFRESH after ALTER TABLE RECOVER PARTITIONS -
REFRESH will both detect new files and send the reload event.
Closing the issue as it wouldn't be that useful.


> Send reload event after ALTER TABLE RECOVER PARTITIONS
> --
>
> Key: IMPALA-12812
> URL: https://issues.apache.org/jira/browse/IMPALA-12812
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> IMPALA-11808 added support for sending reload events after REFRESH to allow
> other Impala clusters connecting to the same HMS to also reload their tables.
> REFRESH is often used when files in external tables are written directly to
> the filesystem without notifying HMS, so Impala needs to update its cache and
> can't rely on HMS notifications.
> The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
> partition directories that were only created in the FS but not in HMS and 
> creates them in HMS too. It also reloads the table (similarly to other DDLs) 
> and detects new files in existing partitions. An HMS event is created for the 
> new partitions but there is no event that would indicate that there are new 
> files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called 
> when the user expects changes in the filesystem (similarly to REFRESH), it 
> could be useful to send a reload event after it is finished.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled

2024-02-22 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer reassigned IMPALA-12835:


Assignee: Csaba Ringhofer

> Transactional tables are unsynced when 
> hms_event_incremental_refresh_transactional_table is disabled
> 
>
> Key: IMPALA-12835
> URL: https://issues.apache.org/jira/browse/IMPALA-12835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> There are some test failures when 
> hms_event_incremental_refresh_transactional_table is disabled:
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication
> I can reproduce the issue locally:
> {noformat}
> $ bin/start-impala-cluster.py 
> --catalogd_args=--hms_event_incremental_refresh_transactional_table=false
> impala-shell> create table txn_tbl (id int, val int) stored as parquet 
> tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> impala-shell> describe txn_tbl;  -- make the table loaded in Impala
> hive> insert into txn_tbl values(101, 200);
> impala-shell> select * from txn_tbl; {noformat}
> Impala shows no results until a REFRESH runs on this table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled

2024-02-22 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819713#comment-17819713
 ] 

Csaba Ringhofer commented on IMPALA-12835:
--

I think that what actually broke this is IMPALA-11534.
Without hms_event_incremental_refresh_transactional_table the only event
catalogd processes during an INSERT to an unpartitioned ACID table is the
ALTER_TABLE event - since IMPALA-11534 most ALTER_TABLE events do not lead to
reloading file metadata, so while HMS metadata will be reloaded, the file
listing won't be refreshed (even though the validWriteIdList is refreshed).

Note that this issue only occurs with unpartitioned tables; partitioned tables
are refreshed correctly when processing the ALTER_PARTITION events.
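
A minimal sketch of the kind of fix this suggests (all type and method names
here are hypothetical stand-ins, not the real catalog API): when an
ALTER_TABLE event arrives for an insert-only ACID table, compare the cached
validWriteIdList with the one carried by the event and force a file-metadata
reload when it has advanced.
{code}
// Sketch only: hypothetical types, not Impala's MetastoreEvents classes.
final class AlterTableEventSketch {
  interface CachedTable {
    boolean isTransactional();
    String validWriteIdList();   // cached write id list
    void reloadHmsMetadata();    // cheap: schema, table properties, ...
    void reloadFileMetadata();   // expensive: file listings per write id
  }

  static void onAlterTable(CachedTable tbl, String eventWriteIdList) {
    // Since IMPALA-11534, plain ALTER_TABLE processing only reloads HMS
    // metadata, so an unpartitioned table's file listing stays stale.
    tbl.reloadHmsMetadata();
    boolean writeIdsAdvanced = tbl.isTransactional()
        && !eventWriteIdList.equals(tbl.validWriteIdList());
    if (writeIdsAdvanced) {
      // The missing step described above: pick up the newly written files.
      tbl.reloadFileMetadata();
    }
  }
}
{code}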

> Transactional tables are unsynced when 
> hms_event_incremental_refresh_transactional_table is disabled
> 
>
> Key: IMPALA-12835
> URL: https://issues.apache.org/jira/browse/IMPALA-12835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Priority: Critical
>
> There are some test failures when 
> hms_event_incremental_refresh_transactional_table is disabled:
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication
> I can reproduce the issue locally:
> {noformat}
> $ bin/start-impala-cluster.py 
> --catalogd_args=--hms_event_incremental_refresh_transactional_table=false
> impala-shell> create table txn_tbl (id int, val int) stored as parquet 
> tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> impala-shell> describe txn_tbl;  -- make the table loaded in Impala
> hive> insert into txn_tbl values(101, 200);
> impala-shell> select * from txn_tbl; {noformat}
> Impala shows no results until a REFRESH runs on this table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12827) Precondition was hit in MutableValidReaderWriteIdList

2024-02-21 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12827:
-
Description: 
The callstack below led to stopping the metastore event processor during an
abort transaction event:
{code}
MetastoreEventsProcessor.java:899] Unexpected exception received while 
processing event
Java exception follows:
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:486)
at 
org.apache.impala.hive.common.MutableValidReaderWriteIdList.addAbortedWriteIds(MutableValidReaderWriteIdList.java:274)
at org.apache.impala.catalog.HdfsTable.addWriteIds(HdfsTable.java:3101)
at 
org.apache.impala.catalog.CatalogServiceCatalog.addWriteIdsToTable(CatalogServiceCatalog.java:3885)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.addAbortedWriteIdsToTables(MetastoreEvents.java:2775)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.process(MetastoreEvents.java:2761)
at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:522)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1052)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:881)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

Precondition: 
https://github.com/apache/impala/blob/2f14fd29c0b47fc2c170a7f0eb1cecaf6b9704f4/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java#L274

I have not been able to reproduce this so far.



  was:
The callstack below led to stopping the metastore event processor during an
abort transaction event:
{code}
MetastoreEventsProcessor.java:899] Unexpected exception received while 
processing event
Java exception follows:
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:486)
at 
org.apache.impala.hive.common.MutableValidReaderWriteIdList.addAbortedWriteIds(MutableValidReaderWriteIdList.java:274)
at org.apache.impala.catalog.HdfsTable.addWriteIds(HdfsTable.java:3101)
at 
org.apache.impala.catalog.CatalogServiceCatalog.addWriteIdsToTable(CatalogServiceCatalog.java:3885)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.addAbortedWriteIdsToTables(MetastoreEvents.java:2775)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.process(MetastoreEvents.java:2761)
at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:522)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1052)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:881)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

Precondition: 
https://github.com/apache/impala/blob/2f14fd29c0b47fc2c170a7f0eb1cecaf6b9704f4/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java#L274

I have not been able to reproduce this yet.




> Precondition was hit in MutableValidReaderWriteIdList
> -
>
> Key: IMPALA-12827
> URL: https://issues.apache.org/jira/browse/IMPALA-12827
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: ACID, catalog
>
> The callstack below led to stopping the metastore event processor during an
> abort transaction event:
> {code}
> 

[jira] [Updated] (IMPALA-12827) Precondition was hit in MutableValidReaderWriteIdList

2024-02-21 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12827:
-
Labels: catalog  (was: )

> Precondition was hit in MutableValidReaderWriteIdList
> -
>
> Key: IMPALA-12827
> URL: https://issues.apache.org/jira/browse/IMPALA-12827
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: catalog
>
> The callstack below led to stopping the metastore event processor during an
> abort transaction event:
> {code}
> MetastoreEventsProcessor.java:899] Unexpected exception received while 
> processing event
> Java exception follows:
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:486)
>   at 
> org.apache.impala.hive.common.MutableValidReaderWriteIdList.addAbortedWriteIds(MutableValidReaderWriteIdList.java:274)
>   at org.apache.impala.catalog.HdfsTable.addWriteIds(HdfsTable.java:3101)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.addWriteIdsToTable(CatalogServiceCatalog.java:3885)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.addAbortedWriteIdsToTables(MetastoreEvents.java:2775)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.process(MetastoreEvents.java:2761)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:522)
>   at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1052)
>   at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:881)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> {code}
> Precondition: 
> https://github.com/apache/impala/blob/2f14fd29c0b47fc2c170a7f0eb1cecaf6b9704f4/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java#L274
> I have not been able to reproduce this yet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-12827) Precondition was hit in MutableValidReaderWriteIdList

2024-02-21 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-12827:
-
Labels: ACID catalog  (was: catalog)

> Precondition was hit in MutableValidReaderWriteIdList
> -
>
> Key: IMPALA-12827
> URL: https://issues.apache.org/jira/browse/IMPALA-12827
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: ACID, catalog
>
> The callstack below led to stopping the metastore event processor during an
> abort transaction event:
> {code}
> MetastoreEventsProcessor.java:899] Unexpected exception received while 
> processing event
> Java exception follows:
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:486)
>   at 
> org.apache.impala.hive.common.MutableValidReaderWriteIdList.addAbortedWriteIds(MutableValidReaderWriteIdList.java:274)
>   at org.apache.impala.catalog.HdfsTable.addWriteIds(HdfsTable.java:3101)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.addWriteIdsToTable(CatalogServiceCatalog.java:3885)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.addAbortedWriteIdsToTables(MetastoreEvents.java:2775)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.process(MetastoreEvents.java:2761)
>   at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:522)
>   at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1052)
>   at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:881)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> {code}
> Precondition: 
> https://github.com/apache/impala/blob/2f14fd29c0b47fc2c170a7f0eb1cecaf6b9704f4/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java#L274
> I have not been able to reproduce this yet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12827) Precondition was hit in MutableValidReaderWriteIdList

2024-02-21 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12827:


 Summary: Precondition was hit in MutableValidReaderWriteIdList
 Key: IMPALA-12827
 URL: https://issues.apache.org/jira/browse/IMPALA-12827
 Project: IMPALA
  Issue Type: Bug
Reporter: Csaba Ringhofer


The callstack below led to stopping the metastore event processor during an
abort transaction event:
{code}
MetastoreEventsProcessor.java:899] Unexpected exception received while 
processing event
Java exception follows:
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:486)
at 
org.apache.impala.hive.common.MutableValidReaderWriteIdList.addAbortedWriteIds(MutableValidReaderWriteIdList.java:274)
at org.apache.impala.catalog.HdfsTable.addWriteIds(HdfsTable.java:3101)
at 
org.apache.impala.catalog.CatalogServiceCatalog.addWriteIdsToTable(CatalogServiceCatalog.java:3885)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.addAbortedWriteIdsToTables(MetastoreEvents.java:2775)
at 
org.apache.impala.catalog.events.MetastoreEvents$AbortTxnEvent.process(MetastoreEvents.java:2761)
at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:522)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1052)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:881)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

Precondition: 
https://github.com/apache/impala/blob/2f14fd29c0b47fc2c170a7f0eb1cecaf6b9704f4/fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java#L274

I have not been able to reproduce this yet.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS

2024-02-13 Thread Csaba Ringhofer (Jira)
Csaba Ringhofer created IMPALA-12812:


 Summary: Send reload event after ALTER TABLE RECOVER PARTITIONS
 Key: IMPALA-12812
 URL: https://issues.apache.org/jira/browse/IMPALA-12812
 Project: IMPALA
  Issue Type: Improvement
Reporter: Csaba Ringhofer


IMPALA-11808 added support for sending reload events after REFRESH to allow
other Impala clusters connecting to the same HMS to also reload their tables.
REFRESH is often used when files in external tables are written directly to the
filesystem without notifying HMS, so Impala needs to update its cache and can't
rely on HMS notifications.

The same could be useful for ALTER TABLE RECOVER PARTITIONS. It detects 
partition directories that were only created in the FS but not in HMS and 
creates them in HMS too. It also reloads the table (similarly to other DDLs) 
and detects new files in existing partitions. An HMS event is created for the 
new partitions but there is no event that would indicate that there are new 
files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called when 
the user expects changes in the filesystem (similarly to REFRESH), it could be 
useful to send a reload event after it is finished.
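
For illustration only - this sketches the proposal itself with hypothetical
interfaces (recoverPartitions/fireReloadEvent are stand-ins, not merged Impala
code), mirroring what IMPALA-11808 does after REFRESH:
{code}
// Sketch only: fire a reload event once RECOVER PARTITIONS finishes.
final class RecoverPartitionsSketch {
  interface Catalog {
    void recoverPartitions(String db, String tbl);
    void fireReloadEvent(String db, String tbl); // lets other clusters reload
  }

  static void alterTableRecoverPartitions(Catalog catalog, String db,
      String tbl) {
    catalog.recoverPartitions(db, tbl);
    // New partitions already produce ADD_PARTITION events; this covers the
    // "new files in existing partitions" case that has no event of its own.
    catalog.fireReloadEvent(db, tbl);
  }
}
{code}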



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-12543) test_iceberg_self_events failed in JDK11 build

2024-02-10 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816289#comment-17816289
 ] 

Csaba Ringhofer commented on IMPALA-12543:
--

[~stigahuang]
Do you think that this can cause correctness issues, or should it only lead to
unnecessary table reloading and failed tests?

If I understand correctly, what happens is:
1. alter table starts in CatalogOpExecutor
2. table-level lock is taken
3. HMS RPC starts (CatalogOpExecutor.applyAlterTable())
4. HMS generates the event
5. HMS RPC returns
6. table is reloaded
7. catalog version is added to the inflight event list
8. table lock is released

Meanwhile the event processor thread fetches the new event after 4 and before
7, and because of IMPALA-12461 (part 1), it can also finish self-event
checking before reaching 7. Before IMPALA-12461 it would have needed to wait
for 8.

Currently, adding to the inflight event list happens here:
https://github.com/apache/impala/blob/11d2fe4fc00a1e6ef2d3a45825be9845456adc1d/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1307

Would it be a problem to move this before the HMS RPC, e.g. into
CatalogOpExecutor.applyAlterTable()?
In case the RPC or table loading fails, we could remove the inflight event.
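
A rough sketch of that reordering (hypothetical interfaces, not the real
CatalogOpExecutor code): register the catalog version in the table's inflight
event list before the HMS RPC that generates the event, and roll it back if
the RPC fails.
{code}
// Sketch only: the point is the ordering, not the exact API.
final class InflightEventSketch {
  interface CatalogTable {
    void addInflightVersion(long version);
    void removeInflightVersion(long version);
  }
  interface HmsRpc {
    void alterTable() throws Exception; // steps 3-5: HMS generates the event
  }

  static void applyAlterTable(CatalogTable tbl, HmsRpc hms, long newVersion)
      throws Exception {
    // Step 7 moved before step 3: by the time HMS emits the ALTER_TABLE
    // event, the event processor can already match it as a self-event.
    tbl.addInflightVersion(newVersion);
    try {
      hms.alterTable();
    } catch (Exception e) {
      // No event was generated, so the marker must not leak.
      tbl.removeInflightVersion(newVersion);
      throw e;
    }
  }
}
{code}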


> test_iceberg_self_events failed in JDK11 build
> --
>
> Key: IMPALA-12543
> URL: https://issues.apache.org/jira/browse/IMPALA-12543
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>  Labels: broken-build
> Attachments: catalogd.INFO, std_err.txt
>
>
> test_iceberg_self_events failed in the JDK11 build with the following error.
>  
> {code:java}
> Error Message
> assert 0 == 1
> Stacktrace
> custom_cluster/test_events_custom_configs.py:637: in test_iceberg_self_events
>     check_self_events("ALTER TABLE {0} ADD COLUMN j INT".format(tbl_name))
> custom_cluster/test_events_custom_configs.py:624: in check_self_events
>     assert tbls_refreshed_before == tbls_refreshed_after
> E   assert 0 == 1 {code}
> This test still passed before IMPALA-11387 was merged.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12455) Create set of disjunct bloom filters for keys in partitioned builds

2024-02-08 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815717#comment-17815717
 ] 

Csaba Ringhofer edited comment on IMPALA-12455 at 2/8/24 3:23 PM:
--

>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

A solution to "waiting for all senders to send EOS" could be to build bloom
filters on the sender side (before the exchange node) instead of in the hash
join builder (after the exchange node). As individual senders would know
earlier that they are finished, they could send their bloom filter without
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as
no builder would have to process the whole dataset. On the other hand, this
would introduce aggregation work in the broadcast case, which is not necessary
at the moment.
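
A tiny sketch of the disjoint-filter idea itself (hypothetical classes, not
Impala's runtime filter code): the same hash that routed a build row to a join
builder instance also picks which of the num_instances small filters a
consumer probes, so no OR-ing is ever needed.
{code}
// Sketch only: one small bloom filter per join builder instance.
final class DisjunctBloomFilterSketch {
  interface BloomFilter {
    void insert(long hash);
    boolean mayContain(long hash);
  }

  private final BloomFilter[] filters_;  // one per partition/instance

  DisjunctBloomFilterSketch(BloomFilter[] filters) { filters_ = filters; }

  private int partition(long hash) {
    // Must use the same hashing as the exchange that partitioned the build
    // rows, so each key lands in exactly one filter.
    return (int) Long.remainderUnsigned(hash, filters_.length);
  }

  void insert(long hash) { filters_[partition(hash)].insert(hash); }

  boolean mayContain(long hash) {
    // Probe a single small, cache-friendly filter instead of one big
    // filter built by OR-ing all instances together.
    return filters_[partition(hash)].mayContain(hash);
  }
}
{code}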





was (Author: csringhofer):
>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An solution to "waiting for all senders to send EOS" could be to build bloom
filters on the sender side (before the exchange node) instead of in the hash
join builder (after the exchange node). As individual senders would know
earlier that they are finished, they could send their bloom filter without
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as
no builder would have to process the whole dataset. On the other hand, this
would introduce aggregation work in the broadcast case, which is not necessary
at the moment.




> Create set of disjunct bloom filters for keys in partitioned builds
> ---
>
> Key: IMPALA-12455
> URL: https://issues.apache.org/jira/browse/IMPALA-12455
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the
> join builder by OR-ing them into a final filter. This could be avoided by
> having num_instances smaller bloom filters and choosing the correct one
> during lookup by doing the same hashing as used in partitioning. Builders
> would only need to write a single small filter as they have only keys from a
> single partition. This would make runtime filter producers faster and much
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so
> this optimization wouldn't be applicable to filters consumed by Kudu scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12455) Create set of disjunct bloom filters for keys in partitioned builds

2024-02-08 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815717#comment-17815717
 ] 

Csaba Ringhofer edited comment on IMPALA-12455 at 2/8/24 3:20 PM:
--

>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An solution to "waiting for all senders to send EOS" could be to build bloom
filters on the sender side (before the exchange node) instead of in the hash
join builder (after the exchange node). As individual senders would know
earlier that they are finished, they could send their bloom filter without
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as
no builder would have to process the whole dataset. On the other hand, this
would introduce merging work in the broadcast case, which is not necessary at
the moment.





was (Author: csringhofer):
>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An alternative solution could be to build bloom filters on the sender side
(before the exchange node) instead of in the hash join builder (after the
exchange node). This would make the optimization suggested in this Jira
impossible, but would help with the issue you raised, as the senders would
know earlier that they are finished and wouldn't need to wait for all senders
to hit EOS before publishing bloom filters.


> Create set of disjunct bloom filters for keys in partitioned builds
> ---
>
> Key: IMPALA-12455
> URL: https://issues.apache.org/jira/browse/IMPALA-12455
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the
> join builder by OR-ing them into a final filter. This could be avoided by
> having num_instances smaller bloom filters and choosing the correct one
> during lookup by doing the same hashing as used in partitioning. Builders
> would only need to write a single small filter as they have only keys from a
> single partition. This would make runtime filter producers faster and much
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so
> this optimization wouldn't be applicable to filters consumed by Kudu scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-12455) Create set of disjunct bloom filters for keys in partitioned builds

2024-02-08 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815717#comment-17815717
 ] 

Csaba Ringhofer edited comment on IMPALA-12455 at 2/8/24 3:21 PM:
--

>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An solution to "waiting for all senders to send EOS" could be to build bloom
filters on the sender side (before the exchange node) instead of in the hash
join builder (after the exchange node). As individual senders would know
earlier that they are finished, they could send their bloom filter without
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as
no builder would have to process the whole dataset. On the other hand, this
would introduce aggregation work in the broadcast case, which is not necessary
at the moment.





was (Author: csringhofer):
>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An solution to "waiting for all senders to send EOS" could be to build bloom
filters on the sender side (before the exchange node) instead of in the hash
join builder (after the exchange node). As individual senders would know
earlier that they are finished, they could send their bloom filter without
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as
no builder would have to process the whole dataset. On the other hand, this
would introduce merging work in the broadcast case, which is not necessary at
the moment.




> Create set of disjunct bloom filters for keys in partitioned builds
> ---
>
> Key: IMPALA-12455
> URL: https://issues.apache.org/jira/browse/IMPALA-12455
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the
> join builder by OR-ing them into a final filter. This could be avoided by
> having num_instances smaller bloom filters and choosing the correct one
> during lookup by doing the same hashing as used in partitioning. Builders
> would only need to write a single small filter as they have only keys from a
> single partition. This would make runtime filter producers faster and much
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so
> this optimization wouldn't be applicable to filters consumed by Kudu scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12455) Create set of disjunct bloom filters for keys in partitioned builds

2024-02-08 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815717#comment-17815717
 ] 

Csaba Ringhofer commented on IMPALA-12455:
--

>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from the right-side child before publishing any
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building
to be faster because the individual bloom filters would be smaller, so more
likely to fit into the CPU cache.

An alternative solution could be to build bloom filters on the sender side
(before the exchange node) instead of in the hash join builder (after the
exchange node). This would make the optimization suggested in this Jira
impossible, but would help with the issue you raised, as the senders would
know earlier that they are finished and wouldn't need to wait for all senders
to hit EOS before publishing bloom filters.


> Create set of disjunct bloom filters for keys in partitioned builds
> ---
>
> Key: IMPALA-12455
> URL: https://issues.apache.org/jira/browse/IMPALA-12455
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the
> join builder by OR-ing them into a final filter. This could be avoided by
> having num_instances smaller bloom filters and choosing the correct one
> during lookup by doing the same hashing as used in partitioning. Builders
> would only need to write a single small filter as they have only keys from a
> single partition. This would make runtime filter producers faster and much
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so
> this optimization wouldn't be applicable to filters consumed by Kudu scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12746) Bump jackson-databind version to 2.15

2024-01-26 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-12746.
--
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> Bump jackson-databind version to 2.15
> -
>
> Key: IMPALA-12746
> URL: https://issues.apache.org/jira/browse/IMPALA-12746
> Project: IMPALA
>  Issue Type: Task
>Reporter: Csaba Ringhofer
>Priority: Major
> Fix For: Impala 4.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


