[jira] [Created] (IMPALA-13424) Improve diagnostics for OOM in JVM
Csaba Ringhofer created IMPALA-13424: Summary: Improve diagnostics for OOM in JVM Key: IMPALA-13424 URL: https://issues.apache.org/jira/browse/IMPALA-13424 Project: IMPALA Issue Type: Improvement Components: Catalog, Frontend Reporter: Csaba Ringhofer

By default impalad/catalogd can survive a failed allocation in Java: the JVM throws an exception that may abort the given thread instead of crashing the whole process. There is already a ticket (IMPALA-1956) about changing the default behavior.

Tested locally by allocating until OOM during HdfsTable.load(), which led to an OutOfMemoryError being thrown in catalogd. This led to failing to load the table, while catalogd + coordinator could still function properly afterwards. Error reporting for the client / in the coordinator log seemed OK, but catalogd has no log lines or metrics that clearly identify the issue:

client: {code} AnalysisException: Failed to load metadata for table: 'functional.alltypestiny' CAUSED BY: TableLoadingException: Failed to load metadata for table: functional.alltypestiny. Running 'invalidate metadata functional.alltypestiny' may resolve this problem. 
CAUSED BY: OutOfMemoryError: Java heap space {code} coordinator log: {code} I1007 16:48:41.760563 126482 jni-util.cc:321] 604b67244b0793c6:32c416c0] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'functional.alltypestiny' at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:990) at org.apache.impala.analysis.FromClause.analyze(FromClause.java:87) at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:330) at org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:282) at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:274) at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:558) at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:505) at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2586) at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2268) at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2029) at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: functional.alltypestiny. Running 'invalidate metadata functional.alltypestiny' may resolve this problem. 
CAUSED BY: OutOfMemoryError: Java heap space at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:181) at org.apache.impala.catalog.Table.fromThrift(Table.java:591) at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:489) at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:344) at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:590) at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: functional.alltypestiny. Running 'invalidate metadata functional.alltypestiny' may resolve this problem. at org.apache.impala.catalog.TableLoader.load(TableLoader.java:168) at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.OutOfMemoryError: Java heap space () {code} catalogd log: {code} I1007 16:48:33.565677 126490 TableLoader.java:79] Loading metadata for: functional.alltypestiny (background load) ... I1007 16:48:38.755443 126490 TableLoader.java:177] Loaded metadata for: functional.alltypestiny (5189ms) I1007 16:48:38.755504 114205 JvmPauseMonitor.java:209] Detected pause in JVM or host machine (eg GC): pause of approximately 1357ms GC pool 'PS MarkSweep' had collection(s): count=2 time=1702ms {code} The oom doesn't lead to any warning/error level logs. 
Ideally there should be an error-level log with the name of the table and the callstack of the failed allocation. -- This message was sent by Atlassian Jira (v8.20.10#820010) ---
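The desired behavior could be sketched as follows (a Python analogy only: MemoryError stands in for Java's OutOfMemoryError, and load_table/failing_loader are hypothetical names, not Impala's API):

```python
import logging
import traceback

LOG = logging.getLogger("catalogd")

def load_table(table_name, loader):
    # Wrap a table load so that a failed allocation produces an error-level
    # log with the table name and the callstack of the failed allocation,
    # instead of only surfacing as a client-side exception.
    try:
        return loader()
    except MemoryError:
        LOG.error("OOM while loading metadata for table %s:\n%s",
                  table_name, traceback.format_exc())
        raise

def failing_loader():
    # Stands in for the JVM's "OutOfMemoryError: Java heap space".
    raise MemoryError("Java heap space")

try:
    load_table("functional.alltypestiny", failing_loader)
except MemoryError:
    pass  # the load still fails; the difference is the error-level log line
```

The caller still sees the failure; the change is only that the catalogd-side log now identifies both the table and the allocation site.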
[jira] [Updated] (IMPALA-12594) KrpcDataStreamSender's mem estimate is different than real usage
[ https://issues.apache.org/jira/browse/IMPALA-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12594: - Description: IMPALA-6684 added memory estimates for KrpcDataStreamSender, but there are a few gaps between how the frontend estimates memory and how the backend actually allocates it:

The frontend uses the following formula: buffer_size = num_channels * 2 * (tuple_buffer_length + compressed_buffer_length) This accounts for the serialization and compression buffers of each OutboundRowBatch. This can both under- and overestimate:
1. it doesn't account for the RowBatch used by channels during partitioned exchange to collect rows belonging to a single channel https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/be/src/runtime/krpc-data-stream-sender.cc#L232
2. it ignores the adjustment to the RowBatch capacity above based on the flag data_stream_sender_buffer_size https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/be/src/runtime/krpc-data-stream-sender.cc#L379 This adjustment can either increase or decrease the capacity to reach the desired total size (16KB by default).

Note that the adjustment above ignores var-len data, so it can massively underestimate in some cases. Meanwhile the frontend logic calculates string sizes if stats are present. Ideally both pieces of logic would be improved and synced to use both data_stream_sender_buffer_size and the string sizes for the estimate (I am not sure about collection types).

> KrpcDataStreamSender's mem estimate is different than real usage > > > Key: IMPALA-12594 > URL: https://issues.apache.org/jira/browse/IMPALA-12594 > Project: IMPALA > Issue Type: Bug > Components: Backend, Frontend > Reporter: Csaba Ringhofer > Priority: Major
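For reference, the frontend formula above can be sketched as follows (the numbers are illustrative only, not values taken from Impala's planner):

```python
def frontend_estimate(num_channels, tuple_buffer_length, compressed_buffer_length):
    # Two OutboundRowBatch buffers (serialization + compression) per channel.
    return num_channels * 2 * (tuple_buffer_length + compressed_buffer_length)

# Example: 8 channels with a 16KB tuple buffer and a 16KB compression buffer each.
estimate = frontend_estimate(8, 16 * 1024, 16 * 1024)  # 524288 bytes

# What the formula misses, per the gaps described above:
# 1. the per-channel RowBatch that collects rows in a partitioned exchange;
# 2. the backend's capacity adjustment toward data_stream_sender_buffer_size
#    (16KB by default), which can push real usage above or below the estimate.
```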
[jira] [Resolved] (IMPALA-13371) Avoid throwing exceptions in FileSystemUtil::FindFileInPath()
[ https://issues.apache.org/jira/browse/IMPALA-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-13371. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed

> Avoid throwing exceptions in FileSystemUtil::FindFileInPath() > - > > Key: IMPALA-13371 > URL: https://issues.apache.org/jira/browse/IMPALA-13371 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Csaba Ringhofer > Assignee: Csaba Ringhofer > Priority: Critical > Fix For: Impala 4.5.0 > > > Some functions in std::filesystem can throw exceptions in some scenarios. We > should catch the exception in all cases or use an overload with noexcept. > Even if the error is fatal, throwing an exception can lead to not logging it > properly. > An example is std::filesystem::exists(): > https://github.com/apache/impala/blob/22723d0f276468a25553f007dc65b21d79bd821d/be/src/util/filesystem-util.cc#L271 > Other functions use an overload with noexcept: > https://github.com/apache/impala/blob/22723d0f276468a25553f007dc65b21d79bd821d/be/src/util/filesystem-util.cc#L75 > https://en.cppreference.com/w/cpp/filesystem/exists
[jira] [Updated] (IMPALA-13374) impala-shell can hit errors when downloading runtime profile
[ https://issues.apache.org/jira/browse/IMPALA-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13374: - Description: There are several issues with the current way runtime profiles are downloaded in impala-shell: https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L1463

1. The profile is fetched AFTER the queries are closed, which means that Impala may have already discarded it from memory, in which case the RPC will return an error. (Closing the query happens at a different point depending on is_dml, but both paths close before fetching the profile.)
2. If show_profiles=true, then failing to fetch the profile is treated as an error. This leads to just an error message in interactive sessions, but with the -q or -f parameter it stops executing the queries and returns a non-zero exit status.
3. The profile is fetched from Impala even if it is not used at all (show_profiles=false, which is the default). This is not a functional bug but can impact performance.
4. The downloaded profile is not cached, so a subsequent PROFILE; command will download it again. This is not just an optimization issue: it may lead to script failures if the profile has already been discarded when PROFILE; is called. Note that the "already discarded" case has special handling in the SUMMARY (but not the PROFILE) command: if the query id is not found, it is not treated as an error. https://github.com/apache/impala/blob/874e4fa117bdccfb8784c1987e5e3bf1ef4fbc1d/shell/impala_shell.py#L820

The main problem is the combination of 1 and 2, as it can lead to failures when show_profiles=true even when everything works as expected and the coordinator discards the profile between close and get_runtime_profile.

> impala-shell can hit errors when downloading runtime profile > > > Key: IMPALA-13374 > URL: https://issues.apache.org/jira/browse/IMPALA-13374 > Project: IMPALA > Issue Type: Bug > Components: Clients > Reporter: Csaba Ringhofer > Priority: Critical
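A possible shape for addressing the four issues could be sketched as follows (a hedged sketch: ProfileCache, fetch_profile, and close_query are hypothetical names for illustration, not impala-shell's actual API). The idea is to fetch the profile before closing the query, only when it will be used, cache it, and treat a missing profile as non-fatal:

```python
class ProfileCache:
    def __init__(self):
        self._profiles = {}

    def fetch_before_close(self, query_id, fetch_profile, close_query,
                           want_profile):
        # Fetch while the coordinator still holds the profile (issue 1),
        # and skip the RPC entirely when show_profiles=false (issue 3).
        if want_profile:
            try:
                self._profiles[query_id] = fetch_profile(query_id)
            except Exception:
                # A missing profile should not abort -q/-f runs (issue 2):
                # swallow the error instead of returning a non-zero exit.
                pass
        close_query(query_id)

    def get(self, query_id):
        # Serve later PROFILE; commands from the cache instead of
        # re-downloading a possibly discarded profile (issue 4).
        return self._profiles.get(query_id)
```

With this ordering, a coordinator discarding the profile between close and get_runtime_profile can no longer fail an otherwise successful script run.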
[jira] [Created] (IMPALA-13374) impala-shell can hit errors when downloading runtime profile
Csaba Ringhofer created IMPALA-13374: Summary: impala-shell can hit errors when downloading runtime profile Key: IMPALA-13374 URL: https://issues.apache.org/jira/browse/IMPALA-13374 Project: IMPALA Issue Type: Bug Components: Clients Reporter: Csaba Ringhofer

There are several issues with the current way runtime profiles are downloaded in impala-shell: https://github.infra.cloudera.com/CDH/Impala/blob/2010c93bd364795d4ee7d17ea8805450658fc485/shell/impala_shell.py#L1196

1. The profile is fetched AFTER the queries are closed, which means that Impala may have already discarded it from memory, in which case the RPC will return an error. (Closing the query happens at a different point depending on is_dml, but both paths close before fetching the profile.)
2. If show_profiles=true, then failing to fetch the profile is treated as an error. This leads to just an error message in interactive sessions, but with the -q or -f parameter it stops executing the queries and returns a non-zero exit status.
3. The profile is fetched from Impala even if it is not used at all (show_profiles=false, which is the default). This is not a functional bug but can impact performance.
4. The downloaded profile is not cached, so a subsequent PROFILE; command will download it again. This is not just an optimization issue: it may lead to script failures if the profile has already been discarded when PROFILE; is called. Note that the "already discarded" case has special handling in the SUMMARY (but not the PROFILE) command: if the query id is not found, it is not treated as an error. https://github.infra.cloudera.com/CDH/Impala/blob/2010c93bd364795d4ee7d17ea8805450658fc485/shell/impala_shell.py#L684

The main problem is the combination of 1 and 2, as it can lead to failures when show_profiles=true even when everything works as expected and the coordinator discards the profile between close and get_runtime_profile.
[jira] [Updated] (IMPALA-13371) Avoid throwing exceptions in FileSystemUtil::FindFileInPath()
[ https://issues.apache.org/jira/browse/IMPALA-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13371: - Summary: Avoid throwing exceptions in FileSystemUtil::FindFileInPath() (was: Avoid throwing exceptions in FileSystemUtil) > Avoid throwing exceptions in FileSystemUtil::FindFileInPath() > - > > Key: IMPALA-13371 > URL: https://issues.apache.org/jira/browse/IMPALA-13371 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Csaba Ringhofer > Assignee: Csaba Ringhofer > Priority: Critical
[jira] [Assigned] (IMPALA-13371) Avoid throwing exceptions in FileSystemUtil
[ https://issues.apache.org/jira/browse/IMPALA-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-13371: Assignee: Csaba Ringhofer > Avoid throwing exceptions in FileSystemUtil > --- > > Key: IMPALA-13371 > URL: https://issues.apache.org/jira/browse/IMPALA-13371 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Reporter: Csaba Ringhofer > Assignee: Csaba Ringhofer > Priority: Critical
[jira] [Resolved] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-11431. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an > exhaustive build > > > Key: IMPALA-11431 > URL: https://issues.apache.org/jira/browse/IMPALA-11431 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build > Fix For: Impala 4.5.0 > > > In one of the exhaustive builds, > query_test.test_nested_types.TestComputeStatsWithNestedTypes.test_compute_stats_with_structs > fails: > {code:java} > query_test/test_nested_types.py:252: in test_compute_stats_with_structs > self.run_test_case('QueryTest/compute-stats-with-structs', vector) > common/impala_test_suite.py:778: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:588: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:469: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E > 'alltypes','STRUCT',-1,-1,-1,-1.0,-1,-1 > == > 'alltypes','STRUCT',-1,-1,-1,-1,-1,-1 > E 'id','INT',6,0,4,4.0,-1,-1 != 'id','INT',-1,-1,4,4,-1,-1 > E 'small_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == > 'small_struct','STRUCT',-1,-1,-1,-1,-1,-1 > E 'str','STRING',6,0,11,10.330154,-1,-1 != > 'str','STRING',-1,-1,-1,-1,-1,-1 > E 'tiny_struct','STRUCT',-1,-1,-1,-1.0,-1,-1 == > 'tiny_struct','STRUCT',-1,-1,-1,-1,-1,-1 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13356) Avoid unnecessarily updating column stats
Csaba Ringhofer created IMPALA-13356: Summary: Avoid unnecessarily updating column stats Key: IMPALA-13356 URL: https://issues.apache.org/jira/browse/IMPALA-13356 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Csaba Ringhofer

Currently column stats are reloaded every time the schema is reloaded: https://github.com/apache/impala/blob/48ee4276be1eb278fb628a4813728134a4910b1f/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1304 This includes the common scenario of processing alter table events.

Since HIVE-22046 introduced the engine field for column stats, it is unlikely that Impala's version of the column stats is modified by any other component. If there is another Impala catalogd connecting to the same cluster, then it should also update the table property impala.lastComputeStatsTime, so it is enough to update column stats when a non-self event is seen that modifies this property. Another case where it can be useful to reload stats is when the schema actually changes, for example when columns are added/removed/renamed.
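The proposed reload condition could be sketched as follows (a hypothetical helper for illustration, not catalogd code):

```python
def should_reload_column_stats(is_self_event, props_before, props_after,
                               cols_before, cols_after):
    # Reload when a non-self event changed impala.lastComputeStatsTime,
    # i.e. another catalogd ran COMPUTE STATS on the same table...
    key = "impala.lastComputeStatsTime"
    if not is_self_event and props_before.get(key) != props_after.get(key):
        return True
    # ...or when the column list itself changed (columns added/removed/renamed).
    return cols_before != cols_after
```

A plain schema reload triggered by an alter table event (same columns, same compute-stats timestamp, or a self-event) would then skip the stats fetch entirely.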
[jira] [Updated] (IMPALA-13346) query_test/test_iceberg.py / test_read_position_deletes is flaky
[ https://issues.apache.org/jira/browse/IMPALA-13346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13346: - Priority: Critical (was: Major) > query_test/test_iceberg.py / test_read_position_deletes is flaky > > > Key: IMPALA-13346 > URL: https://issues.apache.org/jira/browse/IMPALA-13346 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > > {code} > query_test/test_iceberg.py:1466: in test_read_position_deletes > self.run_test_case('QueryTest/iceberg-v2-read-position-deletes', vector) > common/impala_test_suite.py:772: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:606: in __verify_results_and_errors > replace_filenames_with_placeholder) common/test_result_verifier.py:503: in > verify_raw_results VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:296: in verify_query_result_is_equal assert > expected_results == actual_results E assert Comparing QueryTestResults > (expected vs actual): E 3,2,'3.21KB','NOT CACHED','NOT > CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows','NONE' > != 0,2,'3.21KB','NOT CACHED','NOT > CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows','NONE' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13346) query_test/test_iceberg.py / test_read_position_deletes is flaky
Csaba Ringhofer created IMPALA-13346: Summary: query_test/test_iceberg.py / test_read_position_deletes is flaky Key: IMPALA-13346 URL: https://issues.apache.org/jira/browse/IMPALA-13346 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer {code} query_test/test_iceberg.py:1466: in test_read_position_deletes self.run_test_case('QueryTest/iceberg-v2-read-position-deletes', vector) common/impala_test_suite.py:772: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:606: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:503: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:296: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 3,2,'3.21KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows','NONE' != 0,2,'3.21KB','NOT CACHED','NOT CACHED','PARQUET','false','hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows','NONE' {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-11431 started by Csaba Ringhofer. > TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an > exhaustive build > > > Key: IMPALA-11431 > URL: https://issues.apache.org/jira/browse/IMPALA-11431 > Project: IMPALA > Issue Type: Bug > Reporter: Daniel Becker > Assignee: Csaba Ringhofer > Priority: Blocker > Labels: broken-build
[jira] [Assigned] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-11431: Assignee: Csaba Ringhofer (was: Daniel Becker) > TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an > exhaustive build > > > Key: IMPALA-11431 > URL: https://issues.apache.org/jira/browse/IMPALA-11431 > Project: IMPALA > Issue Type: Bug > Reporter: Daniel Becker > Assignee: Csaba Ringhofer > Priority: Blocker > Labels: broken-build
[jira] [Commented] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878309#comment-17878309 ] Csaba Ringhofer commented on IMPALA-11431: -- It seems that besides row__id, another problematic column is auto_incrementing_id in Kudu tables with a non-unique primary key. I still don't understand why the error is sporadic in HMS. So far I have only seen the errors in exhaustive tests and not in core tests. > TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an > exhaustive build > > > Key: IMPALA-11431 > URL: https://issues.apache.org/jira/browse/IMPALA-11431 > Project: IMPALA > Issue Type: Bug > Reporter: Daniel Becker > Assignee: Daniel Becker > Priority: Blocker > Labels: broken-build
[jira] [Comment Edited] (IMPALA-11431) TestComputeStatsWithNestedTypes.test_compute_stats_with_structs fails in an exhaustive build
[ https://issues.apache.org/jira/browse/IMPALA-11431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878057#comment-17878057 ] Csaba Ringhofer edited comment on IMPALA-11431 at 8/31/24 6:45 AM: --- I see 2 issues here: 1. test_compute_stats_with_structs itself is problematic, as it has side effects. It calls COMPUTE STATS for functional_*.complextypes_structs and complextypes_nested_structs, which modifies the state of tables read by many tests. While this is not the cause of the issue, it is something to clean up. 2. HMS throws an exception in some cases with "Column row__id doesn't exist in table". row__id is related to full ACID tables (when the test runs for ORC, the table is full ACID). This leads to keeping the old empty stats and failing the test. I saw 140 errors like this during the exhaustive test run, so this also affects some other tables. It is not clear why this is sporadic, though. Impala adds a "synthetic" row__id column to full ACID tables, so these columns don't come from HMS and we should not try to read their statistics. 
The full exception in HMS: {code:java} 2024-08-29T02:35:24,264 ERROR [TThreadPoolServer WorkerProcess-142] metastore.ObjectStore: Error retrieving statistics via jdo org.apache.hadoop.hive.metastore.api.MetaException: Column row__id doesn't exist in table complextypes_structs in database functional_orc_def at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:10342) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:10277) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore.access$3100(ObjectStore.java:295) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore$20.getJdoResult(ObjectStore.java:10434) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore$20.getJdoResult(ObjectStore.java:10426) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4345) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:10454) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:10412) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) ~[?:?] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_422] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_422] at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at com.sun.proxy.$Proxy33.getTableColumnStatistics(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_statistics_req(HiveMetaStore.java:7186) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[?:?] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_422] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_422] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at com.sun.proxy.$Proxy34.get_table_statistics_req(Unknown Source) ~[?:?] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_statistics_req.getResult(ThriftHiveMetastore.java:22613) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_statistics_req.getResult(ThriftHiveMetastore.java:22592) ~[hive-standalone-metastore-3.1.3000.2024.0.19.0-170.jar:3.1.3000.2024.0.19.0-170] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.16.0.jar:0.16.0] at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.j
[jira] [Created] (IMPALA-13331) Add metrics to CatalogdTableInvalidator
Csaba Ringhofer created IMPALA-13331: Summary: Add metrics to CatalogdTableInvalidator Key: IMPALA-13331 URL: https://issues.apache.org/jira/browse/IMPALA-13331 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Csaba Ringhofer CatalogdTableInvalidator only logs when it invalidates a table, but there are no metrics to track the number of invalidations. This makes it hard to get a picture of the effects of invalidate_tables_timeout_s / invalidate_tables_gc_old_gen_full_threshold. A few examples of useful metrics: - number of times invalidateSome() was called - number of table invalidations due to mem pressure - number of table invalidations due to ttl - time spent in invalidateSome() - time spent in invalidateOlderThan() For the number-of-invalidations and time-spent metrics, having separate sum/max/avg metrics would probably be nice.
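The kind of counters the ticket proposes can be sketched as follows (a minimal illustration in Python; the class and field names are hypothetical, not Impala's actual Java metric names):

```python
class InvalidatorMetrics:
    """Hypothetical counters mirroring the metrics proposed in IMPALA-13331."""

    def __init__(self):
        self.invalidate_some_calls = 0        # times invalidateSome() ran
        self.invalidated_mem_pressure = 0     # tables dropped due to GC pressure
        self.invalidated_ttl = 0              # tables dropped due to ttl expiry
        self.time_in_invalidate_some_ns = 0   # cumulative time in invalidateSome()

    def record_invalidate_some(self, mem_pressure_count, ttl_count, elapsed_ns):
        # One call to this method corresponds to one invalidateSome() pass.
        self.invalidate_some_calls += 1
        self.invalidated_mem_pressure += mem_pressure_count
        self.invalidated_ttl += ttl_count
        self.time_in_invalidate_some_ns += elapsed_ns


m = InvalidatorMetrics()
# Simulate one pass that invalidated 2 tables for memory pressure and 5 for ttl.
m.record_invalidate_some(mem_pressure_count=2, ttl_count=5, elapsed_ns=1_200_000)
```

From counters like these, the sum/max/avg variants mentioned above could be derived by keeping a max and a count alongside each cumulative sum.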
[jira] [Resolved] (IMPALA-13246) Smallify strings during broadcast exchange
[ https://issues.apache.org/jira/browse/IMPALA-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-13246. -- Fix Version/s: Impala 4.5.0 Resolution: Done > Smallify strings during broadcast exchange > -- > > Key: IMPALA-13246 > URL: https://issues.apache.org/jira/browse/IMPALA-13246 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Fix For: Impala 4.5.0
[jira] [Resolved] (IMPALA-13293) LocalCatalog's waitForCatalogUpdate() sleeps too much
[ https://issues.apache.org/jira/browse/IMPALA-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-13293. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > LocalCatalog's waitForCatalogUpdate() sleeps too much > - > > Key: IMPALA-13293 > URL: https://issues.apache.org/jira/browse/IMPALA-13293 > Project: IMPALA > Issue Type: Improvement > Components: Catalog, Frontend >Reporter: Csaba Ringhofer >Priority: Major > Fix For: Impala 4.5.0 > > > Unlike ImpaladCatalog's waitForCatalogUpdate(), the LocalCatalog version > doesn't use a condition variable and simply waits for timeoutMs. The > timeout comes from MAX_CATALOG_UPDATE_WAIT_TIME_MS, which is 2 seconds. This > means the function will wait 2 seconds even if the catalog update arrived > in the meantime. These 2 seconds are often nearly completely added to the > Impala cluster startup time. > The sleep was added in > https://gerrit.cloudera.org/#/c/11472/3/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java > Update: realized that this also doesn't work well for ImpaladCatalog - the > FeCatalogManager creates a new ImpaladCatalog when a full topic update > arrives, so the Object that waitForCatalogUpdate() waits for is never > notified. My impression is that it was broken a long time ago, even before > LocalCatalog was added.
[jira] [Created] (IMPALA-13306) Store resources attached to row batches per-tuple descriptor
Csaba Ringhofer created IMPALA-13306: Summary: Store resources attached to row batches per-tuple descriptor Key: IMPALA-13306 URL: https://issues.apache.org/jira/browse/IMPALA-13306 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Currently RowBatch handles resource-related info (e.g. FlushMode) globally, while it may be different for each tuple descriptor. An example is a row that comes from a join that didn't spill. In this case the memory of the build side tuple remains valid until the join node is closed, while the probe side can change more often, e.g. when the scratch batch in the Parquet scanner gets full and is attached to the row batch. Some operators could benefit from knowing that some tuple pointers remain valid for longer. An example is tuple deduplication in KrpcDataStreamSender - if more than one row batch could be sent in a single OutboundRowBatch, it would be important to know whether the same tuple pointer really means the same tuple in the new RowBatch.
[jira] [Updated] (IMPALA-13293) LocalCatalog's waitForCatalogUpdate() sleeps too much
[ https://issues.apache.org/jira/browse/IMPALA-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13293: - Description: Unlike ImpaladCatalog's waitForCatalogUpdate(), the LocalCatalog version doesn't use a condition variable and simply waits for timeoutMs. The timeout comes from MAX_CATALOG_UPDATE_WAIT_TIME_MS, which is 2 seconds. This means the function will wait 2 seconds even if the catalog update arrived in the meantime. These 2 seconds are often nearly completely added to the Impala cluster startup time. The sleep was added in https://gerrit.cloudera.org/#/c/11472/3/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java Update: realized that this also doesn't work well for ImpaladCatalog - the FeCatalogManager creates a new ImpaladCatalog when a full topic update arrives, so the Object that waitForCatalogUpdate() waits for is never notified. My impression is that it was broken a long time ago, even before LocalCatalog was added. was: Unlike ImpaladCatalog's waitForCatalogUpdate(), the LocalCatalog version doesn't use a condition variable and simply waits for timeoutMs. The timeout comes from MAX_CATALOG_UPDATE_WAIT_TIME_MS, which is 2 seconds. This means the function will wait 2 seconds even if the catalog update arrived in the meantime. These 2 seconds are often nearly completely added to the Impala cluster startup time. The sleep was added in https://gerrit.cloudera.org/#/c/11472/3/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java > LocalCatalog's waitForCatalogUpdate() sleeps too much > - > > Key: IMPALA-13293 > URL: https://issues.apache.org/jira/browse/IMPALA-13293 > Project: IMPALA > Issue Type: Improvement > Components: Catalog, Frontend >Reporter: Csaba Ringhofer >Priority: Major > > Unlike ImpaladCatalog's waitForCatalogUpdate(), the LocalCatalog version > doesn't use a condition variable and simply waits for timeoutMs. 
The > timeout comes from MAX_CATALOG_UPDATE_WAIT_TIME_MS, which is 2 seconds. This > means the function will wait 2 seconds even if the catalog update arrived > in the meantime. These 2 seconds are often nearly completely added to the > Impala cluster startup time. > The sleep was added in > https://gerrit.cloudera.org/#/c/11472/3/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java > Update: realized that this also doesn't work well for ImpaladCatalog - the > FeCatalogManager creates a new ImpaladCatalog when a full topic update > arrives, so the Object that waitForCatalogUpdate() waits for is never > notified. My impression is that it was broken a long time ago, even before > LocalCatalog was added.
[jira] [Created] (IMPALA-13293) LocalCatalog's waitForCatalogUpdate() sleeps too much
Csaba Ringhofer created IMPALA-13293: Summary: LocalCatalog's waitForCatalogUpdate() sleeps too much Key: IMPALA-13293 URL: https://issues.apache.org/jira/browse/IMPALA-13293 Project: IMPALA Issue Type: Improvement Components: Catalog, Frontend Reporter: Csaba Ringhofer Unlike ImpaladCatalog's waitForCatalogUpdate(), the LocalCatalog version doesn't use a condition variable and simply waits for timeoutMs. The timeout comes from MAX_CATALOG_UPDATE_WAIT_TIME_MS, which is 2 seconds. This means the function will wait 2 seconds even if the catalog update arrived in the meantime. These 2 seconds are often nearly completely added to the Impala cluster startup time. The sleep was added in https://gerrit.cloudera.org/#/c/11472/3/fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
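The difference between a fixed sleep and a condition-variable wait can be sketched as below (Python, hypothetical names; Impala's frontend is Java, where the analogous primitive is Object.wait/notify or a Condition). A waiter wakes as soon as the version it needs arrives, instead of always paying the full timeout:

```python
import threading
import time

class CatalogVersionWaiter:
    """Wake waiters as soon as the catalog version advances, rather than
    sleeping the full timeout as described in the ticket."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0

    def update_version(self, version):
        # Called when a catalog update arrives; wakes all waiters.
        with self._cond:
            self._version = version
            self._cond.notify_all()

    def wait_for_version(self, min_version, timeout_s):
        # Blocks at most timeout_s, but returns immediately once the
        # version reaches min_version.
        with self._cond:
            self._cond.wait_for(lambda: self._version >= min_version,
                                timeout=timeout_s)
            return self._version


waiter = CatalogVersionWaiter()
# Simulate a catalog update arriving after 50 ms - far sooner than the timeout.
threading.Timer(0.05, waiter.update_version, args=[7]).start()
start = time.monotonic()
seen = waiter.wait_for_version(7, timeout_s=2.0)
elapsed = time.monotonic() - start  # well under the 2 s timeout
```

With a plain sleep, `elapsed` would always be the full 2 seconds; with the condition variable it is roughly the 50 ms it took for the update to arrive.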
[jira] [Created] (IMPALA-13292) Decrease statestore_update_frequency_ms in development environment
Csaba Ringhofer created IMPALA-13292: Summary: Decrease statestore_update_frequency_ms in development environment Key: IMPALA-13292 URL: https://issues.apache.org/jira/browse/IMPALA-13292 Project: IMPALA Issue Type: Improvement Components: Infrastructure Reporter: Csaba Ringhofer The current default of 2s for statestore_update_frequency_ms adds significant delay to a lot of operations. While decreasing it in production environments sounds risky, doing so in the development environment could make it faster. Decreasing statestore_update_frequency_ms from 2s to 0.5s reduced cluster startup time by 1-2 seconds, which could make custom cluster tests faster. It also significantly speeds up tests that create / drop metastore objects: impala-py.test -x tests/metadata/test_ddl.py -k test_create_table_as_select with default settings: first run: 32s -> 16s other runs: 16s -> 8s The effect is less drastic on the first run with catalog_topic_mode=minimal: first run: 18s -> 12s other runs: 16s -> 8s
[jira] [Created] (IMPALA-13272) Analytic function of collections can lead to crash
Csaba Ringhofer created IMPALA-13272: Summary: Analytic function of collections can lead to crash Key: IMPALA-13272 URL: https://issues.apache.org/jira/browse/IMPALA-13272 Project: IMPALA Issue Type: Improvement Reporter: Csaba Ringhofer Using Impala's test data the following query leads to a DCHECK in debug builds and may cause more subtle issues in RELEASE builds: {code} select row_no from ( select arr.small, row_number() over ( order by arr.inner_struct1.str) as row_no from functional_parquet.collection_struct_mix t, t.arr_contains_nested_struct arr ) res {code} The following DCHECK is hit: {code} tuple.h:296 Check failed: offset != -1 {code} The problem seems to be with arr.small, which is referenced in the inline view but not used in the outer query - removing it from the inline view or adding it to the outer select avoids the bug. The problem seems related to materialization - offset==-1 means that the slot is not materialized, but the Parquet scanner still tries to materialize it. It is not clear yet which commit introduced the bug or whether this is a bug in the planner or the backend.
[jira] [Updated] (IMPALA-13272) Analytic function of collections can lead to crash
[ https://issues.apache.org/jira/browse/IMPALA-13272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13272: - Priority: Blocker (was: Major) > Analytic function of collections can lead to crash > --- > > Key: IMPALA-13272 > URL: https://issues.apache.org/jira/browse/IMPALA-13272 > Project: IMPALA > Issue Type: Improvement >Reporter: Csaba Ringhofer >Priority: Blocker > > Using Impala's test data the following query leads to a DCHECK in debug builds > and may cause more subtle issues in RELEASE builds: > {code} > select > row_no > from ( > select >arr.small, >row_number() over ( > order by arr.inner_struct1.str) as row_no > from functional_parquet.collection_struct_mix t, > t.arr_contains_nested_struct arr >) res > {code} > The following DCHECK is hit: > {code} > tuple.h:296 Check failed: offset != -1 > {code} > The problem seems to be with arr.small, which is referenced in the inline > view but not used in the outer query - removing it from the inline view or > adding it to the outer select avoids the bug. The problem seems > related to materialization - offset==-1 means that the slot is not > materialized, but the Parquet scanner still tries to materialize it. > It is not clear yet which commit introduced the bug or whether this is a bug > in the planner or the backend.
[jira] [Created] (IMPALA-13269) Limit Kudu scan length if not all filters arrived
Csaba Ringhofer created IMPALA-13269: Summary: Limit Kudu scan length if not all filters arrived Key: IMPALA-13269 URL: https://issues.apache.org/jira/browse/IMPALA-13269 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer TARGETED_KUDU_SCAN_RANGE_LENGTH can be used as a hint for Kudu to limit the size of scan ranges. As Impala can pick up late runtime filters when a new scan range starts, it can be useful to start with smaller scan ranges while some runtime filters have not arrived yet, as doing the whole scan without runtime filters can make it much less efficient.
[jira] [Assigned] (IMPALA-7086) Cache timezone name look ups
[ https://issues.apache.org/jira/browse/IMPALA-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-7086: --- Assignee: Mihaly Szjatinya > Cache timezone name look ups > > > Key: IMPALA-7086 > URL: https://issues.apache.org/jira/browse/IMPALA-7086 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Mihaly Szjatinya >Priority: Major > Labels: performance, ramp-up, timestamp > > to/from_utc_timestamp looks up time zones by name during every invocation, > even if the timezone parameter is constant. Avoiding this lookup if the time zone > name is the same as during the last call (in the fragment) could speed up > time zone conversions.
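The caching idea in the issue description can be sketched like this (Python for brevity; the names and the stand-in lookup function are hypothetical, not Impala's backend API). The wrapper remembers the most recent name/timezone pair, so repeated calls with a constant timezone name skip the lookup entirely:

```python
class CachedTzLookup:
    """Remembers the most recent timezone lookup, mimicking the per-fragment
    cache proposed in IMPALA-7086 for to/from_utc_timestamp."""

    def __init__(self, lookup_fn):
        self._lookup = lookup_fn   # the expensive name -> timezone lookup
        self._last_name = None
        self._last_tz = None
        self.real_lookups = 0      # instrumentation for the demo

    def find(self, name):
        # Only hit the real lookup when the name changes between calls.
        if name != self._last_name:
            self._last_tz = self._lookup(name)
            self._last_name = name
            self.real_lookups += 1
        return self._last_tz


# Stand-in for the real timezone database lookup.
cache = CachedTzLookup(lambda name: "tzinfo:" + name)
for _ in range(1000):
    tz = cache.find("America/Los_Angeles")  # constant-parameter case
```

For 1000 invocations with a constant timezone name, only the first call performs a real lookup; a single cached entry is enough because the common case is a constant timezone expression within one fragment.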
[jira] [Resolved] (IMPALA-10536) saml2_callback_token_ttl's unit is seconds instead of milliseconds
[ https://issues.apache.org/jira/browse/IMPALA-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-10536. -- Resolution: Fixed > saml2_callback_token_ttl's unit is seconds instead of milliseconds > -- > > Key: IMPALA-10536 > URL: https://issues.apache.org/jira/browse/IMPALA-10536 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > > The description of saml2_callback_token_ttl writes "seconds", while its value > is interpreted as milliseconds, so the default of 30 is way too short outside > automated tests. > I think that keeping the semantics and just rewriting the desc to > milliseconds is better than fixing the semantics, because the very low ttl is > actually useful for automated tests that test expiration logic.
[jira] [Updated] (IMPALA-7086) Cache timezone name look ups
[ https://issues.apache.org/jira/browse/IMPALA-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-7086: Labels: performance ramp-up timestamp (was: performance timestamp) > Cache timezone name look ups > > > Key: IMPALA-7086 > URL: https://issues.apache.org/jira/browse/IMPALA-7086 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Priority: Major > Labels: performance, ramp-up, timestamp > > to/from_utc_timestamp looks up time zones by name during every invocation, > even if the timezone parameter is constant. Avoiding this lookup if the time zone > name is the same as during the last call (in the fragment) could speed up > time zone conversions.
[jira] [Created] (IMPALA-13261) Consider the effect of NULL keys when choosing BROADCAST vs SHUFFLE join
Csaba Ringhofer created IMPALA-13261: Summary: Consider the effect of NULL keys when choosing BROADCAST vs SHUFFLE join Key: IMPALA-13261 URL: https://issues.apache.org/jira/browse/IMPALA-13261 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Csaba Ringhofer Currently NULL keys are hashed to a single value and sent to a single fragment instance in partitioned joins. This can cause data skew if the number of NULL keys is large. The planner could give preference to BROADCAST in LEFT OUTER JOIN when the number of NULLs is large on the probe side. Another potential solution for the same problem is IMPALA-13260 - it is about sending rows with NULL keys to local fragment instances in this situation.
[jira] [Updated] (IMPALA-13260) Exchange on the probe side of outer joins could send NULL keys to local target
[ https://issues.apache.org/jira/browse/IMPALA-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13260: - Description: Currently NULL keys are hashed to a single value and sent to a single fragment instance in partitioned joins. This can cause data skew if the number of NULL keys is large. If a NULL key guarantees that no row is matched on the build side, then columns from the build side will be all NULL and it doesn't matter which fragment instance processes the row. Always sending rows with a NULL key to a local fragment instance would both reduce data skew and make the shuffle cheaper (no compression/network). If mt_dop>0, then to completely avoid data skew these rows would need to be spread evenly among the local fragment instances. One caveat is that sending NULL keys locally would "weaken" the partitioning of the fragment, so it is no longer "partitioned by col", but "partitioned by col (with the exception of NULLs)". For example, if the outer join is followed by a grouping aggregation that uses the same key, then a shuffle is still needed as the aggregation needs all NULL keys in the same fragment instance. was: Currently NULL keys are hashed to a single value and sent to a single fragment instance in partitioned joins. This can cause data skew if the number of NULL keys is large. If a NULL key guarantees that no row is matched on the build side, then columns from the build side will be all NULL and it doesn't matter which fragment instance processes the row. Always sending rows with a NULL key to a local fragment instance would both reduce data skew and make the shuffle cheaper (no compression/network). If mt_dop>0, then to completely avoid data skew these rows would need to be spread evenly among the local fragment instances. 
> Exchange on the probe side of outer joins could send NULL keys to local target > -- > > Key: IMPALA-13260 > URL: https://issues.apache.org/jira/browse/IMPALA-13260 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Priority: Major > > Currently NULL keys are hashed to a single value and sent to a single > fragment instance in partitioned joins. This can cause data skew if the > number of NULL keys is large. > If a NULL key guarantees that no row is matched on the build side, then > columns from the build side will be all NULL and it doesn't matter which fragment > instance processes the row. > Always sending rows with a NULL key to a local fragment instance would both > reduce data skew and make the shuffle cheaper (no compression/network). If > mt_dop>0, then to completely avoid data skew these rows would need to be spread > evenly among the local fragment instances. > One caveat is that sending NULL keys locally would "weaken" the partitioning > of the fragment, so it is no longer "partitioned by col", but "partitioned by > col (with the exception of NULLs)". For example, if the outer join is followed > by a grouping aggregation that uses the same key, then a shuffle is still > needed as the aggregation needs all NULL keys in the same fragment instance.
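Why hashing NULLs to a single value skews partitioned joins can be shown with a toy hash-partitioning loop (an illustration only, not Impala code; the sentinel and partition count are made up). With a NULL-heavy probe side, one fragment instance receives almost all of the rows:

```python
# Toy model of hash-partitioning rows of a partitioned join by their key.
NUM_PARTITIONS = 4
NULL_SENTINEL = "<null>"  # stand-in: all NULL keys collapse to one hash value

def partition_of(key):
    if key is None:
        key = NULL_SENTINEL  # every NULL key maps to the same partition
    return hash(key) % NUM_PARTITIONS

# A probe side where 96 of 100 rows have a NULL join key.
keys = [1, 2, 3, 4] + [None] * 96
counts = [0] * NUM_PARTITIONS
for k in keys:
    counts[partition_of(k)] += 1

null_partition = partition_of(None)
# counts[null_partition] now holds at least the 96 NULL rows - heavy skew,
# while the other partitions get at most a handful of rows each.
```

Sending the NULL-key rows to a local fragment instance instead, as the ticket proposes, sidesteps both the skew and the network cost, at the price of the weaker partitioning guarantee described in the caveat above.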
[jira] [Created] (IMPALA-13260) Exchange on the probe side of outer joins could send NULL keys to local target
Csaba Ringhofer created IMPALA-13260: Summary: Exchange on the probe side of outer joins could send NULL keys to local target Key: IMPALA-13260 URL: https://issues.apache.org/jira/browse/IMPALA-13260 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Currently NULL keys are hashed to a single value and sent to a single fragment instance in partitioned joins. This can cause data skew if the number of NULL keys is large. If a NULL key guarantees that no row is matched on the build side, then columns from the build side will be all NULL and it doesn't matter which fragment instance processes the row. Always sending rows with a NULL key to a local fragment instance would both reduce data skew and make the shuffle cheaper (no compression/network). If mt_dop>0, then to completely avoid data skew these rows would need to be spread evenly among the local fragment instances.
[jira] [Work started] (IMPALA-13246) Smallify strings during broadcast exchange
[ https://issues.apache.org/jira/browse/IMPALA-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13246 started by Csaba Ringhofer. > Smallify strings during broadcast exchange > -- > > Key: IMPALA-13246 > URL: https://issues.apache.org/jira/browse/IMPALA-13246 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13246) Smallify strings during broadcast exchange
Csaba Ringhofer created IMPALA-13246: Summary: Smallify strings during broadcast exchange Key: IMPALA-13246 URL: https://issues.apache.org/jira/browse/IMPALA-13246 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Csaba Ringhofer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high
[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-13209. -- Fix Version/s: Impala 4.5.0 Resolution: Fixed > ExchangeNode's ConvertRowBatchTime can be high > -- > > Key: IMPALA-13209 > URL: https://issues.apache.org/jira/browse/IMPALA-13209 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: performance > Fix For: Impala 4.5.0 > > > ConvertRowBatchTime can be surprisingly high - the only thing done during > this timer is copying tuple pointers from one RowBatch to another. > https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217 > {code} > set mt_dop=8; > select straight_join count(*) from tpcds_parquet.store_sales s1 join > /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = > s2.ss_customer_sk; > ConvertRowBatchTime dominates the busy exchange node's exec time in the > profile: >- ConvertRowBatchTime: 640.072ms >- InactiveTotalTime: 243.783ms >- PeakMemoryUsage: 12.53 MB (13142368) >- RowsReturned: 46.09M (46086464) >- RowsReturnedRate: 46.93 M/sec >- TotalTime: 981.968ms > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-13225) Tuple deduplication does not work in partitioned exchanges
[ https://issues.apache.org/jira/browse/IMPALA-13225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13225: - Labels: performance (was: ) > Tuple deduplication does not work in partitioned exchanges > -- > > Key: IMPALA-13225 > URL: https://issues.apache.org/jira/browse/IMPALA-13225 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Priority: Major > Labels: performance > > RowBatch::Serialize() has a deduplication logic that detects duplicate tuples > (usually the result of joins) based on tuple pointers. This doesn't work in > partitioned exchanges because all rows are deep copied one-by-one when > collecting rows for a given channel, so all tuple pointers will be distinct: > https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/be/src/runtime/krpc-data-stream-sender.cc#L645 > The deduplication was added a long time ago (doesn't have a Jira): > https://gerrit.cloudera.org/#/c/573/ > I am not sure if it ever worked in the partitioned case (it should work > though in broadcast exchanges). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13225) Tuple deduplication does not work in partitioned exchanges
Csaba Ringhofer created IMPALA-13225: Summary: Tuple deduplication does not work in partitioned exchanges Key: IMPALA-13225 URL: https://issues.apache.org/jira/browse/IMPALA-13225 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer RowBatch::Serialize() has deduplication logic that detects duplicate tuples (usually the result of joins) based on tuple pointers. This doesn't work in partitioned exchanges because all rows are deep-copied one by one when collecting rows for a given channel, so all tuple pointers will be distinct: https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/be/src/runtime/krpc-data-stream-sender.cc#L645 The deduplication was added a long time ago (doesn't have a Jira): https://gerrit.cloudera.org/#/c/573/ I am not sure if it ever worked in the partitioned case (it should work in broadcast exchanges, though). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
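Why the per-row deep copy defeats the deduplication can be sketched in a few lines (illustrative Python; the real logic is C++ in RowBatch::Serialize() and compares tuple pointers, modeled here with object identity):

```python
# Sketch of pointer-identity deduplication; rows are lists standing in
# for tuples so that copies are guaranteed to be distinct objects.
def serialize_tuples(tuples):
    """Deduplicate by object identity, as the C++ code does by pointer."""
    seen, distinct, indexes = {}, [], []
    for t in tuples:
        if id(t) not in seen:
            seen[id(t)] = len(distinct)
            distinct.append(t)
        indexes.append(seen[id(t)])
    return distinct, indexes

row = ["a" * 100]                        # one wide tuple
shared = [row, row, row]                 # broadcast-style: same object thrice
deep_copied = [list(t) for t in shared]  # partitioned-style: per-row copies
distinct_shared, _ = serialize_tuples(shared)
distinct_copied, _ = serialize_tuples(deep_copied)
# distinct_shared holds 1 tuple; distinct_copied holds 3, so after the
# deep copy the dedup pass saves nothing even though values are equal.
```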
[jira] [Created] (IMPALA-13217) Create allocator in Impala with mem tracker in TLS
Csaba Ringhofer created IMPALA-13217: Summary: Create allocator in Impala with mem tracker in TLS Key: IMPALA-13217 URL: https://issues.apache.org/jira/browse/IMPALA-13217 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Some libraries allow setting an allocator only globally. In most cases in Impala a thread is only used in a very specific context - this allows saving the context to TLS and using it in a stateless allocator. For example: setMemTrackerForThread(mem_tracker_); vector<int> v; v.push_back(1); // Allocator can get mem_tracker_ from TLS -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
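The TLS idea can be sketched in Python (Impala would implement this in C++ with a custom std::allocator; `MemTracker`, `set_mem_tracker_for_thread` and `tracked_alloc` here are hypothetical illustrations, not Impala APIs):

```python
# Sketch: a per-thread memory tracker stashed in thread-local storage so
# that a "stateless" allocation hook can find it without any argument.
import threading

_tls = threading.local()

class MemTracker:
    def __init__(self):
        self.consumed = 0

def set_mem_tracker_for_thread(tracker):
    _tls.tracker = tracker               # saved once per thread

def tracked_alloc(nbytes):
    # Finds the tracker via TLS instead of taking it as a parameter,
    # which is all a globally-installed allocator could do.
    tracker = getattr(_tls, "tracker", None)
    if tracker is not None:
        tracker.consumed += nbytes
    return bytearray(nbytes)

tracker = MemTracker()
set_mem_tracker_for_thread(tracker)
buf = tracked_alloc(1024)                # charged to tracker via TLS
```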
[jira] [Created] (IMPALA-13216) Switch run_workload.py to use HS2 instead of Beeswax
Csaba Ringhofer created IMPALA-13216: Summary: Switch run_workload.py to use HS2 instead of Beeswax Key: IMPALA-13216 URL: https://issues.apache.org/jira/browse/IMPALA-13216 Project: IMPALA Issue Type: Improvement Components: Clients, Infrastructure Reporter: Csaba Ringhofer Currently the default is Beeswax, which leads to using Beeswax in perf tests. https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/bin/run-workload.py#L98 This could affect perf results/variance, because different clients use different sleep intervals when waiting for the query status to become finished: Beeswax uses 50ms: https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/tests/beeswax/impala_beeswax.py#L408 while HS2 would use a more complicated formula from Impyla, ranging from 10ms to 1s: https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/tests/performance/query_exec_functions.py#L122 https://github.com/cloudera/impyla/blob/acbd481dde28d85976dfc777f888b32ad6c8d721/impala/hiveserver2.py#L513 Making sleep times configurable in Impyla could help with this - it would make sense to use smaller sleeps than in real workloads to reduce variability. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
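The wait-loop shape described above can be approximated as bounded exponential backoff (a sketch only — Impyla's actual formula differs; `backoff_sleeps` and its defaults are illustrative):

```python
# Sketch: bounded exponential backoff between a 10ms floor and a 1s cap,
# the range mentioned for Impyla's query-status polling.
def backoff_sleeps(min_s=0.01, max_s=1.0, factor=2.0, n=8):
    """Return the first n sleep intervals, doubling up to the cap."""
    sleeps, s = [], min_s
    for _ in range(n):
        sleeps.append(s)
        s = min(s * factor, max_s)
    return sleeps
```

Exposing knobs like `min_s`/`max_s` is the kind of configurability the ticket suggests: perf tests could use a small cap to reduce run-to-run variance, while real workloads keep longer sleeps.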
[jira] [Work started] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high
[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-13209 started by Csaba Ringhofer. > ExchangeNode's ConvertRowBatchTime can be high > -- > > Key: IMPALA-13209 > URL: https://issues.apache.org/jira/browse/IMPALA-13209 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: performance > > ConvertRowBatchTime can be surprisingly high - the only thing done during > this timer is copying tuple pointers from one RowBatch to another. > https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217 > {code} > set mt_dop=8; > select straight_join count(*) from tpcds_parquet.store_sales s1 join > /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = > s2.ss_customer_sk; > ConvertRowBatchTime dominates the busy exchange node's exec time in the > profile: >- ConvertRowBatchTime: 640.072ms >- InactiveTotalTime: 243.783ms >- PeakMemoryUsage: 12.53 MB (13142368) >- RowsReturned: 46.09M (46086464) >- RowsReturnedRate: 46.93 M/sec >- TotalTime: 981.968ms > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high
[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer reassigned IMPALA-13209: Assignee: Csaba Ringhofer > ExchangeNode's ConvertRowBatchTime can be high > -- > > Key: IMPALA-13209 > URL: https://issues.apache.org/jira/browse/IMPALA-13209 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: performance > > ConvertRowBatchTime can be surprisingly high - the only thing done during > this timer is copying tuple pointers from one RowBatch to another. > https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217 > {code} > set mt_dop=8; > select straight_join count(*) from tpcds_parquet.store_sales s1 join > /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = > s2.ss_customer_sk; > ConvertRowBatchTime dominates the busy exchange node's exec time in the > profile: >- ConvertRowBatchTime: 640.072ms >- InactiveTotalTime: 243.783ms >- PeakMemoryUsage: 12.53 MB (13142368) >- RowsReturned: 46.09M (46086464) >- RowsReturnedRate: 46.93 M/sec >- TotalTime: 981.968ms > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high
Csaba Ringhofer created IMPALA-13209: Summary: ExchangeNode's ConvertRowBatchTime can be high Key: IMPALA-13209 URL: https://issues.apache.org/jira/browse/IMPALA-13209 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer ConvertRowBatchTime can be surprisingly high - the only thing done during this timer is copying tuple pointers from one RowBatch to another. https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217 {code} set mt_dop=8; select straight_join count(*) from tpcds_parquet.store_sales s1 join /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = s2.ss_customer_sk; ConvertRowBatchTime dominates the busy exchange node's exec time in the profile: - ConvertRowBatchTime: 640.072ms - InactiveTotalTime: 243.783ms - PeakMemoryUsage: 12.53 MB (13142368) - RowsReturned: 46.09M (46086464) - RowsReturnedRate: 46.93 M/sec - TotalTime: 981.968ms {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
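What ConvertRowBatchTime covers can be sketched as follows (illustrative Python; the real code copies Tuple* pointers between RowBatches in C++). Only references are copied, never tuple data, yet at tens of millions of rows even this can dominate the node's busy time:

```python
# Sketch: "converting" a row batch copies each row's tuple-pointer array;
# the tuples themselves are shared, not duplicated.
def convert_row_batch(src_rows):
    return [list(row) for row in src_rows]  # shallow per-row copies

tuple_obj = ["ss_customer_sk", 42]          # stands in for one materialized tuple
src_batch = [[tuple_obj] for _ in range(4)]
out_batch = convert_row_batch(src_batch)
# out_batch has fresh pointer arrays, but every entry still points at
# the very same tuple object as the source batch.
```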
[jira] [Updated] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values
[ https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13193: - Labels: correctness (was: ) > RuntimeFilter on parquet dictionary should evaluate null values > --- > > Key: IMPALA-13193 > URL: https://issues.apache.org/jira/browse/IMPALA-13193 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, > Impala 4.3.0, Impala 4.4.0 >Reporter: Quanlong Huang >Assignee: Zhi Tang >Priority: Critical > Labels: correctness > > IMPALA-10910, IMPALA-5509 introduced an optimization to evaluate runtime > filters on parquet dictionary values. If none of the values can pass the check, > the whole row group will be skipped. However, NULL values are not included in > the parquet dictionary. Runtime filters that accept NULL values might > incorrectly reject the row group if none of the dictionary values can pass > the check. > Here are steps to reproduce the bug: > {code:sql} > create table parq_tbl (id bigint, name string) stored as parquet; > insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc"); > create table dim_tbl (name string); > insert into dim_tbl values (NULL); > select * from parq_tbl p join dim_tbl d > on COALESCE(p.name, '') = COALESCE(d.name, '');{code} > The SELECT query should return 2 rows but now it returns 0 rows. > A workaround is to disable this optimization: > {code:sql} > set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
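The missing NULL check can be sketched like this (illustrative Python, not Impala's implementation; `can_skip_row_group` is a hypothetical helper showing the fixed decision logic):

```python
# Sketch of fixed row-group pruning. Parquet dictionaries never contain
# NULL, so a filter that accepts NULL must not reject a row group based
# on the dictionary alone.
def can_skip_row_group(dictionary, has_nulls, filter_accepts):
    if any(filter_accepts(v) for v in dictionary):
        return False                     # some stored value passes the filter
    if has_nulls and filter_accepts(None):
        return False                     # the check missing in the bug
    return True

# A filter built from the join on COALESCE(name, ''): it accepts NULL rows.
accepts_null = lambda v: v is None or v == ""
```

With the buggy logic (only the dictionary scan), the `["abc"]` dictionary with NULL rows would be skipped and the join would return 0 rows, matching the reproduction above.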
[jira] [Updated] (IMPALA-10985) always_true hint is not needed if all predicates are on partitioning columns
[ https://issues.apache.org/jira/browse/IMPALA-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-10985: - Description: IMPALA-10314 added always_true hint that leads to assuming that a file in a table will return at least one row even if there is a WHERE clause. Currently we need to add it even if all columns used in the WHERE are partitioning columns. This is not needed, as these predicates can't drop any more rows after partition pruning. (was: IMPALA-10314 added always_true hint that leads to assuming that a file in a table will return at least on row even if there is a WHERE clause. Currently we need to add it even if all columns used in the WHERE are partitioning columns. This is not needed, as these predicates can't drop any more rows after partition pruning.) > always_true hint is not needed if all predicates are on partitioning columns > > > Key: IMPALA-10985 > URL: https://issues.apache.org/jira/browse/IMPALA-10985 > Project: IMPALA > Issue Type: Improvement >Reporter: Csaba Ringhofer >Priority: Minor > > IMPALA-10314 added always_true hint that leads to assuming that a file in a > table will return at least one row even if there is a WHERE clause. Currently > we need to add it even if all columns used in the WHERE are partitioning > columns. This is not needed, as these predicates can't drop any more rows > after partition pruning. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-5078) Break up expr-test.cc
[ https://issues.apache.org/jira/browse/IMPALA-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860452#comment-17860452 ] Csaba Ringhofer commented on IMPALA-5078: - [~sy117] Were you able to make some progress with this / do you plan to? This ticket is not urgent, but in the long run it would be really nice to break up expr-test.cc > Break up expr-test.cc > - > > Key: IMPALA-5078 > URL: https://issues.apache.org/jira/browse/IMPALA-5078 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Henry Robinson >Assignee: Csaba Ringhofer >Priority: Minor > Labels: newbie, ramp-up > Attachments: Screen Shot 2020-06-30 at 12.19.16 PM.png, Screen Shot > 2020-07-10 at 1.01.43 PM.png, Screen Shot 2020-07-10 at 11.16.36 AM.png, > Screen Shot 2020-07-10 at 11.27.57 AM.png, image-2020-07-10-13-22-48-230.png > > > {{expr-test.cc}} clocks in at 7129 lines, which is about enough for my emacs > to start slowing down a bit. Let's see if we can refactor it enough to have a > couple of test files. Maybe moving all the string instructions into a > separate {{expr-string-test.cc}}, and having a common header will be enough > to make it a bit more manageable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13183) Add default timeout for hs2/beeswax server sockets
Csaba Ringhofer created IMPALA-13183: Summary: Add default timeout for hs2/beeswax server sockets Key: IMPALA-13183 URL: https://issues.apache.org/jira/browse/IMPALA-13183 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Currently Impala only sets a timeout for specific operations, for example during the SASL handshake and when checking whether a connection can be closed due to an idle session. https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/rpc/TAcceptQueueServer.cpp#L153 https://github.com/apache/impala/blob/d39596f6fb7da54c24d02523c4691e6b1973857b/be/src/transport/TSaslServerTransport.cpp#L145 There are several cases where an inactive client could keep the connection open indefinitely, for example if it hasn't opened a session yet. I think that there should be a general, longer timeout set for both send/recv, e.g. a flag client_default_timeout_s=3600. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
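A blanket send/recv timeout of the kind proposed can be sketched with plain sockets (illustrative Python; `client_default_timeout_s` is the ticket's proposed flag name, not an existing Impala flag):

```python
# Sketch: apply a default send/recv timeout to an accepted client socket
# so an inactive client cannot hold the connection open indefinitely.
import socket

def apply_default_timeout(conn, client_default_timeout_s=3600):
    # Bounds every subsequent blocking send()/recv() on this connection;
    # a timed-out operation raises socket.timeout and the server can
    # close the connection.
    conn.settimeout(client_default_timeout_s)
    return conn
```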
[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone
[ https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853203#comment-17853203 ] Csaba Ringhofer commented on IMPALA-12322: -- Thanks for the feedback[~eyizoha]. I have uploaded a patch that adds a new query option: https://gerrit.cloudera.org/#/c/21492/ > return wrong timestamp when scan kudu timestamp with timezone > - > > Key: IMPALA-12322 > URL: https://issues.apache.org/jira/browse/IMPALA-12322 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.1 > Environment: impala 4.1.1 >Reporter: daicheng >Assignee: Zihao Ye >Priority: Major > Attachments: image-2022-04-24-00-01-05-746-1.png, > image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, > image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, > image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, > image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, > image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, > image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, > image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, > image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, > image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png > > > impala version is 3.1.0-cdh6.1 > i have set system timezone=Asia/Shanghai: > !image-2022-04-24-00-01-37-520.png! > !image-2022-04-24-00-01-05-746.png! > here is the bug: > *step 1* > i have parquet file with two columns like below,and read it with impala-shell > and spark (timezone=shanghai) > !image-2022-04-24-00-03-14-467.png|width=1016,height=154! > !image-2022-04-24-00-04-16-240.png|width=944,height=367! 
> the result both exactly right。 > *step two* > create kudu table with impala-shell: > CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT,t > TIMESTAMP,PRIMARY KEY (id) ) STORED AS KUDU; > note: kudu version:1.8 > and insert 2 row into the table with spark : > !image-2022-04-24-00-04-52-860.png|width=914,height=279! > *stop 3* > read it with spark (timezone=shanghai),spark read kudu table with kudu-client > api,here is the result: > !image-2022-04-24-00-05-52-086.png|width=914,height=301! > the result is still exactly right。 > but read it with impala-shell: > !image-2022-04-24-00-07-09-776.png|width=915,height=154! > the result show late 8hour > *conclusion* > it seems like impala timezone didn't work when kudu column type is > timestamp, but it work fine in parquet file,I don't know why? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone
[ https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851147#comment-17851147 ] Csaba Ringhofer commented on IMPALA-12322: -- [~eyizoha] convert_kudu_utc_timestamps only affects reading, so if Impala writes a Kudu table, it will read back a different timestamp than what it written In IMPALA-12370 there is some discussion about how to configure writing behavior. Do you think that convert_kudu_utc_timestamps should also govern writing, or that should get a separate query option? > return wrong timestamp when scan kudu timestamp with timezone > - > > Key: IMPALA-12322 > URL: https://issues.apache.org/jira/browse/IMPALA-12322 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.1.1 > Environment: impala 4.1.1 >Reporter: daicheng >Assignee: Zihao Ye >Priority: Major > Attachments: image-2022-04-24-00-01-05-746-1.png, > image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, > image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, > image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, > image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, > image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, > image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, > image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, > image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, > image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png > > > impala version is 3.1.0-cdh6.1 > i have set system timezone=Asia/Shanghai: > !image-2022-04-24-00-01-37-520.png! > !image-2022-04-24-00-01-05-746.png! > here is the bug: > *step 1* > i have parquet file with two columns like below,and read it with impala-shell > and spark (timezone=shanghai) > !image-2022-04-24-00-03-14-467.png|width=1016,height=154! > !image-2022-04-24-00-04-16-240.png|width=944,height=367! 
> the result both exactly right。 > *step two* > create kudu table with impala-shell: > CREATE TABLE default.test_{_}test{_}_test_time2 (id BIGINT,t > TIMESTAMP,PRIMARY KEY (id) ) STORED AS KUDU; > note: kudu version:1.8 > and insert 2 row into the table with spark : > !image-2022-04-24-00-04-52-860.png|width=914,height=279! > *stop 3* > read it with spark (timezone=shanghai),spark read kudu table with kudu-client > api,here is the result: > !image-2022-04-24-00-05-52-086.png|width=914,height=301! > the result is still exactly right。 > but read it with impala-shell: > !image-2022-04-24-00-07-09-776.png|width=915,height=154! > the result show late 8hour > *conclusion* > it seems like impala timezone didn't work when kudu column type is > timestamp, but it work fine in parquet file,I don't know why? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12370) Add an option to customize timezone when working with UNIXTIME_MICROS columns of Kudu tables
[ https://issues.apache.org/jira/browse/IMPALA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851132#comment-17851132 ] Csaba Ringhofer commented on IMPALA-12370: -- >That will free the users from the inconvenience of running their clusters in >the UTC timezone The timezone doesn't need to be set at the server level in Impala; it can be set per query using the query option "timezone", e.g. set timezone=CET; > Ideally, the setting should be per Kudu table, but a system-wide flag is also > an option. The query option convert_kudu_utc_timestamps only affects reading, so there could be a writing-related one too, e.g. write_kudu_utc_timestamps (or convert_kudu_utc_timestamps could be changed to also affect writing). I agree that the ideal would be to be able to override this per table, for example with a table property like "impala.use_kudu_utc_timestamps" which would override both convert_kudu_utc_timestamps / write_kudu_utc_timestamps. It would be even better if other components would also respect this property, so if it is false, then they would write in the timezone-agnostic "Impala" way. > Add an option to customize timezone when working with UNIXTIME_MICROS columns > of Kudu tables > > > Key: IMPALA-12370 > URL: https://issues.apache.org/jira/browse/IMPALA-12370 > Project: IMPALA > Issue Type: Improvement >Reporter: Alexey Serbin >Priority: Major > Labels: timezone > > Impala uses the timezone of its server when converting Unix epoch time stored > in a Kudu table in a column of UNIXTIME_MICROS type (legacy type name > TIMESTAMP) into a timestamp. As one can see, the former (a value stored in > a column of the UNIXTIME_MICROS type) does not contain information about > timezone, but the latter (the result timestamp returned by Impala) does, and > Impala's convention does make sense and works totally fine if the data is > being written and read by Impala or by another application that uses the same > convention. 
> However, Spark uses a different convention. Spark applications convert > timestamps to the UTC timezone before representing the result as Unix epoch > time. So, when a Spark application stores timestamp data in a Kudu table, > there is a difference in the result timestamps upon reading the stored data > via Impala if Impala servers are running in other than the UTC timezone. > As of now, the workaround is to run Impala servers in the UTC timezone, so > the convention used by Spark produces the same result as the convention used > by Impala when converting between timestamps and Unix epoch times. > In this context, it would be great to make it possible customizing the > timezone that's used by Impala when working with UNIXTIME_MICROS/TIMESTAMP > values stored in Kudu tables. That will free the users from the > inconvenience of running their clusters in the UTC timezone if they use a mix > of Spark/Impala applications to work with the same data stored in Kudu > tables. Ideally, the setting should be per Kudu table, but a system-wide > flag is also an option. > This is similar to IMPALA-1658. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
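The two reading conventions discussed here can be sketched as follows (illustrative Python; `read_unixtime_micros` is a hypothetical helper, and `convert_utc` stands in for the convert_kudu_utc_timestamps query option). With a UTC-based writer like Spark and a +08:00 reader, the two conventions differ by exactly the 8 hours seen in reports like IMPALA-12322:

```python
# Sketch of the two conventions for reading Kudu UNIXTIME_MICROS values.
from datetime import datetime, timedelta, timezone

def read_unixtime_micros(micros, convert_utc, local_tz):
    utc_ts = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    if convert_utc:
        # Spark's convention: the stored value is a UTC instant;
        # convert it to the local (query) timezone for display.
        return utc_ts.astimezone(local_tz)
    # Impala's legacy convention: treat the stored wall-clock fields as
    # already being in the local timezone (no conversion).
    return utc_ts.replace(tzinfo=local_tz)

shanghai = timezone(timedelta(hours=8))  # fixed offset; ignores DST/history
converted = read_unixtime_micros(0, True, shanghai)   # wall clock 08:00
raw = read_unixtime_micros(0, False, shanghai)        # wall clock 00:00
```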
[jira] [Commented] (IMPALA-12656) impala-shell cannot be installed on Python 3.11
[ https://issues.apache.org/jira/browse/IMPALA-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850457#comment-17850457 ] Csaba Ringhofer commented on IMPALA-12656: -- I also bumped into this and tried building the python-sasl PRs: https://github.com/cloudera/python-sasl/pull/32 worked with 3.11 and 3.12 but broke 2.7 (at least in my environment). The other PR only fixes 3.11, and had other build failures with 3.12. I think that this is a good reason to drop Python 2.7 support. > impala-shell cannot be installed on Python 3.11 > --- > > Key: IMPALA-12656 > URL: https://issues.apache.org/jira/browse/IMPALA-12656 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: Michael Smith >Priority: Major > Labels: python3 > > Trying to {{pip install impala-shell}} fails with > {code:java} > clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG > -g -fwrapv -O3 -Wall -isysroot > /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Isasl > -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 > -c sasl/saslwrapper.cpp -o > build/temp.macosx-14-arm64-cpython-311/sasl/saslwrapper.o > sasl/saslwrapper.cpp:196:12: fatal error: 'longintrepr.h' file not found > #include "longintrepr.h" > ^~~ > 1 error generated. {code} > Python 3.11 moved this file to a subdirectory in > [https://github.com/python/cpython/commit/8e5de40f90476249e9a2e5ef135143b5c6a0b512.] > Adopting [https://github.com/cloudera/python-sasl/pull/31] or > [https://github.com/cloudera/python-sasl/pull/32] might fix it. But they need > to be included in a new release of sasl on pypi.org. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12656) impala-shell cannot be installed on Python 3.11
[ https://issues.apache.org/jira/browse/IMPALA-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12656: - Priority: Critical (was: Major) > impala-shell cannot be installed on Python 3.11 > --- > > Key: IMPALA-12656 > URL: https://issues.apache.org/jira/browse/IMPALA-12656 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: Michael Smith >Priority: Critical > Labels: python3 > > Trying to {{pip install impala-shell}} fails with > {code:java} > clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG > -g -fwrapv -O3 -Wall -isysroot > /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk -Isasl > -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 > -c sasl/saslwrapper.cpp -o > build/temp.macosx-14-arm64-cpython-311/sasl/saslwrapper.o > sasl/saslwrapper.cpp:196:12: fatal error: 'longintrepr.h' file not found > #include "longintrepr.h" > ^~~ > 1 error generated. {code} > Python 3.11 moved this file to a subdirectory in > [https://github.com/python/cpython/commit/8e5de40f90476249e9a2e5ef135143b5c6a0b512.] > Adopting [https://github.com/cloudera/python-sasl/pull/31] or > [https://github.com/cloudera/python-sasl/pull/32] might fix it. But they need > to be included in a new release of sasl on pypi.org. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-11512) BINARY support in Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-11512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848937#comment-17848937 ] Csaba Ringhofer commented on IMPALA-11512: -- BINARY columns seem to be working with Iceberg, but testing seems very limited. I didn't find any test with partition spec on BINARY columns. > BINARY support in Iceberg > - > > Key: IMPALA-11512 > URL: https://issues.apache.org/jira/browse/IMPALA-11512 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Csaba Ringhofer >Priority: Major > Labels: impala-iceberg > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows
[ https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-12990. -- Fix Version/s: Impala 4.4.0 Resolution: Fixed > impala-shell broken if Iceberg delete deletes 0 rows > > > Key: IMPALA-12990 > URL: https://issues.apache.org/jira/browse/IMPALA-12990 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: iceberg > Fix For: Impala 4.4.0 > > > Happens only with Python 3 > {code} > impala-python3 shell/impala_shell.py > create table icebergupdatet (i int, s string) stored as iceberg; > alter table icebergupdatet set tblproperties("format-version"="2"); > delete from icebergupdatet where i=0; > Unknown Exception : '>' not supported between instances of 'NoneType' and > 'int' > Traceback (most recent call last): > File "shell/impala_shell.py", line 1428, in _execute_stmt > if is_dml and num_rows == 0 and num_deleted_rows > 0: > TypeError: '>' not supported between instances of 'NoneType' and 'int' > {code} > The same error should also happen when the delete removes > 0 rows, but the > impala server has an older version that doesn't set TDmlResult.rows_deleted
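The Python 3 failure above is a plain None-vs-int comparison. A minimal sketch of the guard (function and parameter names are hypothetical, mirroring the traceback) that tolerates an older server leaving TDmlResult.rows_deleted unset:

```python
def dml_deleted_rows(is_dml, num_rows, num_deleted_rows):
    # Coalesce None to 0 before comparing: Python 3 raises TypeError on
    # "None > 0", while Python 2 silently ordered None below any int.
    deleted = num_deleted_rows if num_deleted_rows is not None else 0
    return is_dml and num_rows == 0 and deleted > 0
```

The same coalescing applies wherever the shell reads an optional counter off the DML result.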
[jira] [Created] (IMPALA-13056) HBaseTableScanner's timeout handling looks broken
Csaba Ringhofer created IMPALA-13056: Summary: HBaseTableScanner's timeout handling looks broken Key: IMPALA-13056 URL: https://issues.apache.org/jira/browse/IMPALA-13056 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Csaba Ringhofer https://gerrit.cloudera.org/#/c/12660/ rewrote some JNI exception handling code and accidentally eliminated the timeout handling in https://github.com/apache/impala/blob/7ad94006563b88d9221b4ac978dbf5b4fc0a3ca1/be/src/exec/hbase/hbase-table-scanner.cc#L518
[jira] [Updated] (IMPALA-13052) Sampling aggregate result sizes are underestimated
[ https://issues.apache.org/jira/browse/IMPALA-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13052: - Description: Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it to have a fixed small size. Examples: select sample(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) select appx_median(l_orderkey) from tpch.lineitem; according to plan: row-size= 8B in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) select histogram(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) This may be also relevant for datasketches functions, haven't checked those yet. This can lead to highly underestimating the memory needs of grouping aggregators: select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 limit 1 04:AGGREGATE FINALIZE Peak Mem: 2.19 GB Est. Peak Mem: 18.00 MB 01:AGGREGATE STREAMING Peak Mem: 2.37 GB Est. Peak Mem: 45.79 MB Enforcing PREAGG_BYTES_LIMIT also doesn't seem to work well -setting a 40MB limit decreased peak mem to 1.5 GB. My guess is that the pre-aggregation logic is not prepared for aggregation states that grow during the execution, so it can decide to not add another group to the hash table, but can't deny increasing an existing one's state. was: Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it to have a fixed small size. 
Examples: select sample(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) select appx_median(l_orderkey) from tpch.lineitem; according to plan: row-size= 8B in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) select histogram(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) This may be also relevant for datasketches functions, haven't checked those yet. This can lead to highly underestimating the memory needs of grouping aggregators: select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 limit 1 04:AGGREGATE FINALIZE Peak Mem: 2.19 GB Est. Peak Mem: 18.00 MB 01:AGGREGATE STREAMING Peak Mem: 2.37 GB Est. Peak Mem: 45.79 MB > Sampling aggregate result sizes are underestimated > -- > > Key: IMPALA-13052 > URL: https://issues.apache.org/jira/browse/IMPALA-13052 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > > Sampling aggregates (sample, appx_median, histogram) return a string that can > be quite large, but the planner assumes it to have a fixed small size. > Examples: > select sample(l_orderkey) from tpch.lineitem; > according to plan: row-size=12B > in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) > select appx_median(l_orderkey) from tpch.lineitem; > according to plan: row-size= 8B > in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) > select histogram(l_orderkey) from tpch.lineitem; > according to plan: row-size=12B > in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) > This may be also relevant for datasketches functions, haven't checked those > yet.
> This can lead to highly underestimating the memory needs of grouping > aggregators: > select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 > limit 1 > 04:AGGREGATE FINALIZE Peak Mem: 2.19 GB Est. Peak Mem: 18.00 MB > 01:AGGREGATE STREAMING Peak Mem: 2.37 GB Est. Peak Mem: 45.79 MB > Enforcing PREAGG_BYTES_LIMIT also doesn't seem to work well -setting a 40MB > limit decreased peak mem to 1.5 GB. My guess is that the pre-aggregation > logic is not prepared for aggregation states that grow during the execution, > so it can decide to not add another group to the hash table, but can't deny > increasing an existing one's state.
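The gap between the planned and observed sizes above is easy to quantify; a back-of-the-envelope check using the appx_median figures from the description (the 254.68 KB value is the observed TotalBytesSent for a single result row):

```python
PLAN_ROW_SIZE_BYTES = 8                  # planner's row-size for appx_median
ACTUAL_BYTES_SENT = int(254.68 * 1024)   # observed bytes for that single row

# The planner underestimates the result/intermediate size by a factor of
# roughly 32,000x, which is why grouping aggregations can blow far past
# their memory estimates once every group carries such a state.
underestimation_factor = ACTUAL_BYTES_SENT / PLAN_ROW_SIZE_BYTES
```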
[jira] [Updated] (IMPALA-13052) Sampling aggregate result sizes are underestimated
[ https://issues.apache.org/jira/browse/IMPALA-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13052: - Description: Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it to have a fixed small size. Examples: select sample(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) select appx_median(l_orderkey) from tpch.lineitem; according to plan: row-size= 8B in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) select histogram(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) This may be also relevant for datasketches functions, haven't checked those yet. This can lead to highly underestimating the memory needs of grouping aggregators: select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 limit 1 04:AGGREGATE FINALIZE Peak Mem: 2.19 GB Est. Peak Mem: 18.00 MB 01:AGGREGATE STREAMING Peak Mem: 2.37 GB Est. Peak Mem: 45.79 MB was: Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it to have a fixed small size. Examples: select sample(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) select appx_median(l_orderkey) from tpch.lineitem; according to plan: row-size= 8B in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) select histogram(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) This may be also relevant for datasketches functions.
> Sampling aggregate result sizes are underestimated > -- > > Key: IMPALA-13052 > URL: https://issues.apache.org/jira/browse/IMPALA-13052 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > > Sampling aggregates (sample, appx_median, histogram) return a string that can > be quite large, but the planner assumes it to have a fixed small size. > Examples: > select sample(l_orderkey) from tpch.lineitem; > according to plan: row-size=12B > in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) > select appx_median(l_orderkey) from tpch.lineitem; > according to plan: row-size= 8B > in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) > select histogram(l_orderkey) from tpch.lineitem; > according to plan: row-size=12B > in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) > This may be also relevant for datasketches functions, haven't checked those > yet. > This can lead to highly underestimating the memory needs of grouping > aggregators: > select appx_median(l_shipmode) from lineitem group by l_orderkey order by 1 > limit 1 > 04:AGGREGATE FINALIZE Peak Mem: 2.19 GB Est. Peak Mem: 18.00 MB > 01:AGGREGATE STREAMING Peak Mem: 2.37 GB Est. Peak Mem: 45.79 MB
[jira] [Created] (IMPALA-13052) Sampling aggregate result sizes are underestimated
Csaba Ringhofer created IMPALA-13052: Summary: Sampling aggregate result sizes are underestimated Key: IMPALA-13052 URL: https://issues.apache.org/jira/browse/IMPALA-13052 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it to have a fixed small size. Examples: select sample(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.45 KB (this is single row sent by a host) select appx_median(l_orderkey) from tpch.lineitem; according to plan: row-size= 8B in reality: TotalBytesSent: 254.68 KB (this is single row sent by a host) select histogram(l_orderkey) from tpch.lineitem; according to plan: row-size=12B in reality: TotalBytesSent: 254.35 KB (this is single row sent by a host) This may be also relevant for datasketches functions.
[jira] [Updated] (IMPALA-13048) Shuffle hint on joins is ignored in some cases
[ https://issues.apache.org/jira/browse/IMPALA-13048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-13048: - Description: I noticed that shuffle hint is ignored without any warning in some cases shuffle hint is not applied in this query: {code} explain select * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code} result plan {code} PLAN-ROOT SINK | 07:EXCHANGE [UNPARTITIONED] | 04:HASH JOIN [INNER JOIN, BROADCAST] | hash predicates: a3.tinyint_col = a2.tinyint_col | runtime filters: RF000 <- a2.tinyint_col | row-size=267B cardinality=80 | |--06:EXCHANGE [BROADCAST] | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | hash predicates: a1.id = a2.id | | runtime filters: RF002 <- a2.id | | row-size=178B cardinality=8 | | | |--05:EXCHANGE [BROADCAST] | | | | | 00:SCAN HDFS [functional.alltypestiny a2] | | HDFS partitions=4/4 files=4 size=460B | | row-size=89B cardinality=8 | | | 01:SCAN HDFS [functional.alltypes a1] | HDFS partitions=24/24 files=24 size=478.45KB | runtime filters: RF002 -> a1.id | row-size=89B cardinality=7.30K | 02:SCAN HDFS [functional.alltypessmall a3] HDFS partitions=4/4 files=4 size=6.32KB runtime filters: RF000 -> a3.tinyint_col row-size=89B cardinality=100 {code} if the first two tables' position is swapped, then it is applied: {code} explain select * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code} was: I noticed that shuffle hint is ignore without any warning in some cases shuffle hint is not applied in this query: {code} explain select * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code} result plan {code} PLAN-ROOT SINK | 07:EXCHANGE [UNPARTITIONED] | 04:HASH JOIN [INNER JOIN, BROADCAST] | hash predicates: a3.tinyint_col = a2.tinyint_col | runtime filters: 
RF000 <- a2.tinyint_col | row-size=267B cardinality=80 | |--06:EXCHANGE [BROADCAST] | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | hash predicates: a1.id = a2.id | | runtime filters: RF002 <- a2.id | | row-size=178B cardinality=8 | | | |--05:EXCHANGE [BROADCAST] | | | | | 00:SCAN HDFS [functional.alltypestiny a2] | | HDFS partitions=4/4 files=4 size=460B | | row-size=89B cardinality=8 | | | 01:SCAN HDFS [functional.alltypes a1] | HDFS partitions=24/24 files=24 size=478.45KB | runtime filters: RF002 -> a1.id | row-size=89B cardinality=7.30K | 02:SCAN HDFS [functional.alltypessmall a3] HDFS partitions=4/4 files=4 size=6.32KB runtime filters: RF000 -> a3.tinyint_col row-size=89B cardinality=100 {code} if the first two tables' position is swapped, then it is applied: {code} explain select * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code} > Shuffle hint on joins is ignored in some cases > -- > > Key: IMPALA-13048 > URL: https://issues.apache.org/jira/browse/IMPALA-13048 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > > I noticed that shuffle hint is ignored without any warning in some cases > shuffle hint is not applied in this query: > {code} > explain select * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on > a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; > {code} > result plan > {code} > PLAN-ROOT SINK > | > 07:EXCHANGE [UNPARTITIONED] > | > 04:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: a3.tinyint_col = a2.tinyint_col > | runtime filters: RF000 <- a2.tinyint_col > | row-size=267B cardinality=80 > | > |--06:EXCHANGE [BROADCAST] > | | > | 03:HASH JOIN [INNER JOIN, BROADCAST] > | | hash predicates: a1.id = a2.id > | | runtime filters: RF002 <- a2.id > | | row-size=178B cardinality=8 > | | > | |--05:EXCHANGE [BROADCAST] > | | | > | | 00:SCAN HDFS [functional.alltypestiny a2] > | | HDFS partitions=4/4 files=4 
size=460B > | | row-size=89B cardinality=8 > | | > | 01:SCAN HDFS [functional.alltypes a1] > | HDFS partitions=24/24 files=24 size=478.45KB > | runtime filters: RF002 -> a1.id > | row-size=89B cardinality=7.30K > | > 02:SCAN HDFS [functional.alltypessmall a3] >HDFS partitions=4/4 files=4 size=6.32KB >runtime filters: RF000 -> a3.tinyint_col >row-size=89B cardinality=100 > {code} > if the first two tables' position is swapped, then it is applied: > {code} > explain select * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on > a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; > {code}
[jira] [Created] (IMPALA-13048) Shuffle hint on joins is ignored in some cases
Csaba Ringhofer created IMPALA-13048: Summary: Shuffle hint on joins is ignored in some cases Key: IMPALA-13048 URL: https://issues.apache.org/jira/browse/IMPALA-13048 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer I noticed that shuffle hint is ignored without any warning in some cases. Shuffle hint is not applied in this query: {code} explain select * from alltypestiny a2 join /* +SHUFFLE */ alltypes a1 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code} result plan {code} PLAN-ROOT SINK | 07:EXCHANGE [UNPARTITIONED] | 04:HASH JOIN [INNER JOIN, BROADCAST] | hash predicates: a3.tinyint_col = a2.tinyint_col | runtime filters: RF000 <- a2.tinyint_col | row-size=267B cardinality=80 | |--06:EXCHANGE [BROADCAST] | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | hash predicates: a1.id = a2.id | | runtime filters: RF002 <- a2.id | | row-size=178B cardinality=8 | | | |--05:EXCHANGE [BROADCAST] | | | | | 00:SCAN HDFS [functional.alltypestiny a2] | | HDFS partitions=4/4 files=4 size=460B | | row-size=89B cardinality=8 | | | 01:SCAN HDFS [functional.alltypes a1] | HDFS partitions=24/24 files=24 size=478.45KB | runtime filters: RF002 -> a1.id | row-size=89B cardinality=7.30K | 02:SCAN HDFS [functional.alltypessmall a3] HDFS partitions=4/4 files=4 size=6.32KB runtime filters: RF000 -> a3.tinyint_col row-size=89B cardinality=100 {code} if the first two tables' position is swapped, then it is applied: {code} explain select * from alltypes a1 join /* +SHUFFLE */ alltypestiny a2 on a1.id=a2.id join alltypessmall a3 on a2.tinyint_col=a3.tinyint_col; {code}
[jira] [Created] (IMPALA-13040) SIGSEGV in QueryState::UpdateFilterFromRemote
Csaba Ringhofer created IMPALA-13040: Summary: SIGSEGV in QueryState::UpdateFilterFromRemote Key: IMPALA-13040 URL: https://issues.apache.org/jira/browse/IMPALA-13040 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Csaba Ringhofer {code} Crash reason: SIGSEGV /SEGV_MAPERR Crash address: 0x48 Process uptime: not available Thread 114 (crashed) 0 libpthread.so.0 + 0x9d00 rax = 0x00019e57ad00 rdx = 0x2a656720 rcx = 0x059a9860 rbx = 0x rsi = 0x00019e57ad00 rdi = 0x0038 rbp = 0x7f6233d544e0 rsp = 0x7f6233d544a8 r8 = 0x06a53540r9 = 0x0039 r10 = 0x r11 = 0x000a r12 = 0x00019e57ad00 r13 = 0x7f62a2f997d0 r14 = 0x7f6233d544f8 r15 = 0x1632c0f0 rip = 0x7f62a2f96d00 Found by: given as instruction pointer in context 1 impalad!impala::QueryState::UpdateFilterFromRemote(impala::UpdateFilterParamsPB const&, kudu::rpc::RpcContext*) [query-state.cc : 1033 + 0x5] rbp = 0x7f6233d54520 rsp = 0x7f6233d544f0 rip = 0x015c0837 Found by: previous frame's frame pointer 2 impalad!impala::DataStreamService::UpdateFilterFromRemote(impala::UpdateFilterParamsPB const*, impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) [data-stream-service.cc : 134 + 0xb] rbp = 0x7f6233d54640 rsp = 0x7f6233d54530 rip = 0x017c05de Found by: previous frame's frame pointer {code} The line that crashes is https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/runtime/query-state.cc#L1033 My guess is that the actual segfault is within WaitForPrepare() but it was inlined. Not sure if a remote filter can arrive even before QueryState::Init is finished - that would explain the issue, as instances_prepared_barrier_ is not yet created at that point.
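A minimal Python model of the suspected ordering problem (names are hypothetical; the real code is C++): if the RPC handler waits on an "initialized" flag instead of touching the not-yet-created barrier, an early filter update can be rejected safely rather than crashing:

```python
import threading

class QueryStateModel:
    """Toy stand-in for QueryState: init() creates the barrier that
    update_filter_from_remote() dereferences."""
    def __init__(self):
        self.init_done = threading.Event()
        self.instances_prepared_barrier = None

    def init(self):
        # The real code would construct the barrier here.
        self.instances_prepared_barrier = object()
        self.init_done.set()

    def update_filter_from_remote(self, timeout_s=5.0):
        # Guard against the race: an RPC arriving before init() finished
        # would otherwise hit a null barrier (the crash at
        # query-state.cc:1033 in the stack above).
        if not self.init_done.wait(timeout_s):
            return False  # reject the early filter update instead of crashing
        assert self.instances_prepared_barrier is not None
        return True

qs = QueryStateModel()
threading.Thread(target=qs.init).start()
accepted = qs.update_filter_from_remote()
```

This only sketches the hypothesis in the report; the actual fix in Impala may order things differently.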
[jira] [Updated] (IMPALA-12320) test_topic_updates_unblock fails in ASAN build
[ https://issues.apache.org/jira/browse/IMPALA-12320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12320: - Priority: Critical (was: Major) > test_topic_updates_unblock fails in ASAN build > -- > > Key: IMPALA-12320 > URL: https://issues.apache.org/jira/browse/IMPALA-12320 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build > > h3. Error Message > AssertionError: alter table tpcds.store_sales recover partitions query took > less time than 1 msec assert 9622 > 1 + where 9622 = ApplyResult.get of 0x7f1ab45b6d10>>() + where > = > .get > h3. Stacktrace > {noformat} > custom_cluster/test_topic_update_frequency.py:82: in > test_topic_updates_unblock > non_blocking_query_options=non_blocking_query_options) > custom_cluster/test_topic_update_frequency.py:132: in __run_topic_update_test > assert slow_query_future.get() > blocking_query_min_time, \ > E AssertionError: alter table tpcds.store_sales recover partitions query > took less time than 1 msec > E assert 9622 > 1 > E+ where 9622 = >() > E+where > = > .get > {noformat} >
[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840486#comment-17840486 ] Csaba Ringhofer commented on IMPALA-12266: -- Saw this test failing again. select * from special_chars; Could not resolve table reference: 'special_chars' Looked into coordinator log: {code} I0422 03:48:38.383420 19888 Frontend.java:2127] 1f4e0654b999662f:b6f1b015] Analyzing query: select * from special_chars db: test_convert_table_cdba7383 ... I0422 03:48:42.862898 1012 ImpaladCatalog.java:232] Deleting: TABLE:test_convert_table_cdba7383.special_chars version: 7785 size: 77 I0422 03:48:42.862920 1012 ImpaladCatalog.java:232] Deleting: TABLE:test_convert_table_cdba7383.special_chars_tmp_5eb06c80 version: 7786 size: 714 I0422 03:48:42.862967 1012 ImpaladCatalog.java:232] Adding: CATALOG_SERVICE_ID version: 7786 size: 60 ... I0422 03:48:42.863464 19888 jni-util.cc:302] 1f4e0654b999662f:b6f1b015] org.apache.impala.common.AnalysisException: Could not resolve table reference: 'special_chars' at org.apache.impala.analysis.Analyzer.resolvePath(Analyzer.java:1458) ... I0422 03:48:46.893426 1012 ImpaladCatalog.java:232] Adding: TABLE:test_convert_table_cdba7383.special_chars version: 7794 size: 84 {code} I am not familiar with how convert to Iceberg works, but based on the logs 1. special_chars_tmp_5eb06c80 is created, 2. special_chars is deleted 3. special_chars recreated If the table is queried between 2 and 3 then the coordinator will think that it doesn't exist.
> Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Quanlong Huang >Priority: Major > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > TABLE:test_convert_table_cdba7383.parquet_nopartitioned. 
Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @
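The delete-then-recreate window described in the comment above can be modeled with a toy catalog map (names hypothetical): replacing the entry in a single step leaves no moment at which the table name fails to resolve, while delete-then-add does:

```python
def migrate_non_atomic(catalog, name, new_table):
    # Mirrors steps 2-3 from the log: the name is unresolvable in between.
    del catalog[name]
    # ... a concurrent "select * from special_chars" here raises KeyError ...
    catalog[name] = new_table

def migrate_atomic(catalog, name, new_table):
    # One assignment: readers see either the old or the new table,
    # never a missing entry.
    catalog[name] = new_table

cat = {"special_chars": "hms_table"}
migrate_atomic(cat, "special_chars", "iceberg_table")
```

This is only an illustration of the race window, not how the Impala catalog actually stores or versions tables.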
[jira] [Created] (IMPALA-13037) EventsProcessorStressTest can hang
Csaba Ringhofer created IMPALA-13037: Summary: EventsProcessorStressTest can hang Key: IMPALA-13037 URL: https://issues.apache.org/jira/browse/IMPALA-13037 Project: IMPALA Issue Type: Bug Components: Catalog, Infrastructure Reporter: Csaba Ringhofer The test failed with timeout. From mvn.log the last line is: 20:17:53 [INFO] Running org.apache.impala.catalog.events.EventsProcessorStressTest Things seem to be hanging from 2024.04.22 20:17:53 to 2024.04.23 The test seems to wait for a Hive query. From FeSupport.INFO: {code} I0422 20:17:55.478875 7949 RandomHiveQueryRunner.java:1102] Client 0 running hive query set 2: insert into table events_stress_db_0.stress_test_tbl_0_alltypes_part partition (year,month) select * from functional.alltypes limit 100 create database if not exists events_stress_db_0 drop table if exists events_stress_db_0.stress_test_tbl_0_alltypes_part create table if not exists events_stress_db_0.stress_test_tbl_0_alltypes_part like functional.alltypes set hive.exec.dynamic.partition.mode = nonstrict set hive.exec.max.dynamic.partitions = 1 set hive.exec.max.dynamic.partitions.pernode = 1 set tez.session.am.dag.submit.timeout.secs = 2 I0422 20:17:55.478940 7949 HiveJdbcClientPool.java:102] Executing sql : create database if not exists events_stress_db_0 I0422 20:17:55.493497 7768 MetastoreShim.java:843] EventId: 33414 EventType: COMMIT_TXN transaction id: 2075 I0422 20:17:55.493682 7768 MetastoreEvents.java:302] Total number of events received: 6 Total number of events filtered out: 0 I0422 20:17:55.494762 7768 MetastoreEvents.java:825] EventId: 33407 EventType: CREATE_DATABASE Successfully added database events_stress_db_0 I0422 20:17:55.508478 7949 HiveJdbcClientPool.java:102] Executing sql : drop table if exists events_stress_db_0.stress_test_tbl_0_alltypes_part I0422 20:17:55.516858 7768 MetastoreEvents.java:825] EventId: 33410 EventType: CREATE_TABLE Successfully added table events_stress_db_0.stress_test_tbl_0_part I0422 20:17:55.518288
7768 CatalogOpExecutor.java:4713] EventId: 33413 Table events_stress_db_0.stress_test_tbl_0_part is not loaded. Skipping add partitions I0422 20:17:55.519479 7768 MetastoreEventsProcessor.java:1340] Time elapsed in processing event batch: 178.895ms I0422 20:17:55.521183 7768 MetastoreEventsProcessor.java:1120] Latest event in HMS: id=33420, time=1713842275. Last synced event: id=33414, time=1713842275. I0422 20:17:55.533375 7949 HiveJdbcClientPool.java:102] Executing sql : create table if not exists events_stress_db_0.stress_test_tbl_0_alltypes_part like functional.alltypes I0422 20:17:55.611153 7949 HiveJdbcClientPool.java:102] Executing sql : set hive.exec.dynamic.partition.mode = nonstrict I0422 20:17:55.616571 7949 HiveJdbcClientPool.java:102] Executing sql : set hive.exec.max.dynamic.partitions = 1 I0422 20:17:55.619197 7949 HiveJdbcClientPool.java:102] Executing sql : set hive.exec.max.dynamic.partitions.pernode = 1 I0422 20:17:55.621069 7949 HiveJdbcClientPool.java:102] Executing sql : set tez.session.am.dag.submit.timeout.secs = 2 I0422 20:17:55.622972 7949 HiveJdbcClientPool.java:102] Executing sql : insert into table events_stress_db_0.stress_test_tbl_0_alltypes_part partition (year,month) select * from functional.alltypes limit 100 I0422 20:17:57.163591 7950 CatalogServiceCatalog.java:2747] Refreshing table metadata: events_stress_db_0.stress_test_tbl_0_part I0422 20:17:57.829802 7768 MetastoreEventsProcessor.java:982] Received 6 events. First event id: 33416. 
I0422 20:17:57.833026 7768 MetastoreShim.java:843] EventId: 33417 EventType: COMMIT_TXN transaction id: 2076 I0422 20:17:57.833222 7768 MetastoreShim.java:843] EventId: 33419 EventType: COMMIT_TXN transaction id: 2077 I0422 20:17:57.84 7768 MetastoreShim.java:843] EventId: 33421 EventType: COMMIT_TXN transaction id: 2078 I0422 20:17:57.834242 7768 MetastoreShim.java:843] EventId: 33424 EventType: COMMIT_TXN transaction id: 2079 I0422 20:17:57.834323 7768 MetastoreEvents.java:302] Total number of events received: 6 Total number of events filtered out: 0 I0422 20:17:57.834570 7768 CatalogOpExecutor.java:4862] EventId: 33416 Table events_stress_db_0.stress_test_tbl_0_part is not loaded. Not processing the event. I0422 20:17:57.837756 7768 MetastoreEvents.java:825] EventId: 33423 EventType: CREATE_TABLE Successfully added table events_stress_db_0.stress_test_tbl_0_alltypes_part I0422 20:17:57.838668 7768 MetastoreEventsProcessor.java:1340] Time elapsed in processing event batch: 8.625ms I0422 20:17:57.840027 7768 MetastoreEventsProcessor.java:1120] Latest event in HMS: id=33425, time=1713842275. Last synced event: id=33424, time=1713842275. I0422 20:18:03.143219 7768 MetastoreEventsProcessor.java:982] Received 0 events. First event id: non
[jira] [Created] (IMPALA-13026) Creating openai-api-key-secret fails sporadically
Csaba Ringhofer created IMPALA-13026: Summary: Creating openai-api-key-secret fails sporadically Key: IMPALA-13026 URL: https://issues.apache.org/jira/browse/IMPALA-13026 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Csaba Ringhofer Data load fails from time to time with the following error: {code} 00:27:17.680 Error loading data. The end of the log file is: 00:27:17.680 04:15:15 /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/bin/load-data.py --workloads functional-query -e core --table_formats kudu/none/none --force --impalad localhost --hive_hs2_hostport localhost:11050 --hdfs_namenode localhost:20500 00:27:17.680 04:15:15 Executing Hadoop command: ... hadoop credential create openai-api-key-secret -value secret -provider localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks ... 00:27:17.680 java.io.IOException: Credential openai-api-key-secret already exists in localjceks://file/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/jceks/test.jceks 00:27:17.680 at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.createCredentialEntry(AbstractJavaKeyStoreProvider.java:234) 00:27:17.680 at org.apache.hadoop.security.alias.CredentialShell$CreateCommand.execute(CredentialShell.java:354) 00:27:17.680 at org.apache.hadoop.tools.CommandShell.run(CommandShell.java:72) 00:27:17.680 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81) 00:27:17.680 at org.apache.hadoop.security.alias.CredentialShell.main(CredentialShell.java:437) 00:27:17.680 04:15:15 Error executing Hadoop command, exiting {code} My guess is that this happens when calling "hadoop credential create" concurrently with different data loader processes.
https://github.com/apache/impala/blob/9b05a205fec397fa1e19ae467b1cc406ca43d948/bin/load-data.py#L323 Ideally this would be called in the serial phase of dataload -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
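Until the call is moved to the serial phase, the race could also be tolerated by making creation idempotent: treat "already exists" as success, since another loader process having created the secret is exactly the desired end state. A minimal sketch of that pattern; `create_credential`, `ensure_credential`, and the in-memory keystore are hypothetical stand-ins for the real `hadoop credential create` invocation, not Impala code:

```python
def create_credential(keystore, name, value):
    # Stand-in for "hadoop credential create": fails if the entry exists,
    # mirroring the IOException thrown by the Hadoop CLI.
    if name in keystore:
        raise RuntimeError("Credential %s already exists" % name)
    keystore[name] = value

def ensure_credential(keystore, name, value):
    # Tolerate a concurrent creator: "already exists" counts as success,
    # any other failure is still propagated.
    try:
        create_credential(keystore, name, value)
    except RuntimeError as e:
        if "already exists" not in str(e):
            raise

keystore = {}
ensure_credential(keystore, "openai-api-key-secret", "secret")
# A second, concurrent loader doing the same thing no longer fails:
ensure_credential(keystore, "openai-api-key-secret", "secret")
```

Note this still leaves a small window where two processes both pass the existence check; for the real CLI, matching on the "already exists" error message after the fact (as above) is the safer side of the check.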
[jira] [Comment Edited] (IMPALA-13024) Several tests timeout waiting for admission
[ https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839337#comment-17839337 ] Csaba Ringhofer edited comment on IMPALA-13024 at 4/21/24 8:15 AM: --- >Slot based admission is not enabled when using default groups This was also my assumption, but it seems that it is enforced by default. Reproduced slot starvation locally: Run one query with more fragment instances than the core count in one impala-shell: set mt_dop=32; select sleep(1000*60) from tpcds.store_sales limit 200; -- Run a query in another impala-shell: select * from functional.alltypestiny; ERROR: Admission for query exceeded timeout 6ms in pool default-pool. Queued reason: Not enough admission control slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. Additional Details: Not Applicable UPDATE: I understand now what is happening: the limit is only enforced on coordinator-only queries. While "select * from alltypestiny" failed, the much larger "select * from alltypes" could be run without issues. The reason is that the former query runs on a single node. 
From impalad.INFO: "0421 10:10:57.505287 1586078 admission-controller.cc:1962] Trying to admit id=91442a9fa1d2512d:db5337c2 in pool_name=default-pool executor_group_name=empty group (using coordinator only) per_host_mem_estimate=20.00 MB dedicated_coord_mem_estimate=120.00 MB max_requests=-1 max_queued=200 max_mem=-1.00 B is_trivial_query=false I0421 10:10:57.505345 1586078 admission-controller.cc:1971] Stats: agg_num_running=1, agg_num_queued=1, agg_mem_reserved=4.02 MB, local_host(local_mem_admitted=516.57 MB, local_trivial_running=0, num_admitted_running=1, num_queued=1, backend_mem_reserved=4.02 MB, topN_query_stats: queries=[d84f2a7efee0998a:45ac1206], total_mem_consumed=4.02 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=1, min=4.02 MB, max=4.02 MB, pool_total_mem=4.02 MB, average_per_query=4.02 MB) I0421 10:10:57.505407 1586078 admission-controller.cc:2227] Could not dequeue query id=91442a9fa1d2512d:db5337c2 reason: Not enough admission control slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use." was (Author: csringhofer): >Slot based admission is not enabled when using default groups This was also my assumption, but it seems that it is enforced by default. Reproduced slot starvation locally: Run one query with more fragment instance than core count in one impala-shell: set mt_dop=32; select sleep(1000*60) from tpcds.store_sales limit 200; -- Run a query in another impala-shell: select * from functional.alltypestiny; ERROR: Admission for query exceeded timeout 6ms in pool default-pool. Queued reason: Not enough admission control slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. 
Additional Details: Not Applicable > Several tests timeout waiting for admission > --- > > Key: IMPALA-13024 > URL: https://issues.apache.org/jira/browse/IMPALA-13024 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > > A bunch of seemingly unrelated tests failed with the following message: > Example: > query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol: > beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, > 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > {code} > ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded > timeout 6ms in pool default-pool. Queued reason: Not enough admission > control slots available on host ... . Needed 1 slots but 18/16 are already in > use. Additional Details: Not Applicable > {code} > This happened in an ASAN build. Another test also failed which may be related > to the cause: > custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots > > {code} > Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the > expected states [4], last known state 5 > {code} > test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338
[jira] [Commented] (IMPALA-13024) Several tests timeout waiting for admission
[ https://issues.apache.org/jira/browse/IMPALA-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839337#comment-17839337 ] Csaba Ringhofer commented on IMPALA-13024: -- >Slot based admission is not enabled when using default groups This was also my assumption, but it seems that it is enforced by default. Reproduced slot starvation locally: Run one query with more fragment instances than the core count in one impala-shell: set mt_dop=32; select sleep(1000*60) from tpcds.store_sales limit 200; -- Run a query in another impala-shell: select * from functional.alltypestiny; ERROR: Admission for query exceeded timeout 6ms in pool default-pool. Queued reason: Not enough admission control slots available on host csringhofer-7000-ubuntu:27000. Needed 1 slots but 32/24 are already in use. Additional Details: Not Applicable > Several tests timeout waiting for admission > --- > > Key: IMPALA-13024 > URL: https://issues.apache.org/jira/browse/IMPALA-13024 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > > A bunch of seemingly unrelated tests failed with the following message: > Example: > query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol: > beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, > 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > {code} > ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded > timeout 6ms in pool default-pool. Queued reason: Not enough admission > control slots available on host ... . Needed 1 slots but 18/16 are already in > use. Additional Details: Not Applicable > {code} > This happened in an ASAN build. 
Another test also failed which may be related > to the cause: > custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots > > {code} > Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the > expected states [4], last known state 5 > {code} > test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338
[jira] [Created] (IMPALA-13024) Several tests timeout waiting for admission
Csaba Ringhofer created IMPALA-13024: Summary: Several tests timeout waiting for admission Key: IMPALA-13024 URL: https://issues.apache.org/jira/browse/IMPALA-13024 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer A bunch of seemingly unrelated tests failed with the following message: Example: query_test.test_spilling.TestSpillingDebugActionDimensions.test_spilling_aggs[protocol: beeswax | exec_option: {'mt_dop': 1, 'debug_action': None, 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] {code} ImpalaBeeswaxException: EQuery aborted:Admission for query exceeded timeout 6ms in pool default-pool. Queued reason: Not enough admission control slots available on host ... . Needed 1 slots but 18/16 are already in use. Additional Details: Not Applicable {code} This happened in an ASAN build. Another test also failed which may be related to the cause: custom_cluster.test_admission_controller.TestAdmissionController.test_queue_reasons_slots {code} Timeout: query 'e1410add778cd7b0:c40812b9' did not reach one of the expected states [4], last known state 5 {code} test_queue_reasons_slots seems to be a known flaky test: IMPALA-10338
[jira] [Created] (IMPALA-13021) Failed test: test_iceberg_deletes_and_updates_and_optimize
Csaba Ringhofer created IMPALA-13021: Summary: Failed test: test_iceberg_deletes_and_updates_and_optimize Key: IMPALA-13021 URL: https://issues.apache.org/jira/browse/IMPALA-13021 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer {code} test_iceberg_deletes_and_updates_and_optimize run_tasks([deleter, updater, optimizer, checker]) stress/stress_util.py:46: in run_tasks pool.map_async(Task.run, tasks).get(timeout_seconds) Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:568: in get raise TimeoutError E TimeoutError {code} This happened in an exhaustive test run with data cache.
[jira] [Updated] (IMPALA-5323) Support Kudu BINARY
[ https://issues.apache.org/jira/browse/IMPALA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-5323: Fix Version/s: Impala 4.4.0 > Support Kudu BINARY > --- > > Key: IMPALA-5323 > URL: https://issues.apache.org/jira/browse/IMPALA-5323 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Pavel Martynov >Assignee: Csaba Ringhofer >Priority: Major > Labels: kudu > Fix For: Impala 4.4.0 > > > I trying to 'CREATE EXTERNAL TABLE STORED AS KUDU' on the table with BINARY > Kudu column data type and got an error: Kudu type 'binary' is not supported > in Impala. > This limitation is not documented, checked: > https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html > https://kudu.apache.org/docs/kudu_impala_integration.html#_known_issues_and_limitations > There are some thoughts that Kudu BINARY data type may be supported by > Impala's STRING data type: > https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Does-impala-support-binary-data-type/td-p/24366 > https://groups.google.com/a/cloudera.org/forum/#!msg/impala-user/muguKJU3c3I/_oArmoxSlDMJ
[jira] [Resolved] (IMPALA-5323) Support Kudu BINARY
[ https://issues.apache.org/jira/browse/IMPALA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-5323. - Resolution: Fixed > Support Kudu BINARY > --- > > Key: IMPALA-5323 > URL: https://issues.apache.org/jira/browse/IMPALA-5323 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Pavel Martynov >Assignee: Csaba Ringhofer >Priority: Major > Labels: kudu > Fix For: Impala 4.4.0 > > > I trying to 'CREATE EXTERNAL TABLE STORED AS KUDU' on the table with BINARY > Kudu column data type and got an error: Kudu type 'binary' is not supported > in Impala. > This limitation is not documented, checked: > https://impala.incubator.apache.org/docs/build/html/topics/impala_kudu.html > https://kudu.apache.org/docs/kudu_impala_integration.html#_known_issues_and_limitations > There are some thoughts that Kudu BINARY data type may be supported by > Impala's STRING data type: > https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Does-impala-support-binary-data-type/td-p/24366 > https://groups.google.com/a/cloudera.org/forum/#!msg/impala-user/muguKJU3c3I/_oArmoxSlDMJ
[jira] [Work started] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows
[ https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12990 started by Csaba Ringhofer. > impala-shell broken if Iceberg delete deletes 0 rows > > > Key: IMPALA-12990 > URL: https://issues.apache.org/jira/browse/IMPALA-12990 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: iceberg > > Happens only with Python 3 > {code} > impala-python3 shell/impala_shell.py > create table icebergupdatet (i int, s string) stored as iceberg; > alter table icebergupdatet set tblproperties("format-version"="2"); > delete from icebergupdatet where i=0; > Unknown Exception : '>' not supported between instances of 'NoneType' and > 'int' > Traceback (most recent call last): > File "shell/impala_shell.py", line 1428, in _execute_stmt > if is_dml and num_rows == 0 and num_deleted_rows > 0: > TypeError: '>' not supported between instances of 'NoneType' and 'int' > {code} > The same error should also happen when the delete removes > 0 rows, but the > impala server has an older version that doesn't set TDmlResult.rows_deleted
[jira] [Commented] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows
[ https://issues.apache.org/jira/browse/IMPALA-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835793#comment-17835793 ] Csaba Ringhofer commented on IMPALA-12990: -- https://gerrit.cloudera.org/#/c/21284 > impala-shell broken if Iceberg delete deletes 0 rows > > > Key: IMPALA-12990 > URL: https://issues.apache.org/jira/browse/IMPALA-12990 > Project: IMPALA > Issue Type: Bug > Components: Clients >Reporter: Csaba Ringhofer >Priority: Major > Labels: iceberg > > Happens only with Python 3 > {code} > impala-python3 shell/impala_shell.py > create table icebergupdatet (i int, s string) stored as iceberg; > alter table icebergupdatet set tblproperties("format-version"="2"); > delete from icebergupdatet where i=0; > Unknown Exception : '>' not supported between instances of 'NoneType' and > 'int' > Traceback (most recent call last): > File "shell/impala_shell.py", line 1428, in _execute_stmt > if is_dml and num_rows == 0 and num_deleted_rows > 0: > TypeError: '>' not supported between instances of 'NoneType' and 'int' > {code} > The same error should also happen when the delete removes > 0 rows, but the > impala server has an older version that doesn't set TDmlResult.rows_deleted
[jira] [Created] (IMPALA-12990) impala-shell broken if Iceberg delete deletes 0 rows
Csaba Ringhofer created IMPALA-12990: Summary: impala-shell broken if Iceberg delete deletes 0 rows Key: IMPALA-12990 URL: https://issues.apache.org/jira/browse/IMPALA-12990 Project: IMPALA Issue Type: Bug Components: Clients Reporter: Csaba Ringhofer Happens only with Python 3 {code} impala-python3 shell/impala_shell.py create table icebergupdatet (i int, s string) stored as iceberg; alter table icebergupdatet set tblproperties("format-version"="2"); delete from icebergupdatet where i=0; Unknown Exception : '>' not supported between instances of 'NoneType' and 'int' Traceback (most recent call last): File "shell/impala_shell.py", line 1428, in _execute_stmt if is_dml and num_rows == 0 and num_deleted_rows > 0: TypeError: '>' not supported between instances of 'NoneType' and 'int' {code} The same error should also happen when the delete removes > 0 rows, but the impala server has an older version that doesn't set TDmlResult.rows_deleted
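The fix is to guard the comparison before it reaches Python 3's strict ordering rules: when TDmlResult.rows_deleted is unset (older servers, or the field simply not populated), num_deleted_rows arrives as None, and None > 0 raises TypeError. A sketch of the guarded check, extracted into a helper for clarity; the function name is hypothetical, the condition mirrors impala_shell.py line 1428:

```python
def deleted_rows_but_empty_result(is_dml, num_rows, num_deleted_rows):
    # num_deleted_rows is None when the server doesn't set
    # TDmlResult.rows_deleted; coerce to 0 so Python 3 never compares
    # None with an int (which raises TypeError, unlike Python 2).
    return bool(is_dml and num_rows == 0 and (num_deleted_rows or 0) > 0)

# A delete that matched 0 rows (num_deleted_rows=None) no longer raises:
assert deleted_rows_but_empty_result(True, 0, None) is False
# A delete that removed rows is still reported:
assert deleted_rows_but_empty_result(True, 0, 3) is True
```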
[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values
[ https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12987: - Description: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. {code} The partition directory created above seems truncated: hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's Note that Java handles \0 characters in unicode in a special way, which may be related: https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542 was: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. 
Check the catalog server log for more details. {code} The partition directory created above seems truncated: hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's Note Java handles \0 characters in unicode in a special way, which may be related: https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542 > Errors with \0 character in partition values > > > Key: IMPALA-12987 > URL: https://issues.apache.org/jira/browse/IMPALA-12987 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > Labels: iceberg > > Inserting strings with "\0" values to partition columns leads errors both in > Iceberg and Hive tables. > The issue is more severe in Iceberg tables as from this point the table can't > be read in Impala or Hive: > {code} > create table iceberg_unicode (s string, p string) partitioned by spec > (identity(p)) stored as iceberg; > insert into iceberg_unicode select "a", "a\0a"; > ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table > hdfs://localhost:20500/test-warehouse/iceberg_unicode > CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 > paths for table default.iceberg_unicode: failed to load 1 paths. Check the > catalog server log for more details. > {code} > The partition directory created above seems truncated: > hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a > In partition Hive tables the insert also returns an error, but the new > partition is not created and the table remains usable. 
The error is similar > to IMPALA-11499's > Note that Java handles \0 characters in unicode in a special way, which may > be related: > https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542
[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values
[ https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12987: - Description: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. {code} The partition directory created above seems truncated: hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's Note Java handles \0 characters in unicode in a special way, which may be related: https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542 was: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. 
{code} In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's > Errors with \0 character in partition values > > > Key: IMPALA-12987 > URL: https://issues.apache.org/jira/browse/IMPALA-12987 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > Labels: iceberg > > Inserting strings with "\0" values to partition columns leads errors both in > Iceberg and Hive tables. > The issue is more severe in Iceberg tables as from this point the table can't > be read in Impala or Hive: > {code} > create table iceberg_unicode (s string, p string) partitioned by spec > (identity(p)) stored as iceberg; > insert into iceberg_unicode select "a", "a\0a"; > ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table > hdfs://localhost:20500/test-warehouse/iceberg_unicode > CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 > paths for table default.iceberg_unicode: failed to load 1 paths. Check the > catalog server log for more details. > {code} > The partition directory created above seems truncated: > hdfs://localhost:20500/test-warehouse/iceberg_unicode/data/p=a > In partition Hive tables the insert also returns an error, but the new > partition is not created and the table remains usable. The error is similar > to IMPALA-11499's > Note Java handles \0 characters in unicode in a special way, which may be > related: > https://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html#wp16542 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-12987) Errors with \0 character in partition values
Csaba Ringhofer created IMPALA-12987: Summary: Errors with \0 character in partition values Key: IMPALA-12987 URL: https://issues.apache.org/jira/browse/IMPALA-12987 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer Inserting strings with "\0" values to partition columns leads to errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. {code} In partitioned Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's
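Until the underlying handling is fixed, a client-side guard could reject partition values containing NUL before the write starts, since the later updates to this issue report the partition directory truncated at the NUL (p=a for the value "a\0a"). A hypothetical sketch of such a validation step, not actual Impala code:

```python
def validate_partition_value(value):
    # Reject NUL bytes up front: downstream C-string / JNI modified-UTF-8
    # handling appears to truncate the partition directory name at \0
    # (observed: inserting "a\0a" created .../data/p=a).
    if "\0" in value:
        raise ValueError("partition value contains NUL byte: %r" % value)
    return value

assert validate_partition_value("a") == "a"
```

Failing fast here keeps the Iceberg table readable instead of leaving it with a metadata path that no longer matches the files on disk.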
[jira] [Updated] (IMPALA-12987) Errors with \0 character in partition values
[ https://issues.apache.org/jira/browse/IMPALA-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12987: - Description: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue is more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. {code} In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. The error is similar to IMPALA-11499's was: Inserting strings with "\0" values to partition columns leads errors both in Iceberg and Hive tables. The issue issue more severe in Iceberg tables as from this point the table can't be read in Impala or Hive: {code} create table iceberg_unicode (s string, p string) partitioned by spec (identity(p)) stored as iceberg; insert into iceberg_unicode select "a", "a\0a"; ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_unicode CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.iceberg_unicode: failed to load 1 paths. Check the catalog server log for more details. {code} In partition Hive tables the insert also returns an error, but the new partition is not created and the table remains usable. 
The error is similare to IMPALA-11499's > Errors with \0 character in partition values > > > Key: IMPALA-12987 > URL: https://issues.apache.org/jira/browse/IMPALA-12987 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > Labels: iceberg > > Inserting strings with "\0" values to partition columns leads errors both in > Iceberg and Hive tables. > The issue is more severe in Iceberg tables as from this point the table can't > be read in Impala or Hive: > {code} > create table iceberg_unicode (s string, p string) partitioned by spec > (identity(p)) stored as iceberg; > insert into iceberg_unicode select "a", "a\0a"; > ERROR: IcebergTableLoadingException: Error loading metadata for Iceberg table > hdfs://localhost:20500/test-warehouse/iceberg_unicode > CAUSED BY: TableLoadingException: Refreshing file and block metadata for 1 > paths for table default.iceberg_unicode: failed to load 1 paths. Check the > catalog server log for more details. > {code} > In partition Hive tables the insert also returns an error, but the new > partition is not created and the table remains usable. The error is similar > to IMPALA-11499's -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources
[ https://issues.apache.org/jira/browse/IMPALA-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12969: - Priority: Critical (was: Major) > DeserializeThriftMsg may leak JNI resources > --- > > Key: IMPALA-12969 > URL: https://issues.apache.org/jira/browse/IMPALA-12969 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Critical > Fix For: Impala 4.4.0 > > > JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements > call, but this is not done in case there is an error during deserialization: > [https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
[jira] [Resolved] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources
[ https://issues.apache.org/jira/browse/IMPALA-12969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-12969. -- Fix Version/s: Impala 4.4.0 Resolution: Fixed > DeserializeThriftMsg may leak JNI resources > --- > > Key: IMPALA-12969 > URL: https://issues.apache.org/jira/browse/IMPALA-12969 > Project: IMPALA > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > Fix For: Impala 4.4.0 > > > JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements > call, but this is not done in case there is an error during deserialization: > [https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
[jira] [Created] (IMPALA-12978) IMPALA-12544 made impala-shell incompatible with old impala servers
Csaba Ringhofer created IMPALA-12978: Summary: IMPALA-12544 made impala-shell incompatible with old impala servers Key: IMPALA-12978 URL: https://issues.apache.org/jira/browse/IMPALA-12978 Project: IMPALA Issue Type: Bug Components: Clients Reporter: Csaba Ringhofer IMPALA-12544 uses "progress.total_fragment_instances > 0:", but total_fragment_instances is None if the server is older and does not know this Thrift member yet (added in IMPALA-12048). [https://github.com/apache/impala/blob/fb3c379f395635f9f6927b40694bc3dd95a2866f/shell/impala_shell.py#L1320] This leads to error messages in interactive shell sessions when progress reporting is enabled.
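Like the IMPALA-12990 shell bug, the compatible form of the check coerces the missing Thrift field to 0 before comparing. A minimal sketch; the helper name is hypothetical, the condition mirrors the check at impala_shell.py#L1320:

```python
def should_show_progress(total_fragment_instances):
    # total_fragment_instances is None when the coordinator predates
    # IMPALA-12048 and never populates the Thrift member; treating None
    # as 0 avoids the Python 3 TypeError from "None > 0".
    return (total_fragment_instances or 0) > 0

assert should_show_progress(None) is False  # old server: no progress bar
assert should_show_progress(8) is True      # new server reporting 8 instances
```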
[jira] [Created] (IMPALA-12969) DeserializeThriftMsg may leak JNI resources
Csaba Ringhofer created IMPALA-12969: Summary: DeserializeThriftMsg may leak JNI resources Key: IMPALA-12969 URL: https://issues.apache.org/jira/browse/IMPALA-12969 Project: IMPALA Issue Type: Bug Reporter: Csaba Ringhofer JNI's GetByteArrayElements should be followed by a ReleaseByteArrayElements call, but this is not done when there is an error during deserialization: [https://github.com/apache/impala/blob/f05eac647647b5e03c3aafc35f785c73d07e2658/be/src/rpc/jni-thrift-util.h#L66]
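The leak pattern and its fix can be illustrated with a small language-neutral sketch (Python try/finally standing in for the C++ JNI acquire/release calls; FakeJniEnv is a toy model, not a real JNI binding):

```python
class FakeJniEnv:
    """Toy stand-in for a JNIEnv: counts acquire/release pairing so a leak
    (an acquire without a matching release) is observable."""
    def __init__(self):
        self.outstanding = 0
    def get_byte_array_elements(self, arr):
        self.outstanding += 1
        return bytes(arr)
    def release_byte_array_elements(self, arr):
        self.outstanding -= 1

def deserialize(env, arr, fail=False):
    buf = env.get_byte_array_elements(arr)
    try:
        if fail:
            raise ValueError("deserialization error")
        return len(buf)
    finally:
        # The fix for the leak: release on *every* path, including the
        # error path that the original code skipped.
        env.release_byte_array_elements(arr)

env = FakeJniEnv()
deserialize(env, [1, 2, 3])
try:
    deserialize(env, [1, 2, 3], fail=True)
except ValueError:
    pass
print(env.outstanding)  # 0: nothing leaked even on the error path
```

In the C++ code the same effect is typically achieved with a scope guard so ReleaseByteArrayElements runs on early returns as well.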
[jira] [Updated] (IMPALA-12968) Early EndDataStream RPC could be responded earlier
[ https://issues.apache.org/jira/browse/IMPALA-12968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12968: - Description: When a producer fragment sends no rows and finishes before the receiver is initialized the EndDataStream rpc is stored as early sender and is responded when the receiver is registered. [https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150] While it is important to store the information that the EOS has happened to unregister the sender from the receiver, the RPC itself could be responded right after it was stored in the early sender map. was: When a producer fragment sends no rows and finishes before the receiver is initialized te e EndDataStream rpc is stored as early sender and is responded when the receiver is registered. [https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150] While it is important to store the information that the EOS has happened to unregister the sender from the receiver, the RPC itself could be responded right after it was stored in the early sender map. > Early EndDataStream RPC could be responded earlier > -- > > Key: IMPALA-12968 > URL: https://issues.apache.org/jira/browse/IMPALA-12968 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Priority: Minor > Labels: krpc > > When a producer fragment sends no rows and finishes before the receiver is > initialized the EndDataStream rpc is stored as early sender and is responded > when the receiver is registered. > [https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150] > While it is important to store the information that the EOS has happened to > unregister the sender from the receiver, the RPC itself could be responded > right after it was stored in the early sender map. 
[jira] [Created] (IMPALA-12968) Early EndDataStream RPC could be responded earlier
Csaba Ringhofer created IMPALA-12968: Summary: Early EndDataStream RPC could be responded earlier Key: IMPALA-12968 URL: https://issues.apache.org/jira/browse/IMPALA-12968 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer When a producer fragment sends no rows and finishes before the receiver is initialized, the EndDataStream RPC is stored as an early sender and is responded to when the receiver is registered. [https://github.com/apache/impala/blob/effc9df933b46eb5b0acf55a858606415425505f/be/src/runtime/krpc-data-stream-mgr.cc#L150] While it is important to store the information that the EOS has happened in order to unregister the sender from the receiver, the RPC itself could be responded to right after it was stored in the early sender map.
[jira] [Comment Edited] (IMPALA-10349) Revisit constant folding on non-ASCII strings
[ https://issues.apache.org/jira/browse/IMPALA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830545#comment-17830545 ] Csaba Ringhofer edited comment on IMPALA-10349 at 3/25/24 3:55 PM: --- Also bumped into this related to pushing down to Kudu: {code:java} explain select count(*) from functional_kudu.alltypes where string_col = "á"; -- kudu predicates: string_col = 'á' explain select count(*) from functional_kudu.alltypes where string_col = concat("a", "") -- kudu predicates: string_col = 'a' explain select count(*) from functional_kudu.alltypes where string_col = concat("á", "") -- not pushed down to Kudu: -- predicates: string_col = concat('á', '') {code} >I think we should allow folding non-ASCII strings if they are legal UTF-8 >strings. [~stigahuang] Do you know why it is not possible to fold strings that are not valid UTF-8? Currently BINARY columns also use StringLiterals, e.g. cast("a" as binary) will be folded to a StringLiteral. It would be useful to also fold expressions like cast(unhex("aa") as binary) to be able to push them down to Kudu. was (Author: csringhofer): Also bumped into this related to pushing down to Kudu: {code:java} explain select count(*) from functional_kudu.alltypes where string_col = "á"; -- kudu predicates: string_col = 'á' explain select count(*) from functional_kudu.alltypes where string_col = concat("a", "") -- kudu predicates: string_col = 'a' explain select count(*) from functional_kudu.alltypes where string_col = concat("á", "") -- not pushed down to Kudu: -- predicates: string_col = concat('á', '') {code} >I think we should allow folding non-ASCII strings if they are legal UTF-8 >strings. [~stigahuang] Do you why is it not possible to fold strings that are not valid UTF-8? Currently BINARY columns also use StringLiterals, a.g cast("a" as binary) will be folded to a StringLiteral. 
It would be useful to also fold expressions like cast(unhex("aa") as binary) to be able to push them down to Kudu. > Revisit constant folding on non-ASCII strings > - > > Key: IMPALA-10349 > URL: https://issues.apache.org/jira/browse/IMPALA-10349 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Quanlong Huang >Priority: Critical > > Constant folding may produce non-ASCII strings. In such cases, we currently > abandon folding the constant. See commit message of IMPALA-1788 or codes > here: > [https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282] > I think we should allow folding non-ASCII strings if they are legal UTF-8 > strings. > Example of constant folding work: > {code:java} > Query: explain select * from functional.alltypes where string_col = > substr('123', 1, 1) > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 | > | Per-Host Resource Estimates: Memory=160MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 01:EXCHANGE [UNPARTITIONED] | > | | | > | 00:SCAN HDFS [functional.alltypes] | > |HDFS partitions=24/24 files=24 size=478.45KB | > |predicates: string_col = '1' | > |row-size=89B cardinality=730 | > +-+ > {code} > Example of constant folding doesn't work: > {code:java} > Query: explain select * from functional.alltypes where string_col = > substr('引擎', 1, 3) > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 | > | Per-Host Resource Estimates: Memory=160MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 01:EXCHANGE [UNPARTITIONED] | > | | | > | 0
[jira] [Commented] (IMPALA-10349) Revisit constant folding on non-ASCII strings
[ https://issues.apache.org/jira/browse/IMPALA-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830545#comment-17830545 ] Csaba Ringhofer commented on IMPALA-10349: -- Also bumped into this related to pushing down to Kudu: {code} explain select count(*) from functional_kudu.alltypes where string_col = "á"; -- kudu predicates: string_col = 'á' explain select count(*) from functional_kudu.alltypes where string_col = concat("a", "") -- kudu predicates: string_col = 'a' explain select count(*) from functional_kudu.alltypes where string_col = concat("á", "") -- not pushed down to Kudu: -- predicates: string_col = concat('á', '') {code} >I think we should allow folding non-ASCII strings if they are legal UTF-8 >strings. [~stigahuang] Do you know why it is not possible to fold strings that are not valid UTF-8? Currently BINARY columns also use StringLiterals, e.g. cast("a" as binary) will be folded to a StringLiteral. It would be useful to also fold expressions like cast(unhex("aa") as binary) to be able to push them down to Kudu. > Revisit constant folding on non-ASCII strings > - > > Key: IMPALA-10349 > URL: https://issues.apache.org/jira/browse/IMPALA-10349 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Quanlong Huang >Priority: Critical > > Constant folding may produce non-ASCII strings. In such cases, we currently > abandon folding the constant. See commit message of IMPALA-1788 or codes > here: > [https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282] > I think we should allow folding non-ASCII strings if they are legal UTF-8 > strings. 
> An example where constant folding works: > {code:java} > Query: explain select * from functional.alltypes where string_col = > substr('123', 1, 1) > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 | > | Per-Host Resource Estimates: Memory=160MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 01:EXCHANGE [UNPARTITIONED] | > | | | > | 00:SCAN HDFS [functional.alltypes] | > |HDFS partitions=24/24 files=24 size=478.45KB | > |predicates: string_col = '1' | > |row-size=89B cardinality=730 | > +-+ > {code} > An example where constant folding doesn't work: > {code:java} > Query: explain select * from functional.alltypes where string_col = > substr('引擎', 1, 3) > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 | > | Per-Host Resource Estimates: Memory=160MB | > | Codegen disabled by planner | > | | > | PLAN-ROOT SINK | > | | | > | 01:EXCHANGE [UNPARTITIONED] | > | | | > | 00:SCAN HDFS [functional.alltypes] | > |HDFS partitions=24/24 files=24 size=478.45KB | > |predicates: string_col = substr('引擎', 1, 3)| > |row-size=89B cardinality=730 | > +-+ > {code}
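The proposed rule — allow folding when the result is legal UTF-8 — boils down to a validity check on the folded bytes. A sketch of that check (illustrative Python, not the actual Java frontend code):

```python
def is_valid_utf8(data: bytes) -> bool:
    # A folded constant that decodes cleanly as UTF-8 can be rendered back
    # as a string literal; arbitrary binary (e.g. the result of unhex())
    # may not round-trip through a text literal.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("á".encode("utf-8")))  # True: non-ASCII but legal UTF-8
print(is_valid_utf8(bytes.fromhex("aa")))  # False: a lone 0xAA continuation byte
```

Under this rule `concat("á", "")` would fold (valid UTF-8), while folding `unhex("aa")` for BINARY push-down would need a separate representation that does not require UTF-8 validity.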
[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables
[ https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829953#comment-17829953 ] Csaba Ringhofer commented on IMPALA-12927: -- I think the best approach would be to check the tbl property "json.binary.format": * if not set, give a clear error message * if base64, do base64 decoding * if rawstring, handle it the way Hive does: [https://github.com/apache/hive/blame/f216bbb632752f467321869cee03adf9477409cf/serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java#L455] Note that I don't know exactly how special characters are handled in the rawstring case. > Support reading BINARY columns in JSON tables > - > > Key: IMPALA-12927 > URL: https://issues.apache.org/jira/browse/IMPALA-12927 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Zihao Ye >Priority: Major > > Currently Impala cannot read BINARY columns in JSON files written by Hive > correctly and returns runtime errors: > {code} > select * from functional_json.binary_tbl; > ++--++ > | id | string_col | binary_col | > ++--++ > | 1 | ascii | NULL | > | 2 | ascii | NULL | > | 3 | null | NULL | > | 4 | empty | | > | 5 | valid utf8 | NULL | > | 6 | valid utf8 | NULL | > | 7 | invalid utf8 | NULL | > | 8 | invalid utf8 | NULL | > ++--++ > WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, > type: STRING, data: 'binary1' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'binary2' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'árvíztűrőtükörfúró' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > 
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '你好hello' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '��' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '�D3"' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > {code} > The single file in the table looks like this: > {code} > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0 > {"id":1,"string_col":"ascii","binary_col":"binary1"} > {"id":2,"string_col":"ascii","binary_col":"binary2"} > {"id":3,"string_col":"null","binary_col":null} > {"id":4,"string_col":"empty","binary_col":""} > {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"} > {"id":6,"string_col":"valid utf8","binary_col":"你好hello"} > {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"} > {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"} > {code} >
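The three-way proposal above (no property: clear error; base64: decode; rawstring: Hive-style) can be sketched as a small dispatcher. This is illustrative Python, not Impala code; the property name "json.binary.format" and its base64/rawstring values come from Hive, everything else here is hypothetical:

```python
import base64

def decode_json_binary(value: str, tbl_props: dict) -> bytes:
    # Dispatch on the Hive table property "json.binary.format", as proposed:
    # missing -> clear error, "base64" -> decode, "rawstring" -> take the
    # JSON string's bytes as-is (mirroring Hive's rawstring handling).
    fmt = tbl_props.get("json.binary.format")
    if fmt is None:
        raise ValueError(
            "BINARY column in JSON table but 'json.binary.format' is not set")
    if fmt == "base64":
        return base64.b64decode(value)
    if fmt == "rawstring":
        return value.encode("utf-8")
    raise ValueError("unsupported json.binary.format: " + fmt)

print(decode_json_binary("YWJjZA==", {"json.binary.format": "base64"}))  # b'abcd'
```

Failing loudly on a missing property avoids the silent base64-misinterpretation the surrounding comments warn about.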
[jira] [Comment Edited] (IMPALA-12927) Support reading BINARY columns in JSON tables
[ https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829614#comment-17829614 ] Csaba Ringhofer edited comment on IMPALA-12927 at 3/21/24 3:47 PM: --- [~Eyizoha] About AuxColumnType: fyi there is an ongoing refactor to remove that class and make it easier to decide whether a column is STRING or BINARY: [https://gerrit.cloudera.org/#/c/21157/] About encoding of BINARY columns: I looked at the Hive code, but it doesn't match with the encoding I see in the files. [https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135] Current Apache Hive seems to default to using base64 encoding, while it can be altered with tbl property "json.binary.format". In the JSON tables in Impala's dataload the files are certainly not base64 encoded and "json.binary.format" is also not set, so it doesn't seem to work like the current Hive codebase. It is possible that this is related to differences between Apache Impala's Hive dependency and current Apache Hive. Currently Impala base64 decodes the BINARY columns: {code:java} Hive: create table tjsonbinary (s string, b binary) stored as JSONFILE; insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary))); Impala: select * from tjsonbinary; +--+--+ | s | b | +--+--+ | abcd | abcd | +--+--+ {code} What do you think about disabling BINARY column reading in JSON until Hive compatibility is clarified? My concern is that besides error messages and nulled values this may actually lead to correctness issues as many strings are both valid utf8 strings and base64 strings, so Impala may return unintended results. 
was (Author: csringhofer): [~Eyizoha] About AuxColumnType: fyi is there is an ongoing refactor to remove that class and make it easier to decided whether a column is STRING or BINARY: [https://gerrit.cloudera.org/#/c/21157/] About encoding of BINARY columns: I looked at the Hive code, but it doesn't match with the encoding I see in the files. [https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135] Current Apache Hive seems to default to using base64 encoding, while it can be altered with tbl property "json.binary.format". In the JSON tables in Impala's dataload the files are certainly not base64 encoded and "json.binary.format" is also not set, so it doesn't seem to work like the current Hive codebase. It is possible that this is related to differences between Apache Impala's Hive dependency and current Apache Hive. Currently Impala base64 decodes the BINARY columns: {code} Hive: create table tjsonbinary (string s, binary b) stored as JSONFILE; insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary))); Impala: select * from tjsonbinary; +--+--+ | s | b | +--+--+ | abcd | abcd | +--+--+ {code} What do you think about disabling BINARY column reading in JSON until Hive compatibility is clarified? My concern is that besides error messages and nulled values this may actually lead to correctness issues as many strings are both valid utf8 strings and base64 strings, so Impala may return unintended results. 
> Support reading BINARY columns in JSON tables > - > > Key: IMPALA-12927 > URL: https://issues.apache.org/jira/browse/IMPALA-12927 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Zihao Ye >Priority: Major > > Currently Impala cannot read BINARY columns in JSON files written by Hive > correctly and returns runtime errors: > {code} > select * from functional_json.binary_tbl; > ++--++ > | id | string_col | binary_col | > ++--++ > | 1 | ascii | NULL | > | 2 | ascii | NULL | > | 3 | null | NULL | > | 4 | empty | | > | 5 | valid utf8 | NULL | > | 6 | valid utf8 | NULL | > | 7 | invalid utf8 | NULL | > | 8 | invalid utf8 | NULL | > ++--++ > WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, > type: STRING, data: 'binary1' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'binary2' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'á
[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables
[ https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829614#comment-17829614 ] Csaba Ringhofer commented on IMPALA-12927: -- [~Eyizoha] About AuxColumnType: fyi there is an ongoing refactor to remove that class and make it easier to decide whether a column is STRING or BINARY: [https://gerrit.cloudera.org/#/c/21157/] About encoding of BINARY columns: I looked at the Hive code, but it doesn't match with the encoding I see in the files. [https://github.com/apache/hive/blob/9a0ce4e15890aa91f05322e845438e1e8830b1c3/serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java#L135] Current Apache Hive seems to default to using base64 encoding, while it can be altered with tbl property "json.binary.format". In the JSON tables in Impala's dataload the files are certainly not base64 encoded and "json.binary.format" is also not set, so it doesn't seem to work like the current Hive codebase. It is possible that this is related to differences between Apache Impala's Hive dependency and current Apache Hive. Currently Impala base64 decodes the BINARY columns: {code} Hive: create table tjsonbinary (s string, b binary) stored as JSONFILE; insert into tjsonbinary values ("abcd", base64(cast("abcd" as binary))); Impala: select * from tjsonbinary; +--+--+ | s | b | +--+--+ | abcd | abcd | +--+--+ {code} What do you think about disabling BINARY column reading in JSON until Hive compatibility is clarified? My concern is that besides error messages and nulled values this may actually lead to correctness issues as many strings are both valid utf8 strings and base64 strings, so Impala may return unintended results. 
> Support reading BINARY columns in JSON tables > - > > Key: IMPALA-12927 > URL: https://issues.apache.org/jira/browse/IMPALA-12927 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Zihao Ye >Priority: Major > > Currently Impala cannot read BINARY columns in JSON files written by Hive > correctly and returns runtime errors: > {code} > select * from functional_json.binary_tbl; > ++--++ > | id | string_col | binary_col | > ++--++ > | 1 | ascii | NULL | > | 2 | ascii | NULL | > | 3 | null | NULL | > | 4 | empty | | > | 5 | valid utf8 | NULL | > | 6 | valid utf8 | NULL | > | 7 | invalid utf8 | NULL | > | 8 | invalid utf8 | NULL | > ++--++ > WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, > type: STRING, data: 'binary1' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'binary2' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'árvíztűrőtükörfúró' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '你好hello' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '��' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '�D3"' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > {code} > The single file in the table 
looks like this: > {code} > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0 > {"id":1,"string_col":"ascii","binary_col":"binary1"} > {"id":2,"string_col":"ascii","binary_col":"binary2"} > {"id":3,"string_col":"null","binary_col":null} > {"id":4,"string_col":"empty","binary_col":""} > {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"} > {"id":6,"string_col":"valid utf8","binary_col":"你好hello"} > {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"} > {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"} > {code} >
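The correctness concern above — many raw strings are simultaneously valid UTF-8 and valid base64, so unconditional base64-decoding corrupts them silently — can be demonstrated directly (plain Python, independent of Impala):

```python
import base64

# "abcd" is an ordinary raw string, but it is *also* valid base64, so a
# reader that unconditionally base64-decodes it returns different bytes
# than the raw-string interpretation -- with no error raised.
raw = "abcd"
decoded = base64.b64decode(raw)
print(decoded)                  # b'i\xb7\x1d' -- not b'abcd'
print(decoded != raw.encode())  # True: silent corruption, not a failure
```

This is why relying on decode failures to detect the wrong encoding is unsafe: the ambiguous cases succeed and return unintended results.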
[jira] [Commented] (IMPALA-12927) Support reading BINARY columns in JSON tables
[ https://issues.apache.org/jira/browse/IMPALA-12927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829192#comment-17829192 ] Csaba Ringhofer commented on IMPALA-12927: -- [~Eyizoha] I see that BINARY tests are explicitly skipped for JSON, but I couldn't find any discussion about this in the commit that added the JSON scanner: [https://gerrit.cloudera.org/#/c/19699/33/tests/query_test/test_scanners.py] Do you have an idea on what to do with BINARY columns? I am not familiar with Hive's JSON files, so I don't know what the intended encoding for BINARY columns is. I know that the JSON format doesn't support binary values, so generally some encoding (e.g. base64) is used to convert byte arrays to some ASCII representation. > Support reading BINARY columns in JSON tables > - > > Key: IMPALA-12927 > URL: https://issues.apache.org/jira/browse/IMPALA-12927 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Csaba Ringhofer >Priority: Major > > Currently Impala cannot read BINARY columns in JSON files written by Hive > correctly and returns runtime errors: > {code} > select * from functional_json.binary_tbl; > ++--++ > | id | string_col | binary_col | > ++--++ > | 1 | ascii | NULL | > | 2 | ascii | NULL | > | 3 | null | NULL | > | 4 | empty | | > | 5 | valid utf8 | NULL | > | 6 | valid utf8 | NULL | > | 7 | invalid utf8 | NULL | > | 8 | invalid utf8 | NULL | > ++--++ > WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, > type: STRING, data: 'binary1' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'binary2' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: 'árvíztűrőtükörfúró' > Error parsing row: 
file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '你好hello' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '��' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > Error converting column: functional_json.binary_tbl.binary_col, type: STRING, > data: '�D3"' > Error parsing row: file: > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before > offset: 481 > {code} > The single file in the table looks like this: > {code} > hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0 > {"id":1,"string_col":"ascii","binary_col":"binary1"} > {"id":2,"string_col":"ascii","binary_col":"binary2"} > {"id":3,"string_col":"null","binary_col":null} > {"id":4,"string_col":"empty","binary_col":""} > {"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"} > {"id":6,"string_col":"valid utf8","binary_col":"你好hello"} > {"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"} > {"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"} > {code} >
[jira] [Created] (IMPALA-12927) Support reading BINARY columns in JSON tables
Csaba Ringhofer created IMPALA-12927: Summary: Support reading BINARY columns in JSON tables Key: IMPALA-12927 URL: https://issues.apache.org/jira/browse/IMPALA-12927 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Csaba Ringhofer
Currently Impala cannot read BINARY columns in JSON files written by Hive correctly and returns runtime errors:
{code}
select * from functional_json.binary_tbl;
+----+--------------+------------+
| id | string_col   | binary_col |
+----+--------------+------------+
| 1  | ascii        | NULL       |
| 2  | ascii        | NULL       |
| 3  | null         | NULL       |
| 4  | empty        |            |
| 5  | valid utf8   | NULL       |
| 6  | valid utf8   | NULL       |
| 7  | invalid utf8 | NULL       |
| 8  | invalid utf8 | NULL       |
+----+--------------+------------+
WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'binary1'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'binary2'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'árvíztűrőtükörfúró'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '你好hello'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '��'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '�D3"'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0, before offset: 481
{code}
The single file in the table looks like this:
{code}
hdfs://localhost:20500/test-warehouse/binary_tbl_json/00_0
{"id":1,"string_col":"ascii","binary_col":"binary1"}
{"id":2,"string_col":"ascii","binary_col":"binary2"}
{"id":3,"string_col":"null","binary_col":null}
{"id":4,"string_col":"empty","binary_col":""}
{"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
{"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
{"id":7,"string_col":"invalid utf8","binary_col":"\u�\u�"}
{"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u"}
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
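A minimal Python sketch (not Impala's scanner; the sample rows are taken from the listing above) of why the last rows fail: the file is newline-delimited JSON, but the BINARY values in rows 7-8 are raw non-UTF-8 bytes, so a strict UTF-8 decode rejects those lines before JSON parsing even starts.

```python
import json

# A row whose binary_col happens to be valid UTF-8 parses fine.
good = b'{"id":1,"string_col":"ascii","binary_col":"binary1"}'
row = json.loads(good.decode("utf-8"))

# A row carrying a raw 0xFF byte in binary_col fails the strict decode.
bad = b'{"id":8,"string_col":"invalid utf8","binary_col":"\xffD3"}'
try:
    json.loads(bad.decode("utf-8"))
    err = None
except UnicodeDecodeError as e:
    err = e.reason  # e.g. "invalid start byte"
```

This is why raw bytes can't simply be embedded in a JSON text field; the encoding question discussed in the follow-up comments (e.g. base64) is about avoiding exactly this.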
[jira] [Commented] (IMPALA-12899) Temporary workaround for BINARY in complex types
[ https://issues.apache.org/jira/browse/IMPALA-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828387#comment-17828387 ] Csaba Ringhofer commented on IMPALA-12899: --
base64 encoding seems like a sane and widely used approach to me. I would suggest the following:
# implement it first with base64 encoding
# if there is demand to handle this differently, add a query option like binary_column_encoding_in_json=base64 / skip / hive_style_unquoted_string
I would avoid a "lossy" solution as the default, i.e. one where the original binary value can't be decoded from the output.
> Temporary workaround for BINARY in complex types
>
> Key: IMPALA-12899
> URL: https://issues.apache.org/jira/browse/IMPALA-12899
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
>
> The BINARY type is currently not supported inside complex types and a cross-component decision is probably needed to support it (see IMPALA-11491). We would like to enable EXPAND_COMPLEX_TYPES for Iceberg metadata tables (IMPALA-12612), which requires that queries with BINARY inside complex types don't fail. Enabling EXPAND_COMPLEX_TYPES is a more prioritised issue than IMPALA-11491, so we should come up with a temporary solution, e.g. NULLing BINARY values in complex types and logging a warning, or setting these BINARY values to a warning string.
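The base64 option suggested in the comment above can be sketched in a few lines (a generic illustration, not Impala or Hive code): arbitrary bytes, including invalid UTF-8, survive a lossless round-trip through the text-only JSON format.

```python
import base64
import json

# A BINARY value containing bytes that are not valid UTF-8.
original = b"\xffD3\x11 arbitrary bytes"

# Write side: base64-encode the bytes into a plain ASCII JSON string.
written = json.dumps({"binary_col": base64.b64encode(original).decode("ascii")})

# Read side: decode the field back to the exact original bytes.
restored = base64.b64decode(json.loads(written)["binary_col"])
```

Unlike writing raw bytes (which fails) or replacing undecodable bytes (which is lossy), this keeps the value recoverable, at the cost of ~33% size overhead for binary-heavy data.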
[jira] [Created] (IMPALA-12902) Event replication can be broken if hms_event_incremental_refresh_transactional_table=false
Csaba Ringhofer created IMPALA-12902: Summary: Event replication can be broken if hms_event_incremental_refresh_transactional_table=false Key: IMPALA-12902 URL: https://issues.apache.org/jira/browse/IMPALA-12902 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Csaba Ringhofer
When setting hms_event_incremental_refresh_transactional_table=false, metadata.test_event_processing.TestEventProcessing.test_event_based_replication fails at the following assert:
[https://github.com/apache/impala/blob/6c0c26146d956ad771cee27283c1371b9c23adce/tests/metadata/test_event_processing_base.py#L234]
Based on the logs, catalogd only sees alter_database and transaction events in this case, so if the transaction events (COMMIT_TXN) are ignored, it doesn't detect the change in the table. This seems strange, as the commit that added the test is older than the one that added hms_event_incremental_refresh_transactional_table:
[https://github.com/apache/impala/commit/e53d649f8a88f42a70237fe7c2663baa126fed1a] vs [https://github.com/apache/impala/commit/097b10104f23e0927d5b21b43a79f6cc10425f59]
So it is not clear to me how the test could pass originally. One possibility is that different events were generated in HMS at that time.
[jira] [Updated] (IMPALA-12902) Event replication can be broken if hms_event_incremental_refresh_transactional_table=false
[ https://issues.apache.org/jira/browse/IMPALA-12902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12902: - Summary: Event replication can be broken if hms_event_incremental_refresh_transactional_table=false (was: Event replication is can be broken if hms_event_incremental_refresh_transactional_table=false)
> Event replication can be broken if hms_event_incremental_refresh_transactional_table=false
>
> Key: IMPALA-12902
> URL: https://issues.apache.org/jira/browse/IMPALA-12902
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Csaba Ringhofer
> Priority: Major
>
> When setting hms_event_incremental_refresh_transactional_table=false, metadata.test_event_processing.TestEventProcessing.test_event_based_replication fails at the following assert:
> [https://github.com/apache/impala/blob/6c0c26146d956ad771cee27283c1371b9c23adce/tests/metadata/test_event_processing_base.py#L234]
>
> Based on the logs, catalogd only sees alter_database and transaction events in this case, so if the transaction events (COMMIT_TXN) are ignored, it doesn't detect the change in the table.
> This seems strange, as the commit that added the test is older than the one that added hms_event_incremental_refresh_transactional_table:
> [https://github.com/apache/impala/commit/e53d649f8a88f42a70237fe7c2663baa126fed1a]
> vs
> [https://github.com/apache/impala/commit/097b10104f23e0927d5b21b43a79f6cc10425f59]
>
> So it is not clear to me how the test could pass originally. One possibility is that different events were generated in HMS at that time.
[jira] [Created] (IMPALA-12895) REFRESH doesn't detect changes in partition locations in ACID tables
Csaba Ringhofer created IMPALA-12895: Summary: REFRESH doesn't detect changes in partition locations in ACID tables Key: IMPALA-12895 URL: https://issues.apache.org/jira/browse/IMPALA-12895 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Csaba Ringhofer
This was discovered by running the test metadata.test_event_processing.TestEventProcessing.test_transact_partition_location_change_from_hive when the flag hms_event_incremental_refresh_transactional_table is set to false:
[https://github.com/apache/impala/blob/ab6c9467f6347671b971dbce4c640bea032b6ed9/tests/metadata/test_event_processing.py#L164]
When hms_event_incremental_refresh_transactional_table is true (the default), the alter partition event is processed correctly and the location change is detected. But if it is false, or event processing is turned off, the change is not detected, and running REFRESH on the table also doesn't update the location. The different handling based on the flag seems intentional:
https://github.com/apache/impala/blob/ab6c9467f6347671b971dbce4c640bea032b6ed9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L2606
This seems to be an old issue, while the test was added in a recent commit:
[https://github.com/apache/impala/commit/32b29ff36fb3e05fd620a6714de88805052d0117]
[jira] [Work started] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled
[ https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-12835 started by Csaba Ringhofer. > Transactional tables are unsynced when > hms_event_incremental_refresh_transactional_table is disabled > > > Key: IMPALA-12835 > URL: https://issues.apache.org/jira/browse/IMPALA-12835 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Csaba Ringhofer >Priority: Critical > > There are some test failures when > hms_event_incremental_refresh_transactional_table is disabled: > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication > I can reproduce the issue locally: > {noformat} > $ bin/start-impala-cluster.py > --catalogd_args=--hms_event_incremental_refresh_transactional_table=false > impala-shell> create table txn_tbl (id int, val int) stored as parquet > tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > impala-shell> describe txn_tbl; -- make the table loaded in Impala > hive> insert into txn_tbl values(101, 200); > impala-shell> select * from txn_tbl; {noformat} > Impala shows no results until a REFRESH runs on this table.
[jira] [Commented] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled
[ https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824490#comment-17824490 ] Csaba Ringhofer commented on IMPALA-12835: -- https://gerrit.cloudera.org/#/c/21116/ > Transactional tables are unsynced when > hms_event_incremental_refresh_transactional_table is disabled > > > Key: IMPALA-12835 > URL: https://issues.apache.org/jira/browse/IMPALA-12835 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Csaba Ringhofer >Priority: Critical > > There are some test failures when > hms_event_incremental_refresh_transactional_table is disabled: > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication > I can reproduce the issue locally: > {noformat} > $ bin/start-impala-cluster.py > --catalogd_args=--hms_event_incremental_refresh_transactional_table=false > impala-shell> create table txn_tbl (id int, val int) stored as parquet > tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > impala-shell> describe txn_tbl; -- make the table loaded in Impala > hive> insert into txn_tbl values(101, 200); > impala-shell> select * from txn_tbl; {noformat} > Impala shows no results until a REFRESH runs on this table.
[jira] [Closed] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS
[ https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer closed IMPALA-12812. Resolution: Invalid > Send reload event after ALTER TABLE RECOVER PARTITIONS > -- > > Key: IMPALA-12812 > URL: https://issues.apache.org/jira/browse/IMPALA-12812 > Project: IMPALA > Issue Type: Improvement >Reporter: Csaba Ringhofer >Priority: Major > > IMPALA-11808 added support for sending reload events after REFRESH to allow > other Impala cluster connecting to the same HMS to also reload their tables. > REFRESH is often used when in external tables the files are written directly > to filesystem without notifying HMS, so Impala needs to update its cache and > can't rely on HMS notifications. > The same could be useful for ALTER TABLE RECOVER PARTITIONS. -It detects > partition directories that were only created in the FS but not in HMS and > creates them in HMS too.- - UPDATE: the previous sentence was not true with > current Impala. It also reloads the table (similarly to other DDLs) and > detects new files in existing partitions. > An HMS event is created for the new partitions but there is no event that > would indicate that there are new files in existing partitions. As ALTER > TABLE RECOVER PARTITIONS is called when the user expects changes in the > filesystem (similarly to REFRESH), it could be useful to send a reload event > after it is finished.
[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS
[ https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12812: - Description: IMPALA-11808 added support for sending reload events after REFRESH to allow other Impala cluster connecting to the same HMS to also reload their tables. REFRESH is often used when in external tables the files are written directly to filesystem without notifying HMS, so Impala needs to update its cache and can't rely on HMS notifications. The same could be useful for ALTER TABLE RECOVER PARTITIONS. -It detects partition directories that were only created in the FS but not in HMS and creates them in HMS too.- - UPDATE: the previous sentence was not true with current Impala. It also reloads the table (similarly to other DDLs) and detects new files in existing partitions. An HMS event is created for the new partitions but there is no event that would indicate that there are new files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called when the user expects changes in the filesystem (similarly to REFRESH), it could be useful to send a reload event after it is finished. was: IMPALA-11808 added support for sending reload events after REFRESH to allow other Impala cluster connecting to the same HMS to also reload their tables. REFRESH is often used when in external tables the files are written directly to filesystem without notifying HMS, so Impala needs to update its cache and can't rely on HMS notifications. The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}It detects partition directories that were only created in the FS but not in HMS and creates them in HMS too. I{-}t also reloads the table (similarly to other DDLs) and detects new files in existing partitions. - UPDATE: the previous sentence was not true with current Impala. An HMS event is created for the new partitions but there is no event that would indicate that there are new files in existing partitions. 
As ALTER TABLE RECOVER PARTITIONS is called when the user expects changes in the filesystem (similarly to REFRESH), it could be useful to send a reload event after it is finished. > Send reload event after ALTER TABLE RECOVER PARTITIONS > -- > > Key: IMPALA-12812 > URL: https://issues.apache.org/jira/browse/IMPALA-12812 > Project: IMPALA > Issue Type: Improvement >Reporter: Csaba Ringhofer >Priority: Major > > IMPALA-11808 added support for sending reload events after REFRESH to allow > other Impala cluster connecting to the same HMS to also reload their tables. > REFRESH is often used when in external tables the files are written directly > to filesystem without notifying HMS, so Impala needs to update its cache and > can't rely on HMS notifications. > The same could be useful for ALTER TABLE RECOVER PARTITIONS. -It detects > partition directories that were only created in the FS but not in HMS and > creates them in HMS too.- - UPDATE: the previous sentence was not true with > current Impala. It also reloads the table (similarly to other DDLs) and > detects new files in existing partitions. > An HMS event is created for the new partitions but there is no event that > would indicate that there are new files in existing partitions. As ALTER > TABLE RECOVER PARTITIONS is called when the user expects changes in the > filesystem (similarly to REFRESH), it could be useful to send a reload event > after it is finished.
[jira] [Updated] (IMPALA-12812) Send reload event after ALTER TABLE RECOVER PARTITIONS
[ https://issues.apache.org/jira/browse/IMPALA-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-12812: - Description: IMPALA-11808 added support for sending reload events after REFRESH to allow other Impala cluster connecting to the same HMS to also reload their tables. REFRESH is often used when in external tables the files are written directly to filesystem without notifying HMS, so Impala needs to update its cache and can't rely on HMS notifications. The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}It detects partition directories that were only created in the FS but not in HMS and creates them in HMS too. I{-}t also reloads the table (similarly to other DDLs) and detects new files in existing partitions. - UPDATE: the previous sentence was not true with current Impala. An HMS event is created for the new partitions but there is no event that would indicate that there are new files in existing partitions. As ALTER TABLE RECOVER PARTITIONS is called when the user expects changes in the filesystem (similarly to REFRESH), it could be useful to send a reload event after it is finished. was: IMPALA-11808 added support for sending reload events after REFRESH to allow other Impala cluster connecting to the same HMS to also reload their tables. REFRESH is often used when in external tables the files are written directly to filesystem without notifying HMS, so Impala needs to update its cache and can't rely on HMS notifications. The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}- It detects partition directories that were only created in the FS but not in HMS and creates them in HMS too.-{-}It also reloads the table (similarly to other DDLs) and detects new files in existing partitions. - UPDATE: the previous sentence was not true with current Impala. An HMS event is created for the new partitions but there is no event that would indicate that there are new files in existing partitions. 
As ALTER TABLE RECOVER PARTITIONS is called when the user expects changes in the filesystem (similarly to REFRESH), it could be useful to send a reload event after it is finished. > Send reload event after ALTER TABLE RECOVER PARTITIONS > -- > > Key: IMPALA-12812 > URL: https://issues.apache.org/jira/browse/IMPALA-12812 > Project: IMPALA > Issue Type: Improvement >Reporter: Csaba Ringhofer >Priority: Major > > IMPALA-11808 added support for sending reload events after REFRESH to allow > other Impala cluster connecting to the same HMS to also reload their tables. > REFRESH is often used when in external tables the files are written directly > to filesystem without notifying HMS, so Impala needs to update its cache and > can't rely on HMS notifications. > The same could be useful for ALTER TABLE RECOVER PARTITIONS. {-}It detects > partition directories that were only created in the FS but not in HMS and > creates them in HMS too. I{-}t also reloads the table (similarly to other > DDLs) and detects new files in existing partitions. - UPDATE: the previous > sentence was not true with current Impala. > An HMS event is created for the new partitions but there is no event that > would indicate that there are new files in existing partitions. As ALTER > TABLE RECOVER PARTITIONS is called when the user expects changes in the > filesystem (similarly to REFRESH), it could be useful to send a reload event > after it is finished.