[jira] [Commented] (IMPALA-13252) Filter update log message prints TUniqueId in non-standard format

2024-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869283#comment-17869283
 ] 

ASF subversion and git services commented on IMPALA-13252:
--

Commit 8d4497be0947e7552e0e9e2c15b9b08566aad148 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d4497be0 ]

IMPALA-13252: Consistently use PrintId to print TUniqueId

Some logging formats TUniqueId inconsistently by relying on the Thrift
to_string/toString generated printers. This makes it difficult to track
a specific query through logs.

Adds operator<<(ostream, TUniqueId) to simplify logging TUniqueId
correctly, uses PrintId instead of toString in Java, and adds a verifier
to test_banned_log_messages to ensure TUniqueId is not printed in logs.

Change-Id: If01bf20a240debbbd4c0a22798045ea03f17b28e
Reviewed-on: http://gerrit.cloudera.org:8080/21606
Reviewed-by: Yida Wu 
Tested-by: Impala Public Jenkins 


> Filter update log message prints TUniqueId in non-standard format
> -
>
> Key: IMPALA-13252
> URL: https://issues.apache.org/jira/browse/IMPALA-13252
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Some error messages, such as
> {code}
> Filter update received for non-executing query with id: 
> TUniqueId(hi=-8482965541048796556, lo=3501357296473079808)
> {code}
> print the query id as the raw Thrift type rather than in our colon-delimited 
> format, e.g. "8a4673c8fbe83a74:309751e9". This makes it difficult to trace 
> queries through logs.
> Normalize on the colon-delimited format.
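
A minimal sketch of the colon-delimited formatting described above (the struct and exact digit formatting are illustrative, not Impala's actual PrintId implementation):

{code:cpp}
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Thrift-generated TUniqueId with hi/lo halves.
struct TUniqueId {
  int64_t hi;
  int64_t lo;
};

// Simplified PrintId-style formatting: both halves printed as hex and joined
// by a colon, instead of the Thrift "TUniqueId(hi=..., lo=...)" printer.
std::string PrintId(const TUniqueId& id) {
  std::ostringstream ss;
  ss << std::hex << id.hi << ':' << id.lo;
  return ss.str();
}

// The operator<< added by the patch lets "stream << query_id" pick the
// readable format automatically.
std::ostream& operator<<(std::ostream& os, const TUniqueId& id) {
  return os << PrintId(id);
}

int main() {
  TUniqueId id{-8482965541048796556LL, 3501357296473079808LL};
  std::cout << id << '\n';  // hex "hi:lo" form, traceable through logs
}
{code}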






[jira] [Commented] (IMPALA-13214) test_shell_commandline..test_removed_query_option failed with assertion failure

2024-07-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868479#comment-17868479
 ] 

ASF subversion and git services commented on IMPALA-13214:
--

Commit e1098a6a02c417ddc63904259fa0abbcc64fcdb7 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e1098a6a0 ]

IMPALA-13214: Skip wait_until_connected when shell exits

The ImpalaShell class expects to start impala-shell and interact with it
by sending instructions over stdin and reading the results. This
assumption was incorrect when used for impala-shell batch sessions,
where the process exits on its own. If there's a delay in
ImpalaShell.__init__ - between starting the process and polling to see
that it's running - for a batch process, ImpalaShell will fail the
assertion that process_status is None. This can be easily reproduced by
adding a small (0.1s) sleep after starting the new process.

Most batch runs of impala-shell happen through `run_impala_shell_cmd`.
Updated that function to only wait for a successful connection when
stdin input is supplied. Otherwise the command is assumed to be a batch
function and any failures will be detected during `get_result`. Removed
explicit use of `wait_until_connected` as redundant.

Fixed cases in test_config_file that previously ignored WARNING before
the connection string because they did not specify
`wait_until_connected`.

Tested by running shell/test_shell_commandline.py with a 0.1s delay
before ImpalaShell polls.

Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Reviewed-on: http://gerrit.cloudera.org:8080/21598
Tested-by: Impala Public Jenkins 
Reviewed-by: Jason Fehr 
Reviewed-by: Kurt Deschler 


> test_shell_commandline..test_removed_query_option failed with assertion 
> failure
> ---
>
> Key: IMPALA-13214
> URL: https://issues.apache.org/jira/browse/IMPALA-13214
> Project: IMPALA
>  Issue Type: Bug
>  Components: Test
>Affects Versions: Impala 4.5.0
>Reporter: Laszlo Gaal
>Assignee: Michael Smith
>Priority: Blocker
> Fix For: Impala 4.5.0
>
>
> Happened during a recent s3-arm-data-cache build.
> Python backtrace:
> {code}
> /data/jenkins/workspace/impala-asf-master-core-s3-arm-data-cache/repos/Impala/tests/shell/test_shell_commandline.py:305:
>  in test_removed_query_option
> expect_success=True)
> shell/util.py:135: in run_impala_shell_cmd
> stderr_file=stderr_file)
> shell/util.py:155: in run_impala_shell_cmd_no_expect
> stdout_file=stdout_file, stderr_file=stderr_file)
> shell/util.py:271: in __init__
> "Impala shell exited with return code {0}".format(process_status)
> E   AssertionError: Impala shell exited with return code 0
> {code}
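
The race itself is easy to demonstrate outside the harness: between spawning a process and polling it, a short-lived batch process may already have exited. A minimal POSIX sketch of that race (illustrative, not the Python test code):

{code:cpp}
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
  pid_t pid = fork();
  if (pid == 0) _exit(0);  // "batch" child: finishes and exits on its own

  usleep(100 * 1000);  // the 0.1s delay used to reproduce the failure

  int status = 0;
  // Non-blocking poll, analogous to the harness checking process_status.
  if (waitpid(pid, &status, WNOHANG) == pid) {
    std::printf("child already exited; asserting 'still running' would fail\n");
  } else {
    std::printf("child still running; interactive-style session\n");
  }
  return 0;
}
{code}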






[jira] [Commented] (IMPALA-13243) Update Dropwizard Metrics to supported version

2024-07-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867966#comment-17867966
 ] 

ASF subversion and git services commented on IMPALA-13243:
--

Commit 22b59d27d0be25999eba4c839ea157c279939d76 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=22b59d27d ]

IMPALA-13243: Update Dropwizard Metrics to 4.2.x

Updates Dropwizard Metrics components to the latest 4.2.x release,
4.2.26. We directly use metrics-core, and metrics-jvm/metrics-json are
imported via Hive (via
https://github.com/joshelser/dropwizard-hadoop-metrics2).

Dropwizard Metrics manually tested with these versions on
https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8.

Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e
Reviewed-on: http://gerrit.cloudera.org:8080/21599
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Update Dropwizard Metrics to supported version
> --
>
> Key: IMPALA-13243
> URL: https://issues.apache.org/jira/browse/IMPALA-13243
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> [Dropwizard Metrics|https://metrics.dropwizard.io] 4.1.x was [EOL in 
> 2023|https://github.com/dropwizard/metrics/discussions/3029]. Impala's still 
> on 3.x. Update to 4.2.x to keep up with bug and security fixes.






[jira] [Commented] (IMPALA-12857) Add flag to enable merge-on-read even if tables are configured with copy-on-write

2024-07-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867965#comment-17867965
 ] 

ASF subversion and git services commented on IMPALA-12857:
--

Commit 05585c19bfcc235ab9d7574c970db04125fb9743 in impala's branch 
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=05585c19b ]

IMPALA-12857: Add flag to enable merge-on-read even if tables are configured 
with copy-on-write

Impala can only modify an Iceberg table via 'merge-on-read'. The
'iceberg_always_allow_merge_on_read_operations' backend flag makes it
possible to execute 'merge-on-read' operations (DELETE, UPDATE, MERGE)
even if the table property is 'copy-on-write'.

Testing:
 - custom cluster test
 - negative E2E test

Change-Id: I3800043e135beeedfb655a238c0644aaa0ef11f4
Reviewed-on: http://gerrit.cloudera.org:8080/21578
Reviewed-by: Daniel Becker 
Tested-by: Impala Public Jenkins 


> Add flag to enable merge-on-read even if tables are configured with 
> copy-on-write
> -
>
> Key: IMPALA-12857
> URL: https://issues.apache.org/jira/browse/IMPALA-12857
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> Impala can only modify a table via 'merge-on-read'. It raises an error if 
> users want to modify a table that is configured with 'copy-on-write'.
> We could add a backend flag to relax this restriction, i.e. enable 
> 'merge-on-read' operations (DELETE, UPDATE, MERGE) even if the table property 
> is 'copy-on-write'.
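
A minimal sketch of the gate such a flag introduces (names are hypothetical, not Impala's actual flag wiring):

{code:cpp}
// Impala can only write merge-on-read, so row-level DML against a
// copy-on-write table is rejected unless the override flag is set.
bool AllowRowLevelDml(bool table_is_copy_on_write,
                      bool always_allow_merge_on_read) {
  if (!table_is_copy_on_write) return true;  // table already merge-on-read
  return always_allow_merge_on_read;         // flag relaxes the restriction
}

int main() { return AllowRowLevelDml(true, true) ? 0 : 1; }
{code}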






[jira] [Commented] (IMPALA-13226) TupleCacheInfo unintentionally overwrites Object.finalize()

2024-07-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866936#comment-17866936
 ] 

ASF subversion and git services commented on IMPALA-13226:
--

Commit 04608452d37edc9256e368bec69b23c9e989b443 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=04608452d ]

IMPALA-13226: Rename TupleCacheInfo.finalize() to finalizeHash()

TupleCacheInfo.finalize() unintentionally overrides Object.finalize(),
which is called by the JVM garbage collector when garbage collection
determines that there are no more references to the object. Usually the
finalize method is overridden to dispose of system resources or to
perform other cleanup.

TupleCacheInfo.finalize() is not meant to be used during GC. We'd better
use another method name to avoid confusion. This patch renames it to
finalizeHash(). Also fixed some stale comments.

Change-Id: I657c4f14b074b7c16dc7d126b0c8b5083b8f19c6
Reviewed-on: http://gerrit.cloudera.org:8080/21588
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TupleCacheInfo unintentionally overwrites Object.finalize()
> ---
>
> Key: IMPALA-13226
> URL: https://issues.apache.org/jira/browse/IMPALA-13226
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Object.finalize() is called by the JVM garbage collector on an object when 
> garbage collection determines that there are no more references to the 
> object. A subclass overrides the finalize method to dispose of system 
> resources or to perform other cleanup.
> TupleCacheInfo.finalize() is not meant to be used during GC. We'd better use 
> another method name to avoid confusion.
> {code:java}
>   public void finalize() {
> finalizedHashString_ = hasher_.hash().toString();
> hasher_ = null;
> finalizedHashTrace_ = hashTraceBuilder_.toString();
> hashTraceBuilder_ = null;
> finalized_ = true;
>   }{code}
> https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java#L157-L163






[jira] [Commented] (IMPALA-13208) Add cluster id to the membership and request-queue topic names

2024-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866908#comment-17866908
 ] 

ASF subversion and git services commented on IMPALA-13208:
--

Commit fcee022e6033afe8c8c072fef1274640336b8770 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fcee022e6 ]

IMPALA-13208: Add cluster id to the membership and request-queue topic names

To share catalogd and statestore across Impala clusters, this adds the
cluster id to the membership and request-queue topic names. So impalads
are only visible to each other inside the same cluster, i.e. using the
same cluster id. Note that impalads still subscribe to the same
catalog-update topic so they can share the same catalog service.
If the cluster id is empty, the original topic names are used.

This also adds the non-empty cluster id as the prefix of the statestore
subscriber id for impalad and admissiond.

Tests:
 - Add custom cluster test
 - Ran exhaustive tests

Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add cluster id to the membership and request-queue topic names
> --
>
> Key: IMPALA-13208
> URL: https://issues.apache.org/jira/browse/IMPALA-13208
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Coordinators subscribe to 3 statestore topics: catalog-update, 
> impala-membership and impala-request-queue. The last two topics are about 
> query scheduling. To separate the cluster or share catalogd and statestore 
> across Impala clusters, we can add the cluster id to these two topic names. 
> Impalads are only visible to each other inside the same cluster (i.e. using 
> the same cluster id). Queries won't be scheduled across clusters.
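
A minimal sketch of the naming scheme (the helper and the separator are illustrative assumptions, not Impala's actual code):

{code:cpp}
#include <iostream>
#include <string>

// Scheduling-related topic names get the cluster id as a prefix; an empty
// cluster id keeps the legacy names, so existing deployments are unaffected.
std::string TopicName(const std::string& cluster_id, const std::string& base) {
  return cluster_id.empty() ? base : cluster_id + "-" + base;
}

int main() {
  std::cout << TopicName("", "impala-membership") << '\n';
  std::cout << TopicName("cluster1", "impala-membership") << '\n';
  std::cout << TopicName("cluster1", "impala-request-queue") << '\n';
}
{code}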






[jira] [Commented] (IMPALA-13194) Fast-serialize position delete records

2024-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866883#comment-17866883
 ] 

ASF subversion and git services commented on IMPALA-13194:
--

Commit daa4d6e916f80d8b929dcf4873668accceb33b0b in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=daa4d6e91 ]

IMPALA-13194: Fast-serialize position delete records

Currently the serialization of position delete records is very
wasteful. The records contain slots 'file_path' and 'pos'. And
what we do during serialization is the following:

 1. Write fixed-size tuples that have a StringValue and a BigInt slot
 2. Copy the StringValue's contents after the tuple.
 3. Convert the StringValue ptr to be an offset to the string data

So we end up having something like this:

+-------------+--------+----------------+-------------+--------+----------------+-----+
| StringValue | BigInt | File path      | StringValue | BigInt | File path      | ... |
+-------------+--------+----------------+-------------+--------+----------------+-----+
| ptr, len    | 42     | /.../a.parquet | ptr, len    | 43     | /.../a.parquet | ... |
+-------------+--------+----------------+-------------+--------+----------------+-----+

Storing the file paths this way is very redundant, and in the end we
have a huge buffer that we need to compress and send over the network.
Moreover, we copy the file paths in memory twice:

 1. From input row batch to the KrpcDataStreamSender::Channel's temporary row 
batch
 2. From the temporary row batch to the outbound row batch (during 
serialization)

The position delete files store the delete records in ascending order.
This means adjacent records mostly have the same file path. So we could
just buffer the position delete records up to the Channel's capacity,
then serialize the data in a more efficient way.

With this patch, serialized data will look like this:
+----------------+-------------+--------+-------------+--------+-----+
| File path      | StringValue | BigInt | StringValue | BigInt | ... |
+----------------+-------------+--------+-------------+--------+-----+
| /.../a.parquet | ptr, len    | 42     | ptr, len    | 43     | ... |
+----------------+-------------+--------+-------------+--------+-----+

First comes the file path, then the tuples sharing it; after that comes
the next file path with its associated tuples, and so on.

Measurements:
07:EXCHANGE: 1m   ==> 52s
F02:EXCHANGE SENDER: 1m2s ==> 16s

Change-Id: I6095f318e3d06dedb4197681156b40dd2a326c6f
Reviewed-on: http://gerrit.cloudera.org:8080/21563
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> Fast-serialize position delete records
> --
>
> Key: IMPALA-13194
> URL: https://issues.apache.org/jira/browse/IMPALA-13194
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently the serialization of position delete records is very wasteful. The 
> records contain slots 'file_path' and 'pos'. And what we do during 
> serialization is the following.
>  # Write fixed-size tuples that have a StringValue and a BigInt slot (20 bytes 
> in total)
>  # We copy the StringValue's contents after the tuple.
>  # We convert the StringValue slot to be an offset to the string data
> So we end up having something like this:
> {noformat}
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> | StringValue | BigInt | File path      | StringValue | BigInt | File path      | ... |
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> | ptr, len    | 42     | /.../a.parquet | ptr, len    | 43     | /.../a.parquet | ... |
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> {noformat}
> Storing the file paths this way is very redundant, and at the end we will 
> have a huge buffer that we need to compress and send over the network. 
> Moreover, we copy the file paths in memory twice:
>  # From input row batch to the KrpcDataStreamSender::Channel's temporary row 
> batch
>  # From the temporary row batch to the outbound row batch (during 
> serialization)
> The position delete files store the delete records in ascending order. This 
> means adjacent records mostly have the same file path. So we could just 
> buffer the position delete records up to the Channel's capacity, then 
> serialize the data in a more efficient way.
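
A minimal sketch of the grouped layout idea (illustrative types, not Impala's row batch serialization):

{code:cpp}
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct DeleteRecord { std::string file_path; int64_t pos; };

// Records arrive sorted, so adjacent entries mostly share a file path: write
// each distinct path once, followed by the tuples that reference it.
void SerializeGrouped(const std::vector<DeleteRecord>& sorted_records) {
  const std::string* current = nullptr;
  for (const DeleteRecord& rec : sorted_records) {
    if (current == nullptr || rec.file_path != *current) {
      current = &rec.file_path;
      std::cout << "PATH " << *current << '\n';  // path written once per run
    }
    std::cout << "  pos=" << rec.pos << '\n';    // tuple reuses that path
  }
}

int main() {
  SerializeGrouped({{"/.../a.parquet", 42}, {"/.../a.parquet", 43},
                    {"/.../b.parquet", 7}});
}
{code}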




[jira] [Commented] (IMPALA-13231) Some auto-generated files for ranger are not ignored by Git

2024-07-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866884#comment-17866884
 ] 

ASF subversion and git services commented on IMPALA-13231:
--

Commit 8d16858f29f5c0ef0d5c03c48db693bbdae64c0f in impala's branch 
refs/heads/master from Xuebin Su
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d16858f2 ]

IMPALA-13231: Gitignore auto-generated files for ranger

Previously, some files generated from templates by `setup-ranger.sh`
were not ignored by Git. This patch fixes the issue by adding those
files to `.gitignore`.

Change-Id: I3057b136643412f686352f3188bf7e2b801626bd
Reviewed-on: http://gerrit.cloudera.org:8080/21590
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Some auto-generated files for ranger are not ignored by Git
> ---
>
> Key: IMPALA-13231
> URL: https://issues.apache.org/jira/browse/IMPALA-13231
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Xuebin Su
>Assignee: Xuebin Su
>Priority: Major
>
> When {{bin/bootstrap_development.sh}} runs, some files generated by 
> {{testdata/bin/setup-ranger.sh}} from the templates are not ignored by Git, 
> including
> * {{testdata/cluster/ranger/setup/impala_user_non_owner_2.json}}, and
> * {{testdata/cluster/ranger/setup/all_database_policy_revised.json}}






[jira] [Commented] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866608#comment-17866608
 ] 

ASF subversion and git services commented on IMPALA-13161:
--

Commit 5c4e771241a7f847d2349ae248bc268243e071ed in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5c4e77124 ]

IMPALA-13161: Fix column index overflow in DelimitedTextParser

DelimitedTextParser tracks the current column index inside the current
row that is being parsed. The row could have an arbitrary number of
fields. The index, 'column_idx_', is defined as an int, which could
overflow when there are more than 2^31 fields in the row. This index is
only used to check whether the current column should be materialized. It
doesn't make sense to track the index if it's larger than the number of
columns of the table.

This patch fixes the overflow issue by only bumping 'column_idx_' when
it's smaller than the number of columns of the table.

Tests
 - Add e2e test

Change-Id: I527a8971e92e270d5576c2155e4622dd6d43d745
Reviewed-on: http://gerrit.cloudera.org:8080/21559
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
> ---
>
> Key: IMPALA-13161
> URL: https://issues.apache.org/jira/browse/IMPALA-13161
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0.0, Impala 4.4.0
>Reporter: nyq
>Assignee: Quanlong Huang
>Priority: Critical
>
> Impala version: 4.0.0
> Problem:
> impalad crashes when operating on a text table that has a 3GB data file 
> containing only the '\x00' char
> Steps:
> python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp="\x00"*1024*1024*3; 
> [f.write(tmp) for i in range(1024)] ;f.close()'
> create table impala_0_3gb (id int)
> hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/
> refresh impala_0_3gb
> select count(1) from impala_0_3gb
> Errors:
> Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0)
> # Java VM: OpenJDK 64-Bit Server VM
> # Problematic frame:
> # C  [impalad+0x141861c]  
> impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, 
> char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid956182.log
> #
> #
> C  [impalad+0x141861c]  
> impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, 
> char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> C  [impalad+0x136fe11]  
> impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1
> C  [impalad+0x137100e]  
> impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be
> C  [impalad+0x13721ac]  
> impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c
> C  [impalad+0x131cdfc]  impala::HdfsScanner::ProcessSplit()+0x19c
> C  [impalad+0x1443e17]  
> impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*, long*)+0x7e7
> C  [impalad+0x1447001]  impala::HdfsScanNode::ScannerThread(bool, long)+0x541
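
The fix described in the commit boils down to saturating the column index at the schema width; a minimal sketch (illustrative, not the actual parser code):

{code:cpp}
#include <cstdio>

// Indexes past the table's column count can never be materialized, so stop
// incrementing there instead of growing without bound, which overflows a
// signed int once a row has more than 2^31 fields.
int NextColumnIdx(int column_idx, int num_table_cols) {
  return column_idx < num_table_cols ? column_idx + 1 : column_idx;
}

int main() {
  int idx = 0;
  for (long i = 0; i < 1000; ++i) idx = NextColumnIdx(idx, 3);
  std::printf("%d\n", idx);  // saturates at 3, never overflows
}
{code}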






[jira] [Commented] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866607#comment-17866607
 ] 

ASF subversion and git services commented on IMPALA-13209:
--

Commit a486305a922d672f77ff23b5f42e604a720597fd in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a486305a9 ]

IMPALA-13209: Optimize ConvertRowBatchTime in ExchangeNode

The patch optimizes the most common case when the src and dst
RowBatches have the same number of tuples per row.

ConvertRowBatchTime is decreased from >600ms to <100ms in a query
with busy exchange node:
set mt_dop=8;
select straight_join count(*) from tpcds_parquet.store_sales s1
  join /*+broadcast*/ tpcds_parquet.store_sales16 s2
  on s1.ss_customer_sk = s2.ss_customer_sk;

TPCDS-20 showed a minor improvement (0.77%). The effect is likely to
be larger if more nodes are involved.

Testing:
- passed core tests

Change-Id: Iab94315364e8886da1ae01cf6af623812a2da9cb
Reviewed-on: http://gerrit.cloudera.org:8080/21571
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> ExchangeNode's ConvertRowBatchTime can be high
> --
>
> Key: IMPALA-13209
> URL: https://issues.apache.org/jira/browse/IMPALA-13209
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: performance
>
> ConvertRowBatchTime can be surprisingly high - the only thing done during 
> this timer is copying tuple pointers from one RowBatch to another.
> https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217
> {code}
> set mt_dop=8;
> select straight_join count(*) from tpcds_parquet.store_sales s1 join 
> /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = 
> s2.ss_customer_sk;
> ConvertRowBatchTime dominates the busy exchange node's exec time in the 
> profile:
>- ConvertRowBatchTime: 640.072ms
>- InactiveTotalTime: 243.783ms
>- PeakMemoryUsage: 12.53 MB (13142368)
>- RowsReturned: 46.09M (46086464)
>- RowsReturnedRate: 46.93 M/sec
>- TotalTime: 981.968ms
> {code}
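
A minimal sketch of the fast path the patch describes (simplified types, not Impala's RowBatch code):

{code:cpp}
#include <cstring>
#include <vector>

using Tuple = void*;

// When src and dst rows have the same number of tuples per row, the whole
// tuple-pointer region can be copied at once instead of row by row.
void ConvertRowBatch(const std::vector<Tuple>& src, std::vector<Tuple>* dst,
                     int src_tuples_per_row, int dst_tuples_per_row) {
  if (src_tuples_per_row == dst_tuples_per_row) {
    dst->resize(src.size());
    if (!src.empty()) {
      std::memcpy(dst->data(), src.data(), src.size() * sizeof(Tuple));
    }
    return;
  }
  // Slow path (omitted): copy row by row, remapping tuple positions.
}

int main() {
  std::vector<Tuple> src(16, nullptr), dst;
  ConvertRowBatch(src, &dst, 2, 2);
  return dst.size() == src.size() ? 0 : 1;
}
{code}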






[jira] [Commented] (IMPALA-13227) test_spilling_hash_join should be marked for serial execution

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866606#comment-17866606
 ] 

ASF subversion and git services commented on IMPALA-13227:
--

Commit 3de8c2ab9c755b1adfc35ea2176a2ac193899ca6 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3de8c2ab9 ]

IMPALA-13227: test_spilling_hash_join should be marked for serial execution

test_spilling_hash_join consumes too many resources and parallel
tests can fail because of it. We should mark it for serial execution.

Testing:
 * had a green exhaustive run, and we also know that before
   test_spilling_hash_join was added, the exhaustive runs
   were much more stable

Change-Id: I7b50376db9dde5b33a02fde55880f49a7db4b7c1
Reviewed-on: http://gerrit.cloudera.org:8080/21589
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_spilling_hash_join should be marked for serial execution
> -
>
> Key: IMPALA-13227
> URL: https://issues.apache.org/jira/browse/IMPALA-13227
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> test_spilling_hash_join consumes too many resources and parallel tests can 
> fail because of it.
> We should mark it for serial execution.






[jira] [Commented] (IMPALA-13088) Speedup IcebergDeleteBuilder

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866482#comment-17866482
 ] 

ASF subversion and git services commented on IMPALA-13088:
--

Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ]

IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we
use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the
CRoaring library (version 4.0.0). CRoaring also offers C++ classes,
but this patch adds its own thin C++ wrapper class around the C
functions to get the best performance.

Toolchain Clang 5.0.1 was not able to compile CRoaring due to a
bug, which is tracked by IMPALA-13190; this patch also fixes that
with a new toolchain.

Performance
I used an extended version of the "One Trillion Row" challenge. This
means after inserting 1 Trillion records to a table I also deleted /
updated lots of records (see statements at the end). So at the end
I had 1 Trillion data records and ~68.5 Billion delete records in
the table.

For the measurements I used clusters with 10 and 40 executors, and
executed the following query:

 SELECT station, min(measure), max(measure), avg(measure)
 FROM measurements_extra_1trc_partitioned
 GROUP BY 1
 ORDER BY 1;

JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 4m15s        |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query
because its executors crashed due to out-of-memory.

Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

The resource estimations were wrong when MT_DOP was greater than 1. This
has also been fixed.

Testing:
 * added tests for RoaringBitmap64
 * added tests for resource estimations

Statements I used to delete / update the records for the One Trillion
Row challenge:

create table measurements_extra_1trc_partitioned(
station string, ts timestamp, sensor_type int, measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts),
truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns
'ts' and 'sensor_type' are new:
 'ts': timestamps that span a year
 'sensor_type': integer between 0 and 100

Both 'ts' and 'sensor_type' have uniform distributions.

Ingested data with the help of the original One Trillion Row challenge
table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
  where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki',
'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb',
'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where
  sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00'
;

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where
  sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107
Reviewed-on: http://gerrit.cloudera.org:8080/21557
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
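
A minimal sketch of such a thin RAII wrapper over CRoaring's 64-bit C API (illustrative, not Impala's actual RoaringBitmap64 class; function names follow CRoaring's roaring64.h):

{code:cpp}
#include <cstdint>
#include "roaring/roaring64.h"

class RoaringBitmap64 {
 public:
  RoaringBitmap64() : r_(roaring64_bitmap_create()) {}
  ~RoaringBitmap64() { roaring64_bitmap_free(r_); }
  RoaringBitmap64(const RoaringBitmap64&) = delete;
  RoaringBitmap64& operator=(const RoaringBitmap64&) = delete;

  void Add(uint64_t pos) { roaring64_bitmap_add(r_, pos); }
  bool Contains(uint64_t pos) const {
    return roaring64_bitmap_contains(r_, pos);
  }

 private:
  roaring64_bitmap_t* r_;  // owned C bitmap handle
};

int main() {
  RoaringBitmap64 deleted_positions;
  deleted_positions.Add(42);
  return deleted_positions.Contains(42) ? 0 : 1;
}
{code}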


> Speedup IcebergDeleteBuilder
> 
>
> Key: IMPALA-13088
> URL: https://issues.apache.org/jira/browse/IMPALA-13088
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> When there are lots of delete records IcebergDeleteBuilder can become a 
> bottleneck. Since the left side of the JOIN 

[jira] [Commented] (IMPALA-13109) Use RoaringBitmap in IcebergDeleteNode

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866483#comment-17866483
 ] 

ASF subversion and git services commented on IMPALA-13109:
--

Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ]

IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we
use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the
CRoaring library (version 4.0.0). CRoaring also offers C++ classes,
but this patch adds its own thin C++ wrapper class around the C
functions to get the best performance.

Toolchain Clang 5.0.1 was not able to compile CRoaring due to a
bug, which is tracked by IMPALA-13190; this patch also fixes that
with a new toolchain.

Performance
I used an extended version of the "One Trillion Row" challenge. This
means after inserting 1 Trillion records to a table I also deleted /
updated lots of records (see statements at the end). So at the end
I had 1 Trillion data records and ~68.5 Billion delete records in
the table.

For the measurements I used clusters with 10 and 40 executors, and
executed the following query:

 SELECT station, min(measure), max(measure), avg(measure)
 FROM measurements_extra_1trc_partitioned
 GROUP BY 1
 ORDER BY 1;

JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 4m15s        |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query
because its executors crashed due to out-of-memory.

Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

The resource estimations were wrong when MT_DOP was greater than 1. This
has also been fixed.

Testing:
 * added tests for RoaringBitmap64
 * added tests for resource estimations

Statements I used to delete / update the records for the One Trillion
Row challenge:

create table measurements_extra_1trc_partitioned(
station string, ts timestamp, sensor_type int, measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts),
truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns
'ts' and 'sensor_type' are new:
 'ts': timestamps that span a year
 'sensor_type': integer between 0 and 100

Both 'ts' and 'sensor_type' have uniform distributions.

Ingested data with the help of the original One Trillion Row challenge
table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
  where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki',
'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb',
'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where
  sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00'
;

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where
  sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107
Reviewed-on: http://gerrit.cloudera.org:8080/21557
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Use RoaringBitmap in IcebergDeleteNode
> --
>
> Key: IMPALA-13109
> URL: https://issues.apache.org/jira/browse/IMPALA-13109
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> IcebergDeleteNode currently uses an ordered int64_t array for each data file 
> to hold the deleted 

[jira] [Commented] (IMPALA-13190) Backport Clang compiler fix to Toolchain Clang 5.0.1

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866484#comment-17866484
 ] 

ASF subversion and git services commented on IMPALA-13190:
--

Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ]

IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we
use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the
CRoaring library (version 4.0.0). CRoaring also offers C++ classes,
but this patch adds its own thin C++ wrapper class around the C
functions to get the best performance.

Toolchain Clang 5.0.1 was not able to compile CRoaring due to a
bug, which is tracked by IMPALA-13190; this patch also fixes that
with a new toolchain.

Performance
I used an extended version of the "One Trillion Row" challenge. This
means after inserting 1 Trillion records to a table I also deleted /
updated lots of records (see statements at the end). So at the end
I had 1 Trillion data records and ~68.5 Billion delete records in
the table.

For the measurements I used clusters with 10 and 40 executors, and
executed the following query:

 SELECT station, min(measure), max(measure), avg(measure)
 FROM measurements_extra_1trc_partitioned
 GROUP BY 1
 ORDER BY 1;

JOIN BUILD times:
+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 4m15s        |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query
because its executors crashed due to out-of-memory.

Memory usage (VmRSS) for 10 executors:
+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

The resource estimations were wrong when MT_DOP was greater than 1. This
has also been fixed.

Testing:
 * added tests for RoaringBitmap64
 * added tests for resource estimations

Statements I used to delete / update the records for the One Trillion
Row challenge:

create table measurements_extra_1trc_partitioned(
station string, ts timestamp, sensor_type int, measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts),
truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns
'ts' and 'sensor_type' are new:
 'ts': timestamps that span a year
 'sensor_type': integer between 0 and 100

Both 'ts' and 'sensor_type' have uniform distributions.

Ingested data with the help of the original One Trillion Row challenge
table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
  where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki',
'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb',
'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where
  sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00'
;

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where
  sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107
Reviewed-on: http://gerrit.cloudera.org:8080/21557
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Backport Clang compiler fix to Toolchain Clang 5.0.1
> 
>
> Key: IMPALA-13190
> URL: https://issues.apache.org/jira/browse/IMPALA-13190
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Toolchain
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Toolchain Clang 5.0.1 fails to compile the CRoaring library.
> There was an 

[jira] [Commented] (IMPALA-13001) Add graceful and force shutdown for packaging script.

2024-07-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866454#comment-17866454
 ] 

ASF subversion and git services commented on IMPALA-13001:
--

Commit 8af0ce8ed6659fdda9b81847d4871c14036e173c in impala's branch 
refs/heads/master from Xiang Yang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8af0ce8ed ]

IMPALA-13001: Support graceful and force shutdown for impala.sh

This patch adds graceful and force shutdown support for impala.sh.

This patch also keeps the stdout and stderr logs on startup.

This patch also fixes some bugs in impala.sh, including:
 - a missing empty service name check.
 - the restart command not working.

Testing:
 - Manually deploy package on Ubuntu22.04 and verify it.

Change-Id: Ib7743234952ba6b12694ecc68a920d59fea0d4ba
Reviewed-on: http://gerrit.cloudera.org:8080/21297
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add graceful and force shutdown for packaging script.
> -
>
> Key: IMPALA-13001
> URL: https://issues.apache.org/jira/browse/IMPALA-13001
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: XiangYang
>Assignee: XiangYang
>Priority: Major
>
> Add graceful and force shutdown for packaging script to finish the TODO in 
> https://github.com/apache/impala/blob/4.3.0/package/bin/impala-env.sh#L33.






[jira] [Commented] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values

2024-07-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864837#comment-17864837
 ] 

ASF subversion and git services commented on IMPALA-13193:
--

Commit c53987480726b114e0c3537c71297df2834a4962 in impala's branch 
refs/heads/master from ttz
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c53987480 ]

IMPALA-13193: RuntimeFilter on parquet dictionary should evaluate NULL values

NULL values are not included in the parquet dictionary. If the column
contains NULL values, the runtime filter is now also evaluated against NULL.

Testing:
- Added a test case in parquet-dictionary-runtime-filter.test

Change-Id: I0f69405c0c08feb47141d080a828847e5094163f
Reviewed-on: http://gerrit.cloudera.org:8080/21566
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> RuntimeFilter on parquet dictionary should evaluate null values
> ---
>
> Key: IMPALA-13193
> URL: https://issues.apache.org/jira/browse/IMPALA-13193
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, 
> Impala 4.3.0, Impala 4.4.0
>Reporter: Quanlong Huang
>Assignee: Zhi Tang
>Priority: Critical
>  Labels: correctness
>
> IMPALA-10910 and IMPALA-5509 introduce an optimization to evaluate runtime 
> filters on parquet dictionary values. If none of the values can pass the check, 
> the whole row group will be skipped. However, NULL values are not included in 
> the parquet dictionary. Runtime filters that accept NULL values might 
> incorrectly reject the row group if none of the dictionary values can pass 
> the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
>   on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows but now it returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
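
A minimal sketch of the corrected skip check (illustrative code, not Impala's scanner):

{code:cpp}
#include <cstdio>
#include <functional>
#include <optional>
#include <vector>

// true = the value may pass the filter
using Filter = std::function<bool(const std::optional<int>&)>;

// A row group is skippable only if the filter rejects every dictionary value
// and, when the column has NULLs, also rejects NULL (NULL is never stored in
// the dictionary).
bool CanSkipRowGroup(const std::vector<int>& dict, bool has_nulls,
                     const Filter& filter) {
  for (int v : dict) {
    if (filter(v)) return false;  // some dictionary value passes: must scan
  }
  if (has_nulls && filter(std::nullopt)) return false;  // the fix
  return true;
}

int main() {
  Filter accepts_null = [](const std::optional<int>& v) { return !v.has_value(); };
  // Without the NULL check this would wrongly report the row group skippable.
  std::printf("%d\n", CanSkipRowGroup({1, 2, 3}, /*has_nulls=*/true, accepts_null));
}
{code}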






[jira] [Commented] (IMPALA-12786) Optimize count(*) for JSON scans

2024-07-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864738#comment-17864738
 ] 

ASF subversion and git services commented on IMPALA-12786:
--

Commit ec59578106b9d9adcdc4d4ea2223d3531eac9cbc in impala's branch 
refs/heads/master from Eyizoha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ec5957810 ]

IMPALA-12786: Optimize count(*) for JSON scans

When performing zero slots scans on a JSON table for operations like
count(*), we don't require specific data from the JSON; we only need the
number of top-level JSON objects. However, the current JSON parser based
on rapidjson still decodes and copies specific data from the JSON, even
in zero slots scans. Skipping these steps can significantly improve scan
performance.

This patch introduces a JSON skipper to conduct zero slots scans on JSON
data. Essentially, it is a simplified version of a rapidjson parser,
removing specific data decoding and copying operations, resulting in
faster parsing of the number of JSON objects. The skipper retains the
ability to recognize malformed JSON and provide the same specific error
codes as the rapidjson parser. Nevertheless, as it bypasses specific
data parsing, it cannot identify string encoding errors or numeric
overflow errors. Despite this, these data errors do not impact the
counting of JSON objects, so it is acceptable to ignore them. The TEXT
scanner exhibits similar behavior.
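
A minimal sketch of the counting idea behind such a skipper (illustrative, not the rapidjson-based implementation; it only handles top-level objects):

{code:cpp}
#include <cstdint>
#include <iostream>
#include <string>

// Count top-level JSON objects without decoding values: track brace depth
// and skip over string contents (including escaped quotes).
int64_t CountTopLevelObjects(const std::string& data) {
  int64_t count = 0;
  int depth = 0;
  bool in_string = false, escaped = false;
  for (char c : data) {
    if (in_string) {
      if (escaped) escaped = false;
      else if (c == '\\') escaped = true;
      else if (c == '"') in_string = false;
    } else if (c == '"') {
      in_string = true;
    } else if (c == '{') {
      if (depth++ == 0) ++count;  // a new top-level object begins
    } else if (c == '}') {
      --depth;
    }
  }
  return count;
}

int main() {
  std::cout << CountTopLevelObjects(R"({"a":1}{"b":"{not a brace}"})") << '\n';  // 2
}
{code}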

Additionally, a new query option, disable_optimized_json_count_star, has
been added to disable this optimization and revert to the old behavior.

In the performance test of TPC-DS with a format of json/none and a scale
of 10GB, the performance optimization is shown in the following tables:
+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload  | Query                     | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCDS(10) | TPCDS-Q_COUNT_UNOPTIMIZED | json / none / none | 6.78   | 6.88        | -1.46%     | 4.93%     | 3.63%          | 9     | -1.51%         | -0.74   | -0.72  |
| TPCDS(10) | TPCDS-Q_COUNT_ZERO_SLOT   | json / none / none | 2.42   | 6.75        | I -64.20%  | 6.44%     | 4.58%          | 9     | I -177.75%     | -3.36   | -37.55 |
| TPCDS(10) | TPCDS-Q_COUNT_OPTIMIZED   | json / none / none | 2.42   | 7.03        | I -65.63%  | 3.93%     | 4.39%          | 9     | I -194.13%     | -3.36   | -42.82 |
+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

(I) Improvement: TPCDS(10) TPCDS-Q_COUNT_ZERO_SLOT [json / none / none] (6.75s -> 2.42s [-64.20%])
+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg     | Base Avg | Delta(Avg) | StdDev(%)  | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+
| 01:AGGREGATE | 2.58%      | 54.85ms | 58.88ms  | -6.85%     | * 14.43% * | 115.82ms | 133.11ms | -12.99%    | 3      | 3     | 3      | 1         |
| 00:SCAN HDFS | 97.41%     | 2.07s   | 6.07s    | -65.84%    | 5.87%      | 2.43s    | 6.95s    | -65.01%    | 3      | 3     | 28.80M | 143.83M   |
+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+

(I) Improvement: TPCDS(10) TPCDS-Q_COUNT_OPTIMIZED [json / none / none] (7.03s -> 2.42s [-65.63%])
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg   | Base Avg | Delta(Avg) | StdDev(%) | Max   | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+
| 00:SCAN HDFS | 99.35%     | 2.07s | 6.49s    | -68.15%    | 4.83%     | 2.37s | 7.49s    | -68.32%    | 3      | 3     | 28.80M | 143.83M   |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+

Testing:
- Added new test cases in TestQueriesJsonTables to verify that query
  results are consistent before and after optimization.
- 

[jira] [Commented] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected

2024-07-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864737#comment-17864737
 ] 

ASF subversion and git services commented on IMPALA-13203:
--

Commit 018f5980884fcf34f78ee3898b06531c05826c8a in impala's branch 
refs/heads/master from Eyizoha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=018f59808 ]

IMPALA-13203: Rewrite 'id = 0 OR false' as expected

Currently, ExprRewriter cannot rewrite 'id = 0 OR false' to 'id = 0' as
expected. More precisely, it fails to rewrite any cases where a boolean
literal follows 'AND/OR'.
The issue is that the CompoundPredicate generated by NormalizeExprsRule
is not analyzed, causing SimplifyConditionalsRule to skip the rewrite.
This patch fixes the issue by adding analysis of the rewritten
CompoundPredicate in NormalizeExprsRule.

Testing:
- Modified and passed FE test case
  ExprRewriteRulesTest#testCompoundPredicate
- Modified and passed related test case

Change-Id: I9d9fffdd1cc644cc2b48f08c2509f22a72362d22
Reviewed-on: http://gerrit.cloudera.org:8080/21568
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


>  ExprRewriter did not rewrite 'id = 0 OR false' as expected
> ---
>
> Key: IMPALA-13203
> URL: https://issues.apache.org/jira/browse/IMPALA-13203
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.4.0
>Reporter: Zihao Ye
>Assignee: Zihao Ye
>Priority: Minor
>
> The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR 
> false' would be rewritten to 'id = 0', but in reality, it does not perform 
> this rewrite as expected. After executing such SQL, we can see in the text 
> plan that:
> {code:sql}
> Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = 
> CAST(0
> AS INT) {code}
> The issue appears to be that the CompoundPredicate generated by 
> NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to 
> skip the rewrite.
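
A minimal sketch of the rule interplay (illustrative structures, not Impala's Java ExprRewriter):

{code:cpp}
#include <cstdio>
#include <string>

// Simplified expression node: just enough state to show why an unanalyzed
// predicate is skipped by the simplification rule.
struct Expr {
  std::string text;
  bool is_bool_literal = false;
  bool value = false;
  bool analyzed = false;
};

// 'lhs OR rhs' with rhs a boolean literal: false drops out, true wins.
const Expr* SimplifyOr(const Expr& whole, const Expr& lhs, const Expr& rhs) {
  if (!whole.analyzed) return &whole;  // skipped: the reported bug
  if (rhs.is_bool_literal) return rhs.value ? &rhs : &lhs;
  return &whole;  // no rewrite applies
}

int main() {
  Expr id_eq{"id = 0"};
  Expr false_lit{"false", true, false, true};
  // The fix: NormalizeExprsRule now analyzes the predicate it produces.
  Expr compound{"id = 0 OR false", false, false, /*analyzed=*/true};
  std::printf("%s\n", SimplifyOr(compound, id_eq, false_lit)->text.c_str());
}
{code}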






[jira] [Commented] (IMPALA-13196) Query timeline page can not display normally when Knox proxying is being used

2024-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864328#comment-17864328
 ] 

ASF subversion and git services commented on IMPALA-13196:
--

Commit e419545c1c2b6fa0b6d23643d3b2eb0faa4dce52 in impala's branch 
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e419545c1 ]

IMPALA-13196: Fully qualify urls in www/query_timeline

This patch fixes some links in www/query_timeline by adding the
{{ __common__.host-url }} prefix to fully qualify urls when
Knox proxying is being used.

Testing:
  - Ran in a cluster and manually checked the query_timeline page
works as expected.

Change-Id: I4a701f37cf257a0b11a027c9c598645ca0c997f3
Reviewed-on: http://gerrit.cloudera.org:8080/21564
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 


> Query timeline page can not display normally when Knox proxying is being used
> -
>
> Key: IMPALA-13196
> URL: https://issues.apache.org/jira/browse/IMPALA-13196
> Project: IMPALA
>  Issue Type: Bug
>Reporter: YifanZhang
>Priority: Major
>
> We should use absolute url in query_timeline.tmpl.






[jira] [Commented] (IMPALA-9441) TestHS2.test_get_schemas is flaky in local catalog mode

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861098#comment-17861098
 ] 

ASF subversion and git services commented on IMPALA-9441:
-

Commit 00d0b0dda1e215d8e91ff52688fe6654bee52282 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=00d0b0dda ]

IMPALA-9441,IMPALA-13170: Ops listing dbs/tables should handle db not exists

We have some operations listing the dbs/tables in the following steps:
  1. Get the db list
  2. Do something on the db which could fail if the db no longer exists
For instance, when authorization is enabled, SHOW DATABASES would need a
step-2 to get the owner of each db. This is fine in the legacy catalog
mode since the whole Db object is cached in the coordinator side.
However, in the local catalog mode, the msDb could be missing in the
local cache. Coordinator then triggers a getPartialCatalogObject RPC to
load it from catalogd. If the db no longer exists in catalogd, such step
will fail.

The same happens in GetTables HS2 requests when listing all tables in all dbs.
In step-2 we list the table names for a db. Though it exists when we get
the db list, it could be dropped when we start listing the table names
in it.

This patch adds code to handle the exceptions caused by a db that no
longer exists. It also improves GetSchemas to not list the table names,
getting rid
of the same issue.

Tests:
 - Add e2e tests

Change-Id: I2bd40d33859feca2bbd2e5f1158f3894a91c2929
Reviewed-on: http://gerrit.cloudera.org:8080/21546
Reviewed-by: Yida Wu 
Tested-by: Impala Public Jenkins 
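
A minimal sketch of the tolerant two-step pattern (illustrative stand-ins, not Impala's catalog API):

{code:cpp}
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

std::map<std::string, std::string> g_catalog = {{"db1", "alice"}};

std::vector<std::string> GetDbList() {
  std::vector<std::string> names;
  for (const auto& kv : g_catalog) names.push_back(kv.first);
  return names;
}

std::string GetDbOwner(const std::string& db) {
  auto it = g_catalog.find(db);
  if (it == g_catalog.end()) throw std::runtime_error("db not found: " + db);
  return it->second;
}

void ShowDatabases() {
  for (const auto& db : GetDbList()) {  // step 1: snapshot of db names
    try {
      std::cout << db << '\t' << GetDbOwner(db) << '\n';  // step 2: can race
    } catch (const std::runtime_error&) {
      // Db dropped between step 1 and step 2: skip it instead of failing
      // the whole operation, as the patch does.
    }
  }
}

int main() { ShowDatabases(); }
{code}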


> TestHS2.test_get_schemas is flaky in local catalog mode
> ---
>
> Key: IMPALA-9441
> URL: https://issues.apache.org/jira/browse/IMPALA-9441
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Sahil Takiar
>Assignee: Quanlong Huang
>Priority: Critical
>
> Saw this once on a ubuntu-16.04-dockerised-tests job:
> {code:java}
> Error Message
> hs2/hs2_test_suite.py:63: in add_session lambda: fn(self)) 
> hs2/hs2_test_suite.py:44: in add_session_helper fn() 
> hs2/hs2_test_suite.py:63: in <lambda> lambda: fn(self)) 
> hs2/test_hs2.py:423: in test_get_schemas 
> TestHS2.check_response(get_schemas_resp) hs2/hs2_test_suite.py:131: in 
> check_response assert response.status.statusCode == expected_status_code 
> E   assert 3 == 0 E+  where 3 = 3 E+where 3 = 
> TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3).statusCode E+  where 
> TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3) = TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3) E+where TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3) = 
> TGetSchemasResp(status=TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_i...nHandle(hasResultSet=False, modifiedRowCount=None, 
> operationType=3, operationId=THandleIdentifier(secret='', guid=''))).status
> Stacktrace
> hs2/hs2_test_suite.py:63: in add_session
> lambda: fn(self))
> hs2/hs2_test_suite.py:44: in add_session_helper
> fn()
> hs2/hs2_test_suite.py:63: in <lambda>
> lambda: fn(self))
> hs2/test_hs2.py:423: in test_get_schemas
> TestHS2.check_response(get_schemas_resp)
> hs2/hs2_test_suite.py:131: in check_response
> assert response.status.statusCode == expected_status_code
> E   assert 3 == 0
> E+  where 3 = 3
> E+where 3 = TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3).statusCode
> E+  where TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3) = TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3)
> E+where TStatus(errorCode=None, 
> errorMessage="DatabaseNotFoundException: Database 
> 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', 
> infoMessages=None, statusCode=3) = 
> 

[jira] [Commented] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861099#comment-17861099
 ] 

ASF subversion and git services commented on IMPALA-13170:
--

Commit 00d0b0dda1e215d8e91ff52688fe6654bee52282 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=00d0b0dda ]

IMPALA-9441,IMPALA-13170: Ops listing dbs/tables should handle db not exists

We have some operations listing the dbs/tables in the following steps:
  1. Get the db list
  2. Do something on the db which could fail if the db no longer exists
For instance, when authorization is enabled, SHOW DATABASES would need a
step-2 to get the owner of each db. This is fine in the legacy catalog
mode since the whole Db object is cached on the coordinator side.
However, in the local catalog mode, the msDb could be missing in the
local cache. Coordinator then triggers a getPartialCatalogObject RPC to
load it from catalogd. If the db no longer exists in catalogd, such step
will fail.

The same applies to GetTables HS2 requests when listing all tables in
all dbs. In step-2 we list the table names for a db. Though the db
exists when we get the db list, it could be dropped by the time we
start listing the table names in it.

This patch adds code to handle the exceptions caused by the db no
longer existing. It also improves GetSchemas to not list the table
names, avoiding the same issue.

Tests:
 - Add e2e tests

Change-Id: I2bd40d33859feca2bbd2e5f1158f3894a91c2929
Reviewed-on: http://gerrit.cloudera.org:8080/21546
Reviewed-by: Yida Wu 
Tested-by: Impala Public Jenkins 


> InconsistentMetadataFetchException due to database dropped when showing 
> databases
> -
>
> Key: IMPALA-13170
> URL: https://issues.apache.org/jira/browse/IMPALA-13170
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: Yida Wu
>Assignee: Quanlong Huang
>Priority: Major
>
> Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when 
> running "show databases" in Impala while simultaneously executing "drop 
> database" to drop the newly created database in Hive.
> Steps are:
> 1. Create database (Hive)
> 2. Create tables (Hive)
> 3. Drop tables (Hive)
> 4. Run show databases (Impala) while dropping the database (Hive)
> Logs in Impalad:
> {code:java}
> I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] 
> Invalidated objects in cache: [list of database names, HMS_METADATA for DB 
> test_hive]
> I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] 
> org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching 
> DATABASE failed. Could not find TCatalogObject(type:DATABASE, 
> catalog_version:0, db:TDatabase(db_name:test_hive))   
>   
>   
> 
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635)
>   at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91) 
>   at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294)
>   at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066)
>   at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301)
> I0610 02:18:32.436257 278475 status.cc:129] 1:2] 
> InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find 
> TCatalogObject(type:DATABASE, catalog_version:0, 
> {code}
> Logs in Catalog:
> {code:java}
> I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 
> EventType: CREATE_DATABASE Successfully added database test_hive 
> ...
> I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 
> EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on 
> database test_hive
> I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events 
> received: 6 Total number of events filtered out: 0
> I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped 
> metric to 2564
> I0610 02:18:32.279537 222885 

[jira] [Commented] (IMPALA-13120) Failed table loads are not tried to load again even though hive metastore is UP

2024-06-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860737#comment-17860737
 ] 

ASF subversion and git services commented on IMPALA-13120:
--

Commit ab4e62d3c3dd623a8b5ad896641db07782cbb939 in impala's branch 
refs/heads/master from Venu Reddy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ab4e62d3c ]

IMPALA-13120: Load failed table without need for manual invalidate

If the metastore is down when a table load is triggered, catalogd
publishes a new version of the incomplete table with the cause set to
TableLoadingException. On the coordinator/impalad, the
StmtMetadataLoader loadTables call that has been waiting for the table
load to complete considers the table loaded. Then, during the
analyzer’s table resolve step, a TableLoadingException (i.e., could
not connect to the metastore, failed to load metadata for the table,
and running invalidate metadata for the table may resolve this
problem) is thrown for the incomplete table.
Henceforth, no query on the table triggers the load, since the table
is incomplete with a TableLoadingException cause. Even after the
metastore comes back UP, queries continue to throw the same exception.
This is misleading, and manually invalidating all such tables should
not be required.
Note: an incomplete table with a cause is considered loaded.

This patch retries loading a table that previously failed due to a
metastore connection error (i.e., a recoverable error) when a query
involving the table is fired. The idea is to keep track of the table
object in the db that requires loading. On a successful or failed
load, the table object in the db is updated, so the tracked reference
can be compared against the current object in the db to detect
completion of the load.
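
A minimal sketch of the reference-comparison idea, with hypothetical
names (the actual Table/Db types carry much more state):

{code:java}
public class TableLoadWaiter {
  // Hypothetical stand-in for the catalog Db: returns the current
  // Table object registered under a name.
  interface Db { Object getTable(String tblName); }

  // 'requested' is the Table object observed when the load was
  // triggered. A completed load (success or failure) installs a new
  // object in the Db, so reference equality detects completion.
  static void waitForLoad(Db db, String tblName, Object requested)
      throws InterruptedException {
    while (db.getTable(tblName) == requested) {
      Thread.sleep(50); // still the same object => load not finished
    }
  }
}
{code}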

Testing:
 - Added end-to-end tests

Change-Id: Ia882fdd865ef716351be7f1eaf203a9fb04c1c15
Reviewed-on: http://gerrit.cloudera.org:8080/21478
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Failed table loads are not tried to load again even though hive metastore is 
> UP
> ---
>
> Key: IMPALA-13120
> URL: https://issues.apache.org/jira/browse/IMPALA-13120
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Venugopal Reddy K
>Priority: Major
>
> *Description:*
> If the metastore is down at the time the table load is triggered, catalogd 
> creates a new IncompleteTable instance with cause=TableLoadingException and 
> updates the catalog with a new version. On the coordinator/impalad, 
> StmtMetadataLoader loadTables(), which has been waiting for the table load 
> to complete, considers the table loaded (as a failed load). Then, during 
> the analyzer’s table resolve step, if the table is incomplete, a 
> TableLoadingException is thrown to the user.
> Note: an IncompleteTable with a non-null cause is considered loaded.
> *Henceforth, queries on the table don’t trigger the table load (at 
> StmtMetadataLoader) since the table is an IncompleteTable with a non-null 
> cause (i.e., TableLoadingException). Even though the metastore is UP later, 
> queries continue to fail with the same TableLoadingException:*
> {{CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.t1. Running 'invalidate metadata default.t1' may resolve this 
> problem.}}
> {{CAUSED BY: MetaException: Could not connect to meta store using any of the 
> URIs provided. Most recent failure: 
> org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
> Connection refused (Connection refused)}}
> *At present, an explicit invalidate metadata is the only way to recover the 
> table from this state. Queries executed after the metastore is up should 
> succeed without the need for an explicit invalidate metadata.*
> *Steps to Reproduce:*
>  # Create a table from Hive and insert some data into it.
>  # Bring down the Hive metastore process.
>  # Run a query on Impala that triggers the table load. The query fails with 
> TableLoadingException.
>  # Bring up the Hive metastore process.
>  # Run the query on Impala again. It still fails with the same 
> TableLoadingException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12093) impala-shell should preserve all cookies by default

2024-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860659#comment-17860659
 ] 

ASF subversion and git services commented on IMPALA-12093:
--

Commit 100693d5adce3a5db38bb171cae4e9c0dec5e20e in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=100693d5a ]

IMPALA-12093: impala-shell to preserve all cookies

Updates impala-shell to preserve all cookies by default, defined as
setting 'http_cookie_names=*'. Prior behavior of restricting cookies to
a user-specified list is preserved when 'http_cookie_names' is given any
value besides '*'. Setting 'http_cookie_names=' prevents any cookies
from being preserved.

Adds verbose output that prints all cookies that are preserved by the
HTTP client.

Existing cookie tests with LDAP still work. Adds a test where Impala
returns an extra cookie, and the test verifies that verbose mode prints all
expected cookies.

Change-Id: Ic81f790288460b086ab218e6701e8115a996dfa7
Reviewed-on: http://gerrit.cloudera.org:8080/19827
Reviewed-by: Impala Public Jenkins 
Tested-by: Michael Smith 


> impala-shell should preserve all cookies by default
> ---
>
> Key: IMPALA-12093
> URL: https://issues.apache.org/jira/browse/IMPALA-12093
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Currently, impala-shell's http_cookie_names parameter specifies which cookies 
> should be preserved and sent back with subsequent requests. This defaults to 
> a couple well-known cookie names that Impala uses.
> In general, we don't know what proxies are between impala-shell and Impala, 
> and we don't know what cookie name they rely on being preserved. As an 
> example, Apache Knox can rely on a cookie it sets to route requests to the 
> appropriate Impala coordinator. Limiting our cookie preservation to a small 
> allow list makes this much more brittle and hard to use. Clients need to know 
> the right list of cookies to put in http_cookie_names, and that is not 
> obvious.
> It seems like the default behavior should be to preserve all cookies. We can 
> keep a way to disallow or limit the cookies for unusual cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13028) libkudu_client.so is not stripped in the DEB/RPM packages

2024-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860445#comment-17860445
 ] 

ASF subversion and git services commented on IMPALA-13028:
--

Commit aea057f095fecb331bc0c58687c3f0ac4f6affa8 in impala's branch 
refs/heads/master from Xiang Yang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aea057f09 ]

IMPALA-13028: Strip dynamic link libraries in Linux DEB/RPM packages

This optimization can reduce the DEB package size from 611MB to 554MB,
and reduce the kudu client library size from 188MB to 10.5MB at the
same time.

Testing:
 - Manually made a DEB package and checked whether the dynamic link
   libraries are stripped.

Change-Id: Ie7bee0b4ef904db3706a350f17bcd68d769aa5ad
Reviewed-on: http://gerrit.cloudera.org:8080/21542
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> libkudu_client.so is not stripped in the DEB/RPM packages
> -
>
> Key: IMPALA-13028
> URL: https://issues.apache.org/jira/browse/IMPALA-13028
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: XiangYang
>Priority: Major
>
> The current DEB package is 611M on ubuntu18.04. Here are the top-10 largest 
> files:
> {noformat}
> 14 MB 
> ./opt/impala/lib/jars/hive-standalone-metastore-3.1.3000.7.2.18.0-369.jar
> 15 MB ./opt/impala/lib/jars/kudu-client-e742f86f6d.jar
> 20 MB ./opt/impala/lib/native/libstdc++.so.6.0.28
> 22 MB ./opt/impala/lib/jars/js-22.3.0.jar
> 29 MB ./opt/impala/lib/jars/iceberg-hive-runtime-1.3.1.7.2.18.0-369.jar
> 60 MB ./opt/impala/lib/jars/ozone-filesystem-hadoop3-1.3.0.7.2.18.0-369.jar
> 84 MB ./opt/impala/util/impala-profile-tool
> 85 MB ./opt/impala/sbin/impalad
> 175 MB ./opt/impala/lib/jars/impala-minimal-s3a-aws-sdk-4.4.0-SNAPSHOT.jar
> 188 MB ./opt/impala/lib/native/libkudu_client.so.0.1.0{noformat}
> It appears that we just strip binaries built by Impala, e.g. impalad and 
> impala-profile-tool.
> libkudu_client.so.0.1.0 remains the same as the one in the toolchain folder.
> {code:bash}
> $ ll -th 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> -rw-r--r-- 1 quanlong quanlong 189M 10月 18  2023 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> $ file 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0:
>  ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, 
> with debug_info, not stripped{code}
> CC [~yx91490] [~boroknagyz] [~rizaon]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN

2024-06-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860283#comment-17860283
 ] 

ASF subversion and git services commented on IMPALA-8042:
-

Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ]

IMPALA-6311: Lower max_filter_error_rate to 10%

Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV
estimates more accurate in some cases. However, the more
accurate (smaller) NDV estimates after these changes have exacerbated
the problem with the 75% default FPP, which causes more cases of badly
undersized filters.

This patch lowers the default value of the max_filter_error_rate flag
from 75% to 10%. The lower target FPP will result in doubling the
runtime filter size most of the time when the previous FPP is greater
than 10%.
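
As a hedged aside (not from the commit): the textbook Bloom filter
sizing formula below gives the raw bits per element implied by a
target FPP p. Impala additionally rounds filter sizes to powers of two
and clamps them between RUNTIME_FILTER_MIN_SIZE and
RUNTIME_FILTER_MAX_SIZE, so realized sizes grow less than the raw
ratio suggests:

{noformat}
m/n = -ln(p) / (ln 2)^2 ≈ 1.44 * log2(1/p)   bits per inserted element

p = 0.75  =>  ≈ 0.60 bits/element
p = 0.10  =>  ≈ 4.79 bits/element
{noformat}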

Testing:
- Pass exhaustive tests.
- Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of
  10% improves q94 by 2x and q95 by 5x, improves total query time and
  geomean time by a few percent, and doesn't cause a significant (> 10%)
  regression in any individual query.

Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04
Reviewed-on: http://gerrit.cloudera.org:8080/21552
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Better selectivity estimate for BETWEEN
> ---
>
> Key: IMPALA-8042
> URL: https://issues.apache.org/jira/browse/IMPALA-8042
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Riza Suminto
>Priority: Minor
> Fix For: Impala 4.5.0
>
>
> The analyzer rewrites a BETWEEN expression into a pair of inequalities. 
> IMPALA-8037 explains that the planner then groups all such non-equality 
> conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains 
> that the analyzer should handle inequalities better.
> BETWEEN is a special case and informs the final result. If we assume a 
> selectivity of s for inequality, then BETWEEN should be something like s/2. 
> The intuition is that if c >= x includes, say, ⅓ of values, and c <= y 
> includes a third of values, then c BETWEEN x AND y should be a narrower set 
> of values, say ⅙.
> [Ramakrishnan and 
> Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]
>  recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the 
> general expression x <= c AND c <= y. Note the discrepancy between the 
> compound inequality case and the BETWEEN case, likely reflecting the 
> additional information we obtain when the user chooses to use BETWEEN.
> To implement a special BETWEEN selectivity in Impala, we must remember the 
> selectivity of BETWEEN during the rewrite to a compound inequality.
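
The arithmetic behind that discrepancy, under the usual
attribute-independence assumption:

{noformat}
sel(x <= c AND c <= y) ≈ sel(x <= c) * sel(c <= y) = 0.3 * 0.3 = 0.09
sel(c BETWEEN x AND y) = 0.4   (assigned directly)
{noformat}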



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs

2024-06-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860284#comment-17860284
 ] 

ASF subversion and git services commented on IMPALA-13128:
--

Commit 8d05f5134cc95f53e4e4bbd8ceb9de88b845fda1 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d05f5134 ]

IMPALA-13128: disk-file-test Hangs on ARM + UBSAN Test Jobs

The Jenkins jobs that run the UBSAN tests on ARM were occasionally
hanging on the disk-file-test. This commit fixes these hangs by
upgrading Google Test and implementing the Death Test handling
functionality, which safely runs tests that expect the process to die.
See https://github.com/google/googletest/blob/main/docs/advanced.md#death-tests
for details on known problems with running death tests and threads at
the same time causing tests to hang.

Testing was accomplished by running the disk-file-test repeatedly in a
loop on a RHEL 8.9 ARM machine. Before this fix was implemented, this
test would run up to 70 times before it hung. After the fix was
implemented, the test ran 2,490 times and was still running when it was
stopped. These test runs had durations between 18.7 and 19.9 seconds,
which means disk-file-test now takes about 15 seconds longer than its
previous duration of about 4.4 seconds.

Change-Id: Ie01f7781f24644a66e9ec52652450116f5cb4297
Reviewed-on: http://gerrit.cloudera.org:8080/21544
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> disk-file-test hangs on ARM + UBSAN test jobs
> -
>
> Key: IMPALA-13128
> URL: https://issues.apache.org/jira/browse/IMPALA-13128
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with 
> this being the last output:
> {noformat}
> 23:06:47  63/147 Test  #63: disk-io-mgr-test .   Passed   
> 43.42 sec
> 23:07:30 Start  64: disk-file-test
> 23:07:30 
> 18:47:00 
> 18:47:00  run-all-tests.sh TIMED OUT! {noformat}
> This has happened multiple times, but it looks limited to ARM + UBSAN. The 
> jobs take stack traces, but only of the running impalads / HMS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11924) Bloom filter size is unaffected by column NDV

2024-06-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860282#comment-17860282
 ] 

ASF subversion and git services commented on IMPALA-11924:
--

Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ]

IMPALA-6311: Lower max_filter_error_rate to 10%

Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV
estimates more accurate in some cases. However, the more
accurate (smaller) NDV estimates after these changes have exacerbated
the problem with the 75% default FPP, which causes more cases of badly
undersized filters.

This patch lowers the default value of the max_filter_error_rate flag
from 75% to 10%. The lower target FPP will result in doubling the
runtime filter size most of the time when the previous FPP is greater
than 10%.

Testing:
- Pass exhaustive tests.
- Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of
  10% improves q94 by 2x and q95 by 5x, improves total query time and
  geomean time by a few percent, and doesn't cause a significant (> 10%)
  regression in any individual query.

Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04
Reviewed-on: http://gerrit.cloudera.org:8080/21552
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Bloom filter size is unaffected by column NDV
> -
>
> Key: IMPALA-11924
> URL: https://issues.apache.org/jira/browse/IMPALA-11924
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: bloom-filter, runtime-filters
> Fix For: Impala 4.3.0
>
>
> For bloom filter sizing Impala simply uses the cardinality of the build 
> side, while it could clearly be capped by the NDV:
> https://github.com/apache/impala/blob/feb4a76ed4cb5b688143eb21370f78ec93133c56/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#L661
> https://github.com/apache/impala/blob/feb4a76ed4cb5b688143eb21370f78ec93133c56/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#L698
> E.g.:
> {code}
> use tpch_parquet;
> set RUNTIME_FILTER_MIN_SIZE=8192
> RUNTIME_FILTER_MIN_SIZE
> explain select count(*) from orders join customer on o_comment = c_mktsegment
> PLAN-ROOT SINK
> |
> 06:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  row-size=8B cardinality=1
> |
> 05:EXCHANGE [UNPARTITIONED]
> |
> 03:AGGREGATE
> |  output: count(*)
> |  row-size=8B cardinality=1
> |
> 02:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: o_comment = c_mktsegment
> |  runtime filters: RF000 <- c_mktsegment
> |  row-size=82B cardinality=162.03K
> |
> |--04:EXCHANGE [BROADCAST]
> |  |
> |  01:SCAN HDFS [tpch_parquet.customer]
> | HDFS partitions=1/1 files=1 size=12.34MB
> | row-size=21B cardinality=150.00K
> |
> 00:SCAN HDFS [tpch_parquet.orders]
>HDFS partitions=1/1 files=2 size=54.21MB
>runtime filters: RF000 -> o_comment
>row-size=61B cardinality=1.50M
> {code}
> The query above sets RF000's size to 65536, while the minimum 8192 would be 
> more than enough, as the ndv of c_mktsegment is 5.
> The current logic should work well for FK/PK joins where the build side's 
> cardinality is close to the PK's ndv, but it can massively overestimate for 
> large tables with small-ndv keys.
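
A back-of-the-envelope for the plan above, using the cap the report
suggests (a hypothetical sizing input, not the current code):

{noformat}
entries_for_sizing = min(build cardinality, NDV(build key))
                   = min(150000, 5) = 5   =>  fits the 8192-byte minimum
{noformat}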



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters

2024-06-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860281#comment-17860281
 ] 

ASF subversion and git services commented on IMPALA-6311:
-

Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ]

IMPALA-6311: Lower max_filter_error_rate to 10%

Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV
estimates more accurate in some cases. However, the more
accurate (smaller) NDV estimates after these changes have exacerbated
the problem with the 75% default FPP, which causes more cases of badly
undersized filters.

This patch lowers the default value of the max_filter_error_rate flag
from 75% to 10%. The lower target FPP will result in doubling the
runtime filter size most of the time when the previous FPP is greater
than 10%.

Testing:
- Pass exhaustive tests.
- Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of
  10% improves q94 by 2x and q95 by 5x, improves total query time and
  geomean time by a few percent, and doesn't cause a significant (> 10%)
  regression in any individual query.

Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04
Reviewed-on: http://gerrit.cloudera.org:8080/21552
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Evaluate smaller FPP for Bloom filters
> --
>
> Key: IMPALA-6311
> URL: https://issues.apache.org/jira/browse/IMPALA-6311
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Reporter: Jim Apple
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The Bloom filters are created by estimating the NDV and then using the FPP of 
> 75% to get the right size for the filter. This may be too high to be very 
> useful - if our filters are currently filtering more than 75% out, then it is 
> only because we are overestimating NDV.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13168) Add README file for setting up Trino

2024-06-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859989#comment-17859989
 ] 

ASF subversion and git services commented on IMPALA-13168:
--

Commit a6f285cdd5c9e94d720cbbb3d517482768ec00bb in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a6f285cdd ]

IMPALA-13168: Add README file for setting up Trino

The Impala repository contains scripts that make it easy to set up Trino
in the development environment. This commit adds the TRINO-README.md
file that describes how they can be used.

Change-Id: Ic9fea891074223475a57c8f49f788924a0929b12
Reviewed-on: http://gerrit.cloudera.org:8080/21538
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add README file for setting up Trino
> 
>
> Key: IMPALA-13168
> URL: https://issues.apache.org/jira/browse/IMPALA-13168
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> The Impala repository contains scripts that make it easy to set up Trino in 
> the development environment. We should add a README file that describes how 
> they can be used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-06-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856579#comment-17856579
 ] 

ASF subversion and git services commented on IMPALA-12754:
--

Commit 6632fd00e17867c9f8f40d6905feafa049368a98 in impala's branch 
refs/heads/master from jankiram84
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6632fd00e ]

IMPALA-12754: [DOCS] External JDBC table support

Created the docs for Impala external JDBC table support

Change-Id: I5360389037ae9ee675ab406d87617d55d476bf8f
Reviewed-on: http://gerrit.cloudera.org:8080/21539
Tested-by: Impala Public Jenkins 
Reviewed-by: gaurav singh 
Reviewed-by: Wenzhe Zhou 


> Update Impala document to cover external jdbc table
> ---
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: Jankiram Balakrishnan
>Priority: Major
>
> We need to document the SQL syntax to create and alter external JDBC 
> tables, including the table properties to be set for JDBC and DBCP 
> (Database Connection Pool).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13136) Refactor AnalyzedFunctionCallExpr

2024-06-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856578#comment-17856578
 ] 

ASF subversion and git services commented on IMPALA-13136:
--

Commit 4c00cbff7ee82f9a100746a97b07ce22b3fed5ae in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4c00cbff7 ]

IMPALA-13136: Refactor AnalyzedFunctionCallExpr (for Calcite)

The analyze method is now called after the Expr is constructed.

This code is more in line with the existing way that Impala
constructs the Expr object.
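
A minimal sketch of the construct-first, analyze-after pattern, with
hypothetical types (the real Expr class carries much more state):

{code:java}
// Construct the tree first, analyze after: nodes keep no Analyzer
// reference, and analyze() makes one recursive pass from the root.
abstract class Expr {
  final java.util.List<Expr> children = new java.util.ArrayList<>();

  void analyze(Object analyzer) {
    for (Expr child : children) child.analyze(analyzer);
    analyzeImpl(analyzer); // idempotent per-node analysis
  }

  abstract void analyzeImpl(Object analyzer);
}
{code}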

Change-Id: Ideb662d9c7536659cb558bf62baec29c82217aa2
Reviewed-on: http://gerrit.cloudera.org:8080/21525
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 


> Refactor AnalyzedFunctionCallExpr
> -
>
> Key: IMPALA-13136
> URL: https://issues.apache.org/jira/browse/IMPALA-13136
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Steve Carlin
>Priority: Major
>
> Copied from code review:
> The part where we immediately analyze as part of the constructor makes for 
> complicated exception handling. RexVisitor doesn't support exceptions, so it 
> adds complication to handle them under those circumstances. I can't really 
> explain why it is necessary.
> Let me sketch out an alternative:
> 1. Construct the whole Expr tree without analyzing it
> 2. Any errors that happen during this process are not usually actionable by 
> the end user. It's good to have a descriptive error message, but it doesn't 
> mean there is something wrong with the SQL. I think that it is ok for this 
> code to throw subclasses of RuntimeException or use 
> Preconditions.checkState() with a good explanation.
> 3. When we get the Expr tree back in CreateExprVisitor::getExpr(), we call 
> analyze() on the root node, which does a recursive analysis of the whole tree.
> 4. The special Expr classes don't run analyze() in the constructor, don't 
> keep a reference to the Analyzer, and don't override resetAnalysisState(). 
> They override analyzeImpl() and they should be idempotent. The clone 
> constructor should not need to do anything special, just do a deep copy.
> I don't want to bog down this review. If we want to address this as a 
> followup, I can live with that, but I don't want us to go too far down this 
> road. (Or if we have a good explanation for why it is necessary, then we can 
> write a good comment and move on.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324

2024-06-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856441#comment-17856441
 ] 

ASF subversion and git services commented on IMPALA-13169:
--

Commit 1ecc43f8c2171475950c37682973b8cd660bfd0c in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1ecc43f8c ]

IMPALA-13169: Specify cluster id before starting HiveServer2

After HIVE-28324, in order to start HiveServer2, the cluster id has to
be passed to HiveServer2, either via the environment variable
'HIVE_CLUSTER_ID' or the command line Java property 'hive.cluster.id'.
This patch exports HIVE_CLUSTER_ID before starting HiveServer2.
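
A minimal sketch of the two configuration channels named above (the
lookup order here is an assumption, not necessarily what HiveServer2
does):

{code:java}
public class ClusterIdSketch {
  static String clusterId() {
    // Prefer the environment variable, fall back to the Java property.
    String id = System.getenv("HIVE_CLUSTER_ID");
    return (id != null) ? id : System.getProperty("hive.cluster.id");
  }
}
{code}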

Testing:
 - Manually verified that a HiveServer2 including HIVE-28324 could be
   started after this patch.
 - Verified that this patch passed the core tests.

Change-Id: I9d07ec01a04f8123b7ccca676ce744ac485f167c
Reviewed-on: http://gerrit.cloudera.org:8080/21540
Tested-by: Impala Public Jenkins 
Reviewed-by: Quanlong Huang 


> Specify cluster id before starting HiveServer2 after HIVE-28324
> ---
>
> Key: IMPALA-13169
> URL: https://issues.apache.org/jira/browse/IMPALA-13169
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After HIVE-28324, in order to start HiveServer2, the cluster id has to be 
> passed to HiveServer2, either via the environment variable or the command 
> line Java property. We should provide HiveServer2 with the cluster id 
> before we bump up CDP_BUILD_NUMBER to have a CDP Hive dependency that 
> includes this Hive change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12940) Implement filtering conditions

2024-06-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856334#comment-17856334
 ] 

ASF subversion and git services commented on IMPALA-12940:
--

Commit a6db27850af5c8dc01be19c2c396ec03211fa402 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a6db27850 ]

IMPALA-12940: Added filtering capability for Calcite planner

The Filter RelNode is now handled in the Calcite planner.

The parsing and analysis are done by Calcite, so no changes were needed
in that portion. The ImpalaFilterRel class was created to handle the
conversion of the Calcite LogicalFilter into a filter condition within
the Impala plan nodes.

There is no explicit filter plan node in Impala. Instead, the filter
condition attaches itself to an existing plan node. The filter
condition gets passed into the child plan nodes through the
ParentPlanRelContext.

The ExprConjunctsConverter class is responsible for creating the filter
Expr list that is used. The list contains the separate top-level AND
conditions.
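
In Calcite, splitting a condition into its top-level AND conjuncts is
commonly done with RelOptUtil.conjunctions; a minimal sketch (whether
ExprConjunctsConverter uses this helper is an assumption):

{code:java}
import java.util.List;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rex.RexNode;

public class ConjunctSplitSketch {
  // Flattens nested ANDs into a list of top-level conjuncts.
  static List<RexNode> topLevelConjuncts(RexNode condition) {
    return RelOptUtil.conjunctions(condition);
  }
}
{code}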

Change-Id: If104bf1cd801d5ee92dd7e43d398a21a18be5d97
Reviewed-on: http://gerrit.cloudera.org:8080/21498
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 


> Implement filtering conditions
> --
>
> Key: IMPALA-12940
> URL: https://issues.apache.org/jira/browse/IMPALA-12940
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13159) Running queries get cancelled after statestore failover

2024-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855783#comment-17855783
 ] 

ASF subversion and git services commented on IMPALA-13159:
--

Commit 9c2c27c68ce27b6a6d227379581ac39a34f8f348 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9c2c27c68 ]

IMPALA-13159: Fix query cancellation caused by statestore failover

A momentarily inconsistent cluster membership state after statestore
failover results in query cancellation.
We already have code to handle inconsistent cluster membership after a
statestore restart by defining a post-recovery grace period. During
the grace period, the current cluster membership is not updated, so
the inconsistent membership will not be used to cancel queries on
coordinators and executors.
This patch handles inconsistent cluster membership state after
statestore failover in the same way.
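
A minimal sketch of the grace-period gate, with hypothetical names:

{code:java}
public class MembershipGraceSketch {
  // Ignore membership updates that arrive within the post-recovery
  // grace period after a statestore restart or failover.
  static boolean shouldApplyUpdate(long nowMs, long recoveryStartMs,
      long gracePeriodMs) {
    return (nowMs - recoveryStartMs) > gracePeriodMs;
  }
}
{code}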

Testing:
 - Added a new test case to verify that inconsistent cluster
   membership after statestore failover will not result in query
   cancellation.
 - Fixed closing client issue for Catalogd HA test case
   test_catalogd_failover_with_sync_ddl when the test fails.
 - Passed core test.

Change-Id: I720bec5199df46475b954558abb0637ca7e6298b
Reviewed-on: http://gerrit.cloudera.org:8080/21520
Reviewed-by: Michael Smith 
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Running queries get cancelled after statestore failover
> 
>
> Key: IMPALA-13159
> URL: https://issues.apache.org/jira/browse/IMPALA-13159
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> A momentarily inconsistent cluster membership state after statestore 
> failover results in query cancellation.
> We already have code to handle inconsistent cluster membership after a 
> statestore restart. We need to handle inconsistent cluster membership 
> after statestore failover in the same way.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()

2024-06-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855719#comment-17855719
 ] 

ASF subversion and git services commented on IMPALA-13150:
--

Commit 5d7ca0712af493eca6704a3fdfcfaf16bde46ed0 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d7ca0712 ]

IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom()

In StringVal::CopyFrom(), we take the 'len' parameter as a size_t, which
is usually a 64-bit unsigned integer. We pass it to the constructor of
StringVal, which takes it as an int, which is usually a 32-bit signed
integer. The constructor then allocates memory for the length using the
int value, but afterwards in CopyFrom(), we copy the buffer with the
size_t length. If size_t is indeed 64 bits and int is 32 bits, and the
value is truncated, we may copy more bytes that what we have allocated
for the destination.

Note that in the constructor of StringVal it is checked whether the
length is greater than 1GB, but if the value is truncated because of the
type conversion, the check doesn't necessarily catch it as the truncated
value may be small.

This change fixes the problem by doing the length check with 64 bit
integers in StringVal::CopyFrom().
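
The actual code is C++; this Java sketch, with hypothetical names,
just illustrates why the check must happen on the 64-bit value before
narrowing:

{code:java}
public class LengthCheckSketch {
  static final int MAX_LENGTH = 1 << 30; // 1GB cap, as in the constructor

  static byte[] copyFrom(byte[] src, long len) {
    // Check with 64-bit arithmetic BEFORE casting: e.g. (int) 0x100000000L
    // is 0, which would sail past a check done on the narrowed value.
    if (len < 0 || len > MAX_LENGTH) {
      throw new IllegalArgumentException("length out of range: " + len);
    }
    byte[] dst = new byte[(int) len];
    System.arraycopy(src, 0, dst, 0, (int) len);
    return dst;
  }
}
{code}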

Testing:
 - added unit tests for StringVal::CopyFrom() in udf-test.cc.

Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067
Reviewed-on: http://gerrit.cloudera.org:8080/21501
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Possible buffer overflow in StringVal::CopyFrom()
> -
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a 
> {{{}size_t{}}}, which is usually a 64-bit unsigned integer. We pass it to the 
> constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is 
> usually a 32-bit signed integer. The constructor then allocates memory for 
> the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy 
> the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and 
> {{int}} is 32 bits, and the value is truncated, we may copy more bytes than 
> what we have allocated for the destination. See 
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11648) validate-java-pom-versions.sh should skip pom.xml in toolchain

2024-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855453#comment-17855453
 ] 

ASF subversion and git services commented on IMPALA-11648:
--

Commit dd62dd98b90f114cc0b1fbbce966a7194f30971a in impala's branch 
refs/heads/branch-3.4.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dd62dd98b ]

IMPALA-11648: validate-java-pom-versions.sh should skip pom.xml in toolchain

bin/validate-java-pom-versions.sh validates that the pom.xml files have
consistent version strings. However, it checks all files in IMPALA_HOME
when building from the tarball. There are some pom.xml files in the
toolchain directory that should be skipped.

This patch modifies the find command used in the script from
  find ${IMPALA_HOME} -name pom.xml
to
  find ${IMPALA_HOME} -path ${IMPALA_TOOLCHAIN} -prune -o -name pom.xml -print
to list pom.xml files excluding the toolchain directory. More examples
about how to use `find -prune` can be found in this blog:
https://www.theunixschool.com/2012/07/find-command-15-examples-to-exclude.html

Tests:
 - Built from the tarball locally
 - Modified version strings in some pom.xml files and verified
   validate-java-pom-versions.sh is still able to find them.

Change-Id: I55bbd9c85ab0e4a7c054ee2abd70eae0f55c8a01
Reviewed-on: http://gerrit.cloudera.org:8080/19122
Reviewed-by: Daniel Becker 
Tested-by: Impala Public Jenkins 


> validate-java-pom-versions.sh should skip pom.xml in toolchain
> --
>
> Key: IMPALA-11648
> URL: https://issues.apache.org/jira/browse/IMPALA-11648
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.2.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Blocker
> Fix For: Impala 4.2.0, Impala 4.1.1
>
>
> Building the RC1 tarball of the 4.1.1 release failed in 
> bin/validate-java-pom-versions.sh:
> {noformat}
> Check for Java pom.xml versions FAILED
> Expected 4.1.1-RELEASE
> Not found in:
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/accumulo-handler/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/beeline/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/classification/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/cli/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/common/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/contrib/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/druid-handler/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hbase-handler/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/core/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/hcatalog-pig-adapter/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/server-extensions/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/streaming/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/java-client/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/svr/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hplsql/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/impala/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/catalogd-unit/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-serde/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf1/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf2/pom.xml
>   
> /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-util/pom.xml
>   
> 

[jira] [Commented] (IMPALA-10436) Investigate the need for granting ALL privilege on server when creating an external Kudu table

2024-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855450#comment-17855450
 ] 

ASF subversion and git services commented on IMPALA-10436:
--

Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ]

IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger

The goals and non-goals of this patch could be summarized as follows.
Goals:
 - Add changes to the minicluster configuration that allow a non-default
   version of Ranger (possibly built locally) to run in the context of
   the minicluster, and to be used as the authorization server by
   Impala.
 - Switch to the new constructor when instantiating
   RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
   Impala compatible with Apache Ranger if RangerAccessRequestImpl from
   Apache Ranger is consumed.
 - Prepare Ranger and Impala patches as supplemental material to verify
   what authorization-related tests could be passed if Apache Ranger is
   the authorization provider. Merging IMPALA-12921_addendum.diff to
   the Impala repository is not in the scope of this patch in that the
   diff file changes the behavior of Impala and thus more discussion is
   required if we'd like to merge it in the future.

Non-goals:
 - Set up any automation for building Ranger from source.
 - Pass all Impala authorization-related tests with a non-default
   version of Ranger.

Instructions on running Impala with locally built Ranger:

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We could
execute the following to build Apache Ranger for easy reference. By
default, the compressed tarball is produced under
$RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

After building Ranger, we need to build Impala's Java code so that
Impala's Java code could consume the locally produced Ranger classes. We
will need to export the following environment variables before building
Impala. This prevents bootstrap_toolchain.py from trying to download the
compressed Ranger tarball.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin

It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.

1. source $IMPALA_HOME/bin/impala-config.sh

2. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

3. $IMPALA_HOME/bin/create-test-configuration.sh

4. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

5. $IMPALA_HOME/testdata/bin/run-ranger.sh
   (run-all.sh has to be executed instead if other underlying services
   have not been started)

6. $IMPALA_HOME/testdata/bin/setup-ranger.sh

Testing:
 - Manually verified that we could point Impala to a locally built
   Apache Ranger on the master branch (with tip being
   https://github.com/apache/ranger/commit/4abb993).
 - Manually verified that with RANGER-4771.diff and
   IMPALA-12921_addendum.diff, only 3 authorization-related tests
   failed. They failed because the resource type of 'storage-type' is
   not supported in Apache Ranger yet and thus the test cases added in
   IMPALA-10436 could fail.
 - Manually verified that the log files of Apache and CDP Ranger's Admin
   server could be created under ${RANGER_LOG_DIR} after we start the
   Ranger service.
 - Verified that this patch passed the core tests when CDP Ranger is
   used.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Investigate the need for granting ALL privilege on server when creating an 
> external Kudu table
> --
>
> Key: IMPALA-10436
> URL: https://issues.apache.org/jira/browse/IMPALA-10436
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> We found that to allow a user {{usr}} to create an external Kudu table in 
> Impala, we need to grant the user the {{ALL}} privilege on the server in 
> advance like the following, which seems too strict. It would be good to 
> figure out whether such a requirement is indeed necessary.
> {code:sql}
> GRANT ALL ON SERVER TO USER usr;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IMPALA-12921) Consider adding support for locally built Ranger

2024-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855448#comment-17855448
 ] 

ASF subversion and git services commented on IMPALA-12921:
--

Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ]

IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger

The goals and non-goals of this patch could be summarized as follows.
Goals:
 - Add changes to the minicluster configuration that allow a non-default
   version of Ranger (possibly built locally) to run in the context of
   the minicluster, and to be used as the authorization server by
   Impala.
 - Switch to the new constructor when instantiating
   RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
   Impala compatible with Apache Ranger if RangerAccessRequestImpl from
   Apache Ranger is consumed.
 - Prepare Ranger and Impala patches as supplemental material to verify
   what authorization-related tests could be passed if Apache Ranger is
   the authorization provider. Merging IMPALA-12921_addendum.diff to
   the Impala repository is not in the scope of this patch in that the
   diff file changes the behavior of Impala and thus more discussion is
   required if we'd like to merge it in the future.

Non-goals:
 - Set up any automation for building Ranger from source.
 - Pass all Impala authorization-related tests with a non-default
   version of Ranger.

Instructions on running Impala with locally built Ranger:

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We could
execute the following to build Apache Ranger for easy reference. By
default, the compressed tarball is produced under
$RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

After building Ranger, we need to build Impala's Java code so that
Impala's Java code could consume the locally produced Ranger classes. We
will need to export the following environment variables before building
Impala. This prevents bootstrap_toolchain.py from trying to download the
compressed Ranger tarball.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin

It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.

1. source $IMPALA_HOME/bin/impala-config.sh

2. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

3. $IMPALA_HOME/bin/create-test-configuration.sh

4. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

5. $IMPALA_HOME/testdata/bin/run-ranger.sh
   (run-all.sh has to be executed instead if other underlying services
   have not been started)

6. $IMPALA_HOME/testdata/bin/setup-ranger.sh

Testing:
 - Manually verified that we could point Impala to a locally built
   Apache Ranger on the master branch (with tip being
   https://github.com/apache/ranger/commit/4abb993).
 - Manually verified that with RANGER-4771.diff and
   IMPALA-12921_addendum.diff, only 3 authorization-related tests
   failed. They failed because the resource type of 'storage-type' is
   not supported in Apache Ranger yet and thus the test cases added in
   IMPALA-10436 could fail.
 - Manually verified that the log files of Apache and CDP Ranger's Admin
   server could be created under ${RANGER_LOG_DIR} after we start the
   Ranger service.
 - Verified that this patch passed the core tests when CDP Ranger is
   used.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Consider adding support for locally built Ranger
> 
>
> Key: IMPALA-12921
> URL: https://issues.apache.org/jira/browse/IMPALA-12921
> Project: IMPALA
>  Issue Type: Task
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> It would be nice to be able to support locally built Ranger in Impala's 
> minicluster in that it would facilitate the testing of features that require 
> changes to both components.
> *+Edit:+*
> Getting the current Apache Impala on *master* (tip is
> {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) to support 
> Ranger on *master* (tip is 
> {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS 
> plugin) may be too ambitious.
> The signatures of some classes are already incompatible. For instance, on the 
> 

[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855451#comment-17855451
 ] 

ASF subversion and git services commented on IMPALA-13152:
--

Commit 5d1bd80623324f829aca604b25d97ace21f51417 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d1bd8062 ]

IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost

TOP-N cost will turn into NaN if inputCardinality is equal to 0, due to
Math.log(inputCardinality). This patch fixes the issue by avoiding
Math.log(0) and replacing it with 0 instead.
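
The fix amounts to guarding the logarithm term and validating the final
cost; a minimal Java sketch of the idea, with simplified, hypothetical
names (the real cost formula in the planner is more involved):

{code:java}
// Guard the TOP-N cost term so an empty input yields 0 instead of NaN.
static double topNCostTerm(long inputCardinality) {
  return inputCardinality > 0 ? Math.log(inputCardinality) : 0.0;
}

// Reject invalid totals on construction, as BaseProcessingCost now does.
static void checkTotalCost(double totalCost) {
  if (Double.isNaN(totalCost) || Double.isInfinite(totalCost)
      || totalCost < 0) {
    throw new IllegalArgumentException("Invalid processing cost: " + totalCost);
  }
}
{code}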

After this patch, instantiating BaseProcessingCost with a NaN,
infinite, or negative totalCost throws IllegalArgumentException. In
BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost"
to avoid confusion with "cost-total" in ProcessingCost.getDetails().

Testing:
- Add a test case that runs a TOP-N query over an empty table.
- Compute ProcessingCost in most FE and EE tests, even when the
  COMPUTE_PROCESSING_COST option is not enabled, by checking whether
  RuntimeEnv.INSTANCE.isTestEnv() is true or the TEST_REPLAN option is
  enabled.
- Pass core test.

Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177
Reviewed-on: http://gerrit.cloudera.org:8080/21504
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> Don't see the error if removing the predicate "latest=1".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl

2024-06-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855449#comment-17855449
 ] 

ASF subversion and git services commented on IMPALA-12985:
--

Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ]

IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger

The goals and non-goals of this patch could be summarized as follows.
Goals:
 - Add changes to the minicluster configuration that allow a non-default
   version of Ranger (possibly built locally) to run in the context of
   the minicluster, and to be used as the authorization server by
   Impala.
 - Switch to the new constructor when instantiating
   RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes
   Impala compatible with Apache Ranger if RangerAccessRequestImpl from
   Apache Ranger is consumed.
 - Prepare Ranger and Impala patches as supplemental material to verify
   which authorization-related tests pass when Apache Ranger is the
   authorization provider. Merging IMPALA-12921_addendum.diff into the
   Impala repository is not in the scope of this patch because the diff
   file changes the behavior of Impala, so more discussion is required
   if we'd like to merge it in the future.

Non-goals:
 - Set up any automation for building Ranger from source.
 - Pass all Impala authorization-related tests with a non-default
   version of Ranger.

Instructions on running Impala with locally built Ranger:

Suppose the Ranger project is under the folder $RANGER_SRC_DIR. For
easy reference, we could execute the following to build Apache Ranger.
By default, the compressed tarball is produced under
$RANGER_SRC_DIR/target.

mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \
package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true

After building Ranger, we need to build Impala's Java code so that
Impala's Java code could consume the locally produced Ranger classes. We
will need to export the following environment variables before building
Impala. This prevents bootstrap_toolchain.py from trying to download the
compressed Ranger tarball.

1. export RANGER_VERSION_OVERRIDE=\
   $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \
   -Dexpression=project.version -DforceStdout)

2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\
   ranger-${RANGER_VERSION_OVERRIDE}-admin

It then suffices to execute the following to point
Impala to the locally built Ranger server before starting Impala.

1. source $IMPALA_HOME/bin/impala-config.sh

2. tar zxv -f $RANGER_SRC_DIR/target/\
   ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \
   -C $RANGER_SRC_DIR/target/

3. $IMPALA_HOME/bin/create-test-configuration.sh

4. $IMPALA_HOME/bin/create-test-configuration.sh \
   -create_ranger_policy_db

5. $IMPALA_HOME/testdata/bin/run-ranger.sh
   (run-all.sh has to be executed instead if other underlying services
   have not been started)

6. $IMPALA_HOME/testdata/bin/setup-ranger.sh

Testing:
 - Manually verified that we could point Impala to a locally built
   Apache Ranger on the master branch (with tip being
   https://github.com/apache/ranger/commit/4abb993).
 - Manually verified that with RANGER-4771.diff and
   IMPALA-12921_addendum.diff, only 3 authorization-related tests
   failed. They failed because the resource type 'storage-type' is not
   yet supported in Apache Ranger, so the test cases added in
   IMPALA-10436 cannot pass.
 - Manually verified that the log files of Apache and CDP Ranger's Admin
   server could be created under ${RANGER_LOG_DIR} after we start the
   Ranger service.
 - Verified that this patch passed the core tests when CDP Ranger is
   used.

Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27
Reviewed-on: http://gerrit.cloudera.org:8080/21160
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Use the new constructor when instantiating RangerAccessRequestImpl
> --
>
> Key: IMPALA-12985
> URL: https://issues.apache.org/jira/browse/IMPALA-12985
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> After RANGER-2763, we changed the signature of the class 
> RangerAccessRequestImpl by adding an additional input argument 'userRoles', 
> as shown in the following.
> {code:java}
> public RangerAccessRequestImpl(RangerAccessResource resource, String 
> accessType, String user, Set<String> userGroups, Set<String> userRoles) {
> ...
> {code}
> The new signature is also provided in CDP Ranger. Thus to unblock 
> IMPALA-12921 or to be able to build Apache Impala with locally built Apache 
> Ranger, it 

[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments

2024-06-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855178#comment-17855178
 ] 

ASF subversion and git services commented on IMPALA-13075:
--

Commit b1320bd1d646eba3f044ef647b7d4497487d4674 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1320bd1d ]

IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB

ExprValuesCache uses BATCH_SIZE as a deciding factor to set its
capacity. It bounds the capacity such that expr_values_array_ memory
usage stays below 256KB. This patch tightens that limit to include all
memory usage from ExprValuesCache::MemUsage() instead of
expr_values_array_ only. Therefore, setting a very high BATCH_SIZE will
not push the total memory usage of ExprValuesCache beyond 256KB.
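
ExprValuesCache itself is backend C++ code; the following is only an
illustrative Java restatement of the capping logic, with hypothetical
names:

{code:java}
// Cap the cache capacity so the total estimated memory usage stays at
// or below 256KB no matter how large BATCH_SIZE is.
static final long MAX_MEM_BYTES = 256 * 1024;

static int computeCapacity(int batchSize, long bytesPerCachedRow) {
  long maxRows = MAX_MEM_BYTES / Math.max(1L, bytesPerCachedRow);
  return (int) Math.min((long) batchSize, Math.max(1L, maxRows));
}
{code}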

Simplify table dimension creation methods and fix a few flake8 warnings
in test_dimensions.py.

Testing:
- Add test_join_queries.py::TestExprValueCache.
- Pass core tests.

Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0
Reviewed-on: http://gerrit.cloudera.org:8080/21455
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Setting very high BATCH_SIZE can blow up memory usage of fragments
> --
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0.0
>Reporter: Ezra Zerihun
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> In Impala 4.0, setting a very high BATCH_SIZE or near max limit of 65536 can 
> cause some fragment's memory usage to spike way past the query's defined 
> MEM_LIMIT or pool's Maximum Query Memory Limit with Clamp on. So even though 
> MEM_LIMIT is set reasonable, the query can still fail with out of memory and 
> a huge amount of memory used on fragment. Reducing BATCH_SIZE to a reasonable 
> amount or back to default will allow the query to run without issue and use 
> reasonable amount of memory within query's MEM_LIMIT or pool's Maximum Query 
> Memory Limit.
>  
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>  
> {code:java}
>     Query State: EXCEPTION
>     Impala Query State: ERROR
>     Query Status: Memory limit exceeded: Error occurred on backend ...:27000 
> by fragment ... Memory left in process limit: 145.53 GB Memory left in query 
> limit: -6.80 GB Query(...): memory limit exceeded. Limit=1.00 GB 
> Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB 
> Total=7.80 GB Peak=7.84 GB   Unclaimed reservations: Reservation=8.50 MB 
> OtherMemory=0 Total=8.50 MB Peak=56.44 MB   Runtime Filter Bank: 
> Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB 
> Peak=4.00 MB   Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB 
> Total=7.59 GB Peak=7.63 GB     HASH_JOIN_NODE (id=8): Reservation=1.94 MB 
> OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB       Exprs: Total=7.57 GB 
> Peak=7.57 GB       Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration): 
> BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell 
> v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>    ExecSummary:
> ...
> 09:AGGREGATE                    32     32    0.000ns    0.000ns        0      
>  4.83M   36.31 MB      212.78 MB  STREAMING                                 
> 08:HASH JOIN                    32     32    5s149ms      2m44s        0     
> 194.95M    7.57 GB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE                  32     32   93.750us    1.000ms   10.46K      
>  1.55K    1.65 MB        2.56 MB  HASH(...
> {code}
>  
>  
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>  
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner): 
> MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 
> (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>     ExecSummary:
> ...
> 09:AGGREGATE                    32     32  593.748us   18.999ms       45      
>  4.83M    34.06 MB      212.78 MB  STREAMING
> 08:HASH JOIN                    32     32   10s873ms      5m47s   10.47K     
> 194.95M   123.48 MB        1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE                  32     32    0.000ns    0.000ns   10.46K      
>  1.55K   344.00 KB        1.69 MB  HASH(...
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For 

[jira] [Commented] (IMPALA-12712) INVALIDATE METADATA should set a better createEventId

2024-06-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855089#comment-17855089
 ] 

ASF subversion and git services commented on IMPALA-12712:
--

Commit f98da3315e1e4744ad0e49405a4d1c7f98be85ae in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f98da3315 ]

IMPALA-12712: Invalidate metadata on table should set better
createEventId

"INVALIDATE METADATA " can be used to bring up a table in
Impala's catalog cache if the table exists in HMS. Currently, the
createEventId for such tables is always set to -1, which leads to
always removing the table. A sequence of drop table + create table +
invalidate table can lead to flaky test failures like IMPALA-12266.

Solution:
When Invalidate metadata  is fired, fetch the latest eventId
from HMS and set it as the createEventId for the table, so that a drop
table event that happened before the invalidate query is ignored
without removing the table from the cache.
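
A minimal sketch of the idea (hypothetical helper names; the actual
change lives in the catalog server code):

{code:java}
// On INVALIDATE METADATA, record the latest HMS notification id as the
// table's createEventId instead of leaving it at -1.
long latestEventId = hmsClient.getCurrentNotificationEventId().getEventId();
table.setCreateEventId(latestEventId);

// Later, when the event processor sees a DROP_TABLE event:
if (dropEvent.getEventId() < table.getCreateEventId()) {
  return;  // the drop predates the invalidate; keep the table in cache
}
{code}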

Note: Also removed an unnecessary RPC call to HMS to get the table
object, since we already have the required info in the table metadata
RPC call.

Testing:
- Added an end-to-end test to verify that a drop table event that
happened earlier does not remove the metadata object from the cache.

Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Reviewed-on: http://gerrit.cloudera.org:8080/21402
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> INVALIDATE METADATA  should set a better createEventId
> -
>
> Key: IMPALA-12712
> URL: https://issues.apache.org/jira/browse/IMPALA-12712
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: catalog-2024
>
> "INVALIDATE METADATA " can be used to bring up a table in Impala's 
> catalog cache if the table exists in HMS. For instance, when HMS event 
> processing is disabled, we can use it in Impala to bring up tables that are 
> created outside Impala.
> The createEventId for such tables are always set as -1:
> [https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2243-L2246]
> This is problematic when event-processing is enabled. DropTable events and 
> RenameTable events use the createEventId to decide whether to remove the 
> table in catalog cache. -1 will lead to always removing the table. Though it 
> might be added back shortly in follow-up CreateTable events, in the period 
> between them the table is missing in Impala, causing test failures like 
> IMPALA-12266.
> A simpler reproducing of the issue is creating a table in Hive and launching 
> Impala with a long event polling interval to mimic the delay on events. Note 
> that we start Impala cluster after creating the table so Impala don't need to 
> process the CREATE_TABLE event.
> {noformat}
> hive> create table debug_tbl (i int);
> bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=60
> {noformat}
> Drop the table in Impala and recreate it in Hive, so it doesn't exist in the 
> catalog cache but exist in HMS. Run "INVALIDATE METADATA " in Impala 
> to bring it up before the DROP_TABLE event come.
> {noformat}
> impala> drop table debug_tbl;
> hive> create table debug_tbl (i int, j int);
> impala> invalidate metadata debug_tbl;
> {noformat}
> The table will be dropped by the DROP_TABLE event and then added back by the 
> CREATE_TABLE event. Shown in catalogd logs:
> {noformat}
> I0115 16:30:15.376713  3208 JniUtil.java:177] 
> 02457b6d5f174d1f:3bdeee14] Finished execDdl request: DROP_TABLE 
> default.debug_tbl issued by quanlong. Time spent: 417ms
> I0115 16:30:23.390962  3208 CatalogServiceCatalog.java:2777] 
> 1840bd101f78d611:22079a5a] Invalidating table metadata: 
> default.debug_tbl
> I0115 16:30:23.404150  3208 Table.java:234] 
> 1840bd101f78d611:22079a5a] createEventId_ for table: 
> default.debug_tbl set to: -1
> I0115 16:30:23.405138  3208 JniUtil.java:177] 
> 1840bd101f78d611:22079a5a] Finished resetMetadata request: INVALIDATE 
> TABLE default.debug_tbl issued by quanlong. Time spent: 17ms
> I0115 16:30:55.108006 32760 MetastoreEvents.java:637] EventId: 8668853 
> EventType: DROP_TABLE Successfully removed table default.debug_tbl
> I0115 16:30:55.108459 32760 MetastoreEvents.java:637] EventId: 8668855 
> EventType: CREATE_TABLE Successfully added table default.debug_tbl
> {noformat}
> CC [~VenuReddy], [~hemanth619]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: 

[jira] [Commented] (IMPALA-12920) Support ai_generate_text built-in function for OpenAI's LLMs

2024-06-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855091#comment-17855091
 ] 

ASF subversion and git services commented on IMPALA-12920:
--

Commit b341e389573cc87fcfad5a8137620c6c96bb05e1 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b341e3895 ]

[tools] Add .gitignore for new files

Adds .gitignore for test.jceks - added with IMPALA-12920 - and
hive-site-housekeeping-on (presumably added via a Hive update).

Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade
Reviewed-on: http://gerrit.cloudera.org:8080/21503
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Support ai_generate_text built-in function for OpenAI's LLMs
> 
>
> Key: IMPALA-12920
> URL: https://issues.apache.org/jira/browse/IMPALA-12920
> Project: IMPALA
>  Issue Type: Task
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> Built in function which can help communicate with [OpenAi's chat completion 
> API|https://platform.openai.com/docs/api-reference/chat] endpoint through SQL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-06-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855090#comment-17855090
 ] 

ASF subversion and git services commented on IMPALA-12266:
--

Commit f98da3315e1e4744ad0e49405a4d1c7f98be85ae in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f98da3315 ]

IMPALA-12712: Invalidate metadata on table should set better
createEventId

"INVALIDATE METADATA " can be used to bring up a table in
Impala's catalog cache if the table exists in HMS. Currently, the
createEventId for such tables is always set to -1, which leads to
always removing the table. A sequence of drop table + create table +
invalidate table can lead to flaky test failures like IMPALA-12266.

Solution:
When Invalidate metadata  is fired, fetch the latest eventId
from HMS and set it as the createEventId for the table, so that a drop
table event that happened before the invalidate query is ignored
without removing the table from the cache.

Note: Also removed an unnecessary RPC call to HMS to get the table
object, since we already have the required info in the table metadata
RPC call.

Testing:
- Added an end-to-end test to verify that a drop table event that
happened earlier does not remove the metadata object from the cache.

Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd
Reviewed-on: http://gerrit.cloudera.org:8080/21402
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Assignee: Gabor Kaszab
>Priority: Critical
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()

[jira] [Commented] (IMPALA-13131) Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request header

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854592#comment-17854592
 ] 

ASF subversion and git services commented on IMPALA-13131:
--

Commit 3668a9517c4d8097591ed3b6fa672bf87faa77f6 in impala's branch 
refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3668a9517 ]

IMPALA-13131: Azure OpenAI API expects 'api-key' instead of 'Authorization' in 
the request header

Updated the POST request when communicating with Azure Open AI
endpoint. The header now includes 'api-key: ' instead of
'Authorization: Bearer '.

Also removed 'model' as a required param for the Azure OpenAI API
call, mainly because the endpoint contains the deployment, which is
already mapped to a model.
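
A hedged sketch of the header selection (assumed names, not the actual
ai-functions code):

{code:java}
// Azure OpenAI endpoints authenticate with an 'api-key' header; the
// standard OpenAI endpoint expects 'Authorization: Bearer <key>'.
boolean isAzure = endpointUrl.contains(".openai.azure.com");
if (isAzure) {
  request.setHeader("api-key", apiKey);
} else {
  request.setHeader("Authorization", "Bearer " + apiKey);
}
{code}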

Testing:
- Updated existing unit test as per the Azure API reference
- Manually tested builtin 'ai_generate_text' using an Azure Open AI
deployment.

Change-Id: If9cc07940ce355d511bcf0ee615ff31042d13eb5
Reviewed-on: http://gerrit.cloudera.org:8080/21493
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request 
> header
> ---
>
> Key: IMPALA-13131
> URL: https://issues.apache.org/jira/browse/IMPALA-13131
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Major
>
> As per the [API 
> reference|https://learn.microsoft.com/en-us/azure/ai-services/openai/reference],
>  the header expects API key as follows:
>  
> {code:java}
> curl 
> https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\
>   -H "Content-Type: application/json" \
>   -H "api-key: YOUR_API_KEY" \ <<< API Key
>   -d "{
>   \"prompt\": \"Once upon a time\",
>   \"max_tokens\": 5
> }" {code}
> Impala supports API Key as follows:
>  
>  
> {code:java}
> curl 
> https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\
>   -H "Content-Type: application/json" \
>   -H "Authorization: Bearer YOUR_API_KEY" \    API Key
>   -d "{
>   \"prompt\": \"Once upon a time\",
>   \"max_tokens\": 5
> }"{code}
> This causes ai functions calling Azure OpenAI endpoint to fail with 401 error:
> {code:java}
> { "statusCode": 401, "message": "Unauthorized. Access token is missing, 
> invalid, audience is incorrect (https://cognitiveservices.azure.com), or have 
> expired." } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12562) CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854541#comment-17854541
 ] 

ASF subversion and git services commented on IMPALA-12562:
--

Commit 0d429462f7f61565119ee2e593867a22886d7209 in impala's branch 
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d429462f ]

IMPALA-12562: Cast double and float to string with exact precision

The builtin functions casttostring(DOUBLE) and casttostring(FLOAT)
printed more digits than necessary when converting double and float
values to string values. This patch fixes this by switching to the
existing methods DoubleToBuffer and FloatToBuffer, which are simple and
fast implementations that print only the necessary digits.
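
DoubleToBuffer and FloatToBuffer are C++ helpers; as a rough Java
analogue of "print only the digits needed to round-trip":

{code:java}
// Double.toString emits just enough digits for the value to parse back
// exactly, which is the behavior the fix gives Impala's casts.
String s = Double.toString(0.33);      // "0.33", not "0.33000000000000002"
assert Double.parseDouble(s) == 0.33;  // round-trips exactly
{code}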

Testing:
  - Add end-to-end tests to verify the fixes
  - Add benchmarks for modified functions
  - Update tests in expr-test

Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a
Reviewed-on: http://gerrit.cloudera.org:8080/21441
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result
> ---
>
> Key: IMPALA-12562
> URL: https://issues.apache.org/jira/browse/IMPALA-12562
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: YifanZhang
>Priority: Major
>
> The following query returns a wrong result:
> {code:java}
>  select cast(round(1/3*100, 2) as string)
> +-+
> | cast(round(1 / 3, 2) as string) |
> +-+
> | 0.33002             |
> +-+
> Fetched 1 row(s) in 0.11s {code}
> Remove the cast function and the result is expected:
> {code:java}
>  select round(1/3,2);
> +-+
> | round(1 / 3, 2) |
> +-+
> | 0.33            |
> +-+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854474#comment-17854474
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 4681666e9386d87c647d19d6333750c16b6fa0c1 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4681666e9 ]

IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation

isTrueWithNullSlots() can be expensive when it has to query the backend.
Many of the expressions will look similar, especially in large
auto-generated expressions. Adds a cache based on the nullified
expression to avoid querying the backend for expressions with identical
structure.
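
A minimal sketch of the caching idea (hypothetical names; keying and
eviction details differ in the real Analyzer):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Key the cache on the nullified expression's structure so structurally
// identical predicates skip the backend evaluation.
Map<String, Boolean> nullSlotsCache = new HashMap<>();

boolean isTrueWithNullSlots(Expr nullifiedExpr) {
  String key = nullifiedExpr.toSql();  // stand-in for a structural key
  Boolean cached = nullSlotsCache.get(key);
  if (cached != null) return cached;
  boolean result = evalWithBackend(nullifiedExpr);  // the expensive call
  nullSlotsCache.put(key, result);
  return result;
}
{code}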

With DEBUG logging enabled for the Analyzer, computes and logs stats
about the null slots cache.

Adds 'use_null_slots_cache' query option to disable caching. Documents
the new option.

Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd
Reviewed-on: http://gerrit.cloudera.org:8080/21484
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194)
>     at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275)
>     

[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853919#comment-17853919
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed up lookups while
retaining lists for correct ordering (ordering needs to match SlotRef
order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).
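
A simplified sketch of the resulting structure (hypothetical field
names; relies on Expr implementing hashCode()/equals() as this patch
does):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubstitutionMapSketch {
  private final List<Expr> lhs = new ArrayList<>();  // ordering preserved
  private final List<Expr> rhs = new ArrayList<>();
  private final Map<Expr, Integer> index = new HashMap<>();  // O(1) lookup

  void put(Expr from, Expr to) {
    if (index.containsKey(from)) return;  // ignore duplicates: first wins
    index.put(from, lhs.size());
    lhs.add(from);
    rhs.add(to);
  }

  Expr get(Expr from) {
    Integer i = index.get(from);
    return i == null ? null : rhs.get(i);
  }
}
{code}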

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206)
>     at 

[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853921#comment-17853921
 ] 

ASF subversion and git services commented on IMPALA-13151:
--

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.
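
In Java terms the fix is analogous to switching the elapsed-time check
to a full-precision monotonic clock (the actual change is in the C++
test code):

{code:java}
long start = System.nanoTime();  // full-precision monotonic clock
runExchangeUntilEos();           // hypothetical stand-in for the test body
long elapsedNanos = System.nanoTime() - start;
// The assertion that intermittently failed under the coarse stopwatch:
assert elapsedNanos > 3L * 1_000_000_000L;
{code}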

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
> -
>
> Key: IMPALA-13151
> URL: https://issues.apache.org/jira/browse/IMPALA-13151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is 
> failing with errors like this:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912
> Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), 
> actual: 269834 vs 30{noformat}
> So far, I only see failures on ARM jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2407) Nested Types : Remove calls to clock_gettime for a 9x performance improvement on EC2

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853922#comment-17853922
 ] 

ASF subversion and git services commented on IMPALA-2407:
-

Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ]

IMPALA-13151: Use MonotonicNanos to track test time

Uses MonotonicNanos to track test time rather than MonotonicStopWatch.
IMPALA-2407 updated MonotonicStopWatch to use a low-precision
implementation for performance, which on ARM in particular sometimes
results in undercounting time by a few microseconds. That's enough to
cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos.

Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to
better match Impala code base.

Reproduced on ARM and tested the new implementation for several dozen
runs without failure.

Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400
Reviewed-on: http://gerrit.cloudera.org:8080/21497
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Nested Types : Remove calls to clock_gettime for a 9x performance improvement 
> on EC2
> 
>
> Key: IMPALA-2407
> URL: https://issues.apache.org/jira/browse/IMPALA-2407
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Mostafa Mokhtar
>Assignee: Jim Apple
>Priority: Critical
>  Labels: ec2, performance, ramp-up
> Fix For: Impala 2.5.0
>
> Attachments: q12Nested.tar.gz
>
>
> Queries against Nested types show that ~90% of the time is spent in 
> clock_gettime. 
> A cheaper accounting method can speed up Nested queries by 8-9x
> {code}
> select
>   count(*)
> from
>   customer.orders_string o,
>   o.lineitems_string l
> where
>   l_shipmode in ('MAIL', 'SHIP')
>   and l_commitdate < l_receiptdate
>   and l_shipdate < l_commitdate
>   and l_receiptdate >= '1994-01-01'
>   and l_receiptdate < '1995-01-01'
> group by
>   l_shipmode
> order by
>   l_shipmode
> {code}
> Schema
> +---------------+----------------------------------+---------+
> | name          | type                             | comment |
> +---------------+----------------------------------+---------+
> | c_custkey     | bigint                           |         |
> | c_name        | string                           |         |
> | c_address     | string                           |         |
> | c_nationkey   | bigint                           |         |
> | c_phone       | string                           |         |
> | c_acctbal     | double                           |         |
> | c_mktsegment  | string                           |         |
> | c_comment     | string                           |         |
> | orders_string | array<struct<                    |         |
> |               |   o_orderkey:bigint,             |         |
> |               |   o_orderstatus:string,          |         |
> |               |   o_totalprice:double,           |         |
> |               |   o_orderdate:string,            |         |
> |               |   o_orderpriority:string,        |         |
> |               |   o_clerk:string,                |         |
> |               |   o_shippriority:bigint,         |         |
> |               |   o_comment:string,              |         |
> |               |   lineitems_string:array<struct< |         |
> |               |     l_partkey:bigint,            |         |
> |               |     l_suppkey:bigint,            |         |
> |               |     l_linenumber:bigint,         |         |
> |               |     l_quantity:double,           |         |
> |               |     l_extendedprice:double,      |         |
> |               |     l_discount:double,           |         |
> |               |     l_tax:double,                |         |
> |               |     l_returnflag:string,         |         |
> |               |     l_linestatus:string,         |         |
> |               |     l_shipdate:string,           |         |
> |               |     l_commitdate:string,         |

[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853920#comment-17853920
 ] 

ASF subversion and git services commented on IMPALA-10182:
--

Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ]

IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups

Adds a HashMap to ExprSubstitutionMap to speed up lookups while
retaining lists for correct ordering (ordering needs to match SlotRef
order).
Ignores duplicate inserts, preserving the old behavior that only the
first match would actually be usable; duplicates primarily show up as a
result of combining duplicate distinct and aggregate expressions, or
redundant nested aggregation (like the tests for IMPALA-10182).

Implements localHash and hashCode for Expr and related classes.

Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for
lookup and not expected to be mutated.

Adds the many expressions test, which now runs in a handful of seconds.

Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Reviewed-on: http://gerrit.cloudera.org:8080/21483
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Rows with NULLs filtered out with duplicate columns in subquery select inside 
> UNION ALL
> ---
>
> Key: IMPALA-10182
> URL: https://issues.apache.org/jira/browse/IMPALA-10182
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Aman Sinha
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 4.0.0
>
>
> Bug report from here - 
> https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415
> Repro:
> {noformat}
> create database if not exists as_adventure;
> use as_adventure;
> CREATE TABLE IF NOT EXISTS
> as_adventure.t1 
> ( 
> productsubcategorykey INT, 
> productline STRING);
> insert into t1 values (1,'l1');
> insert into t1 values (2,'l1');
> insert into t1 values (1,'l2');
> insert into t1 values (3,'l3');
> insert into t1 values (null,'');
> select * from t1; 
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0    c3s0,
> t_53.c4  c4,
> t_53.c5s0    c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
>  
> UNION ALL
> SELECT
> MIN(t_53.c_41)   c_41,
> CAST(NULL AS DOUBLE) c_43,
> CAST(NULL AS BIGINT) c_44,
> t_53.c2  c2,
> t_53.c3s0    c3s0,
> t_53.c4  c4,
> t_53.c5s0    c5s0
> FROM
> (   SELECT
> t.productsubcategorykey c_41,
> t.productline   c2,
> t.productline   c3s0,
> t.productsubcategorykey c4,
> t.productsubcategorykey c5s0
> FROM
> as_adventure.t1 t
> WHERE
> true
> GROUP BY
> 2,
> 3,
> 4,
> 5 ) t_53
> GROUP BY
> 4,
> 5,
> 6,
> 7
> {noformat}
> Somewhat similar to IMPALA-7957 in that the inferred predicates from the 
> column equivalences get placed in a Select node. It's a bit different in that 
> the NULLs that are filtered out from the predicates come from the base table.
> {noformat}
> ++
> | Explain String  
>|
> ++
> | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6
>|
> | Per-Host Resource Estimates: Memory=576MB   
>|
> | WARNING: The following tables are missing relevant table and/or column 
> statistics. |
> | as_adventure.t1 
>|
> | 
>|
> | PLAN-ROOT SINK 

[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS

2024-06-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853923#comment-17853923
 ] 

ASF subversion and git services commented on IMPALA-11871:
--

Commit f7e629935b77f412bf74aeebd704af88f03de351 in impala's branch 
refs/heads/master from halim.kim
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f7e629935 ]

IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled

Before this patch, Impala checked whether the Impala service user had
WRITE access to the target HDFS table/partition(s) during the
analysis of the INSERT and LOAD DATA statements in the legacy catalog
mode. The access levels of the corresponding HDFS table and partitions
were computed by the catalog server solely based on the HDFS permissions
and ACLs when the table and partitions were instantiated.

After this patch, we skip loading HDFS permissions and assume the
Impala service user has the READ_WRITE permission on all the HDFS paths
associated with the target table during query analysis when Ranger is
enabled. The assumption could be removed after Impala's implementation
of FsPermissionChecker could additionally take Ranger's policies of HDFS
into consideration when performing the check.
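
A simplified sketch of the new behavior (hypothetical names; the real
logic is in the catalog's permission-loading path):

{code:java}
// When Ranger is enabled, skip loading HDFS permissions and assume the
// Impala service user can read and write; Ranger authorizes at runtime.
TAccessLevel getAccessLevel(boolean rangerEnabled, Path location) {
  if (rangerEnabled) return TAccessLevel.READ_WRITE;
  return loadHdfsPermissionsAndAcls(location);  // legacy HDFS-based check
}
{code}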

Testing:
 - Added end-to-end tests to verify Impala's behavior with respect to
   the INSERT and LOAD DATA statements when Ranger is enabled in the
   legacy catalog mode.

Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9
Reviewed-on: http://gerrit.cloudera.org:8080/20221
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> INSERT statement does not respect Ranger policies for HDFS
> --
>
> Key: IMPALA-11871
> URL: https://issues.apache.org/jira/browse/IMPALA-11871
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> In a cluster with Ranger auth (and with legacy catalog mode), even if you 
> provide RWX to cm_hdfs -> all-path for the user impala, inserting into a 
> table whose HDFS POSIX permissions happen to exclude impala access will 
> result in an
> {noformat}
> "AnalysisException: Unable to INSERT into target table (default.t1) because 
> Impala does not have WRITE access to HDFS location: 
> hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat}
>  
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl 
> /warehouse/tablespace/external/hive/t1
> file: /warehouse/tablespace/external/hive/t1 
> owner: hive 
> group: supergroup
> user::rwx
> user:impala:rwx #effective:r-x
> group::rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:impala:rwx
> default:group::rwx
> default:mask::rwx
> default:other::--- {noformat}
> ~~
> ANALYSIS
> Stack trace from a version of Cloudera's distribution of Impala (impalad 
> version 3.4.0-SNAPSHOT RELEASE (build 
> {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})):
> {noformat}
> at 
> org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
> at 
> org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
> at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat}
> The exception occurs at analysis time, so I tested and succeeded in writing 
> directly into the said directory.
> {noformat}
> [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz 
> /warehouse/tablespace/external/hive/t1/test
> [root@nightly-71x-vx-3 ~]# hdfs dfs -ls 
> /warehouse/tablespace/external/hive/t1/
> Found 8 items
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 
> /warehouse/tablespace/external/hive/t1/00_0
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 
> /warehouse/tablespace/external/hive/t1/00_0_copy_1
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 
> /warehouse/tablespace/external/hive/t1/00_0_copy_2
> rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 
> /warehouse/tablespace/external/hive/t1/00_0_copy_3
> rw-rw---+ 3 impala hive 355 2023-01-27 17:17 
> /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq
> rw-rw---+ 3 impala hive 355 2023-01-27 17:39 
> 

[jira] [Commented] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS

2024-06-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853729#comment-17853729
 ] 

ASF subversion and git services commented on IMPALA-13146:
--

Commit e7dac008bbafb20e4c7d15d46f2bac9a757f in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e7dac008b ]

IMPALA-13146: Download NodeJS from native toolchain

Some test runs have had issues downloading the NodeJS
tarball from the nodejs servers. This changes the
test to download from our native toolchain to make this
more reliable. This means that future upgrades to
NodeJS will need to upload new tarballs to the native
toolchain.

Testing:
 - Ran x86_64/ARM javascript tests

Change-Id: I1def801469cb68633e89b4a0f3c07a771febe599
Reviewed-on: http://gerrit.cloudera.org:8080/21494
Tested-by: Impala Public Jenkins 
Reviewed-by: Surya Hebbar 
Reviewed-by: Wenzhe Zhou 


> Javascript tests sometimes fail to download NodeJS
> --
>
> Key: IMPALA-13146
> URL: https://issues.apache.org/jira/browse/IMPALA-13146
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> For automated tests, sometimes the Javascript tests fail to download NodeJS:
> {noformat}
> 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ...
> 01:37:16   % Total% Received % Xferd  Average Speed   TimeTime 
> Time  Current
> 01:37:16  Dload  Upload   Total   Spent
> Left  Speed
> 01:37:16 
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:01 --:--:-- 0
>   0 00 00 0  0  0 --:--:--  0:00:02 --:--:-- 0
>   0 21.5M0   9020 0293  0 21:23:04  0:00:03 21:23:01   293
> ...
>  30 21.5M   30 6776k    0     0  50307      0  0:07:28  0:02:17  0:05:11 23826
> 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to 
> read{noformat}
> If this keeps happening, we should mirror the NodeJS binary on the 
> native-toolchain s3 bucket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853328#comment-17853328
 ] 

ASF subversion and git services commented on IMPALA-13143:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134 failed on s3.
The test relies on specific timing with a sleep injected via a
debug action so that the DDL query is still running when catalogd
failover is triggered. The failures were caused by slowly restarting
for catalogd on s3 so that the query finished before catalogd
failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query 
> failure
> -
>
> Key: IMPALA-13143
> URL: https://issues.apache.org/jira/browse/IMPALA-13143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Wenzhe Zhou
>Priority: Critical
>  Labels: broken-build, flaky
>
> The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing 
> intermittently with:
> {noformat}
> custom_cluster/test_catalogd_ha.py:472: in 
> test_catalogd_failover_with_sync_ddl
> self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client)
> common/impala_test_suite.py:1216: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1234: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of 
> the expected states [5], last known state 4{noformat}
> This means the query succeeded even though we expected it to fail. This is 
> currently limited to s3 jobs. In a different test, we saw issues because s3 
> is slower (see IMPALA-12616).
> This test was introduced by IMPALA-13134: 
> https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853329#comment-17853329
 ] 

ASF subversion and git services commented on IMPALA-13134:
--

Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ]

IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl

The test_catalogd_failover_with_sync_ddl test which was added to
custom_cluster/test_catalogd_ha.py in IMPALA-13134 failed on s3.
The test relies on specific timing with a sleep injected via a
debug action so that the DDL query is still running when catalogd
failover is triggered. The failures were caused by the catalogd
restarting slowly on s3, so the query finished before catalogd
failover was triggered.

This patch fixed the issue by increasing the sleep time for s3 builds
and other slow builds.

Testing:
 - Ran the test 100 times in a loop on s3.

Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5
Reviewed-on: http://gerrit.cloudera.org:8080/21491
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
> -
>
> Key: IMPALA-13134
> URL: https://issues.apache.org/jira/browse/IMPALA-13134
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Catalog
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Catalogd waits for SYNC_DDL version when it processes a DDL with SYNC_DDL 
> enabled. If the status of Catalogd is changed from active to standby when 
> CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd 
> does not receive catalog topic updates from statestore. This causes the 
> catalogd thread to wait indefinitely and the DDL query to hang.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853235#comment-17853235
 ] 

ASF subversion and git services commented on IMPALA-13096:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, an SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 
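
As a rough illustration of the two-tier lookup the commit message describes (a sketch only; the real code is Java and the names below are invented):

{code:python}
# Hypothetical sketch: resolve a function name against Calcite's operator
# table first, then fall back to Impala's BuiltinsDb, mirroring the lookup
# order described above.
CALCITE_OPERATORS = {"abs", "upper", "lower"}   # placeholder contents
IMPALA_BUILTINS = {"murmur_hash", "typeof"}     # placeholder contents

def resolve_operator(name):
    name = name.lower()
    if name in CALCITE_OPERATORS:
        # Prefer the Calcite operator: Calcite can optimize based on it.
        return ("calcite", name)
    if name in IMPALA_BUILTINS:
        # Otherwise generate an operator wrapper for the builtin on the fly.
        return ("impala-builtin", name)
    raise KeyError("unknown function: %s" % name)
{code}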


> Cleanup Parser.jj for Calcite planner to only use supported syntax
> --
>
> Key: IMPALA-13096
> URL: https://issues.apache.org/jira/browse/IMPALA-13096
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13095) Handle UDFs in Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853236#comment-17853236
 ] 

ASF subversion and git services commented on IMPALA-13095:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, an SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Handle UDFs in Calcite planner
> --
>
> Key: IMPALA-13095
> URL: https://issues.apache.org/jira/browse/IMPALA-13095
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12935) Allow function parsing for Impala Calcite planner

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853234#comment-17853234
 ] 

ASF subversion and git services commented on IMPALA-12935:
--

Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch 
refs/heads/master from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ]

IMPALA-12935: First pass on Calcite planner functions

This commit handles the first pass on getting functions to work
through the Calcite planner. Only basic functions will work with
this commit. Implicit conversions for parameters are not yet supported.
Custom UDFs are also not supported yet.

The ImpalaOperatorTable is used at validation time to check for
existence of the function name for Impala. At first, it will check
Calcite operators for the existence of the function name (A TODO,
IMPALA-13096, is that we need to remove non-supported names from the
parser file). It is preferable to use the Calcite Operator since
Calcite does some optimizations based on the Calcite Operator class.

If the name is not found within the Calcite Operators, a check is done
within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function.
If found, an SqlOperator class is generated on the fly to handle this
function.

The validation process for Calcite includes a call into the operator
method "inferReturnType". This method will validate that there exists
a function that will handle the operands, and if so, return the "return
type" of the function. In this commit, we will assume that the Calcite
operators will match Impala functionality. In later commits, there
will be overrides where we will use Impala validation for operators
where Calcite's validation isn't good enough.

After validation is complete, the functions will be in a Calcite format.
After the rest of compilation (relnode conversion, optimization) is
complete, the function needs to be converted back into Impala form (the
Expr object) to eventually get it into its thrift request.

In this commit, all functions are converted into Expr starting in the
ImpalaProjectRel, since this is the RelNode where functions do their
thing. The RexCallConverter and RexLiteralConverter get called via the
CreateExprVisitor for this conversion.

Since Calcite is providing the analysis portion of the planning, there
is no need to go through Impala's Analyzer object. However, the Impala
planner requires Expr objects to be analyzed. To get around this, the
AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist which
analyze the expression in the constructor. While this could potentially
be combined with the existing FunctionCallExpr and NullLiteral objects,
this fits in with the general plan to avoid changing "fe" Impala code
as much as we can until much later in the commit cycle. Also, there
will be other Analyzed*Expr classes created in the future, but this
commit is intended for basic function call expressions only.

One minor change to the parser is added with this commit. The Calcite
parser does not recognize the "string" datatype, so it has been
added here in Parser.jj and config.fmpp.

Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88
Reviewed-on: http://gerrit.cloudera.org:8080/21357
Reviewed-by: Michael Smith 
Tested-by: Impala Public Jenkins 


> Allow function parsing for Impala Calcite planner
> -
>
> Key: IMPALA-12935
> URL: https://issues.apache.org/jira/browse/IMPALA-12935
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Steve Carlin
>Priority: Major
>
> We need the ability to parse and validate Impala functions using the Calcite 
> planner
> This commit is not intended to work for all functions, or even most 
> functions. It will serve as a base to be reviewed, and at least some 
> functions will work. More complicated functions will be added in a later 
> commit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853196#comment-17853196
 ] 

ASF subversion and git services commented on IMPALA-12616:
--

Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ]

IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3

The test_restart_catalogd_while_handling_rpc_response* tests
from custom_cluster/test_restart_services.py have been failing
consistently on s3. The alter table statement is expected to
succeed, but instead it fails with:
"CatalogException: Detected catalog service ID changes"
This manifests as a timeout waiting for the statement to reach
the finished state.

The test relies on specific timing with a sleep injected via a
debug action. The failure stems from the catalog being slower
on s3. The alter table wakes up before the catalog service ID
change has fully completed, and it fails when it sees the
catalog service ID change.

This increases two sleep times:
1. This increases the sleep time before restarting the catalogd
   from 0.5 seconds to 5 seconds. This gives the catalogd longer
   to receive the message about the alter table and respond back
   to the impalad.
2. This increases the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE
   sleep from 10 seconds to 30 seconds so the alter table
   statement doesn't wake up until the catalog service ID change
   is finalized.
The test is verifying that the right messages are in the impalad
logs, so we know this is still testing the same condition.

This modifies the tests to use wait_for_finished_timeout()
rather than wait_for_state(). This bails out immediately if the
query fails rather than waiting unnecessarily for the full timeout.
This also clears the query options so that later statements
don't inherit the debug_action that the alter table statement
used.

Testing:
 - Ran the tests 100x in a loop on s3
 - Ran the tests 100x in a loop on HDFS

Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29
Reviewed-on: http://gerrit.cloudera.org:8080/21485
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
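
The shape of that change, sketched in Python; the helper names follow the commit message, but their exact signatures and the surrounding code are assumptions:

{code:python}
# Illustrative sketch, assuming a client helper
# wait_for_finished_timeout(handle, timeout) that returns True once the
# query reaches FINISHED and returns promptly if the query fails, plus a
# clear_configuration() helper that resets session options.
def run_alter_and_wait(client, handle, max_wait_time):
    finished = client.wait_for_finished_timeout(handle, timeout=max_wait_time)
    assert finished, "alter table did not finish within %ss" % max_wait_time
    # Clear options so later statements don't inherit the debug_action
    # that the alter table statement used.
    client.clear_configuration()
{code}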


> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> --
>
> Key: IMPALA-12616
> URL: https://issues.apache.org/jira/browse/IMPALA-12616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4.2
>Reporter: Andrew Sherman
>Assignee: Daniel Becker
>Priority: Critical
>
> There are failures in both 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
>  and 
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters,
>  both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in 
> test_restart_catalogd_while_handling_rpc_response_with_timeout
> self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], 
> max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
> self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
> raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of 
> the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations

2024-06-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853074#comment-17853074
 ] 

ASF subversion and git services commented on IMPALA-13130:
--

Commit 3f827bfc2447d8c11a4f09bcb96e86c53b92d753 in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f827bfc2 ]

IMPALA-13130: Prioritize EndDataStream messages

Prioritize EndDataStream messages over other types handled by
DataStreamService, and avoid rejecting them when memory limit is
reached. They take very little memory (~75 bytes) and will usually help
reduce memory use by closing out in-progress operations.

Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS
timeouts. Defaults to 1 hour, and can be disabled by setting to -1.

Adds unit tests ensuring EOS are processed even if mem limit is reached
and ahead of TransmitData messages in the queue.

Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8
Reviewed-on: http://gerrit.cloudera.org:8080/21476
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Under heavy load, Impala does not prioritize data stream operations
> ---
>
> Key: IMPALA-13130
> URL: https://issues.apache.org/jira/browse/IMPALA-13130
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Under heavy load - where Impala reaches max memory for the DataStreamService 
> and applies backpressure via 
> https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199
>  - DataStreamService does not differentiate between types of requests and may 
> reject requests that could help reduce load.
> The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, 
> UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize 
> completing EndDataStream, especially under heavy load, to complete work and 
> release resources more quickly.
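
A toy sketch of the prioritization idea (not Impala's actual KRPC service pool; all names below are invented):

{code:python}
from collections import deque, namedtuple

Rpc = namedtuple("Rpc", "kind size")

class DataStreamQueue:
    """Toy model: EndDataStream is never rejected on memory pressure and is
    served ahead of other RPCs, since EOS is tiny (~75 bytes) and releases
    resources held by in-progress streams."""
    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.mem_used = 0
        self.eos = deque()
        self.other = deque()

    def offer(self, rpc):
        if rpc.kind == "EndDataStream":
            self.eos.append(rpc)   # bypasses the memory-based rejection
            return True
        if self.mem_used + rpc.size > self.mem_limit:
            return False           # backpressure on bulk TransmitData traffic
        self.mem_used += rpc.size
        self.other.append(rpc)
        return True

    def take(self):
        # EOS first, so in-progress streams close out promptly under load.
        return self.eos.popleft() if self.eos else self.other.popleft()
{code}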



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13119) CostingSegment.java is initialized with wrong cost

2024-06-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852621#comment-17852621
 ] 

ASF subversion and git services commented on IMPALA-13119:
--

Commit 753ee9b8a80d8e4c0db966a3132446a5aceb05cd in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=753ee9b8a ]

IMPALA-13119: Fix cost_ initialization at CostingSegment.java

This patch fix cost_ initialization of CostingSegment. The public
constructor should initialize cost_ with ProcessingCost directly taken
from PlanNode or DataSink parameter. The private constructor still
initialize cost_ with ProcessingCost.zero().

Testing:
- Add TpcdsCpuCostPlannerTest#testQ43Verbose
  Verify that "#cons:#prod" is correct in verbose profile.
- Pass FE tests TpcdsCpuCostPlannerTest, PlannerTest#testProcessingCost,
  and PlannerTest#testProcessingCostPlanAdmissionSlots
- Pass test_executor_groups.py

Change-Id: I5b3c99c87a1d0a08edc8d276cf33d709bd39fe14
Reviewed-on: http://gerrit.cloudera.org:8080/21468
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> CostingSegment.java is initialized with wrong cost
> --
>
> Key: IMPALA-13119
> URL: https://issues.apache.org/jira/browse/IMPALA-13119
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> CostingSegment.java has two public constructor: one accept PlanNode, while 
> the other accept DataSink as parameter. Both call appendCost method, which 
> sum the additionalCost with the segment's current cost_.
> However, if cost_ were ProcessingCost.zero(), it can mistakenly 
> setNumRowToConsume to 0.
> [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/CostingSegment.java#L114]
>  
> The public constructor should just initialize cost_ with ProcessingCost from 
> PlanNode or DataSink from constructor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status

2024-06-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852620#comment-17852620
 ] 

ASF subversion and git services commented on IMPALA-13134:
--

Commit 70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=70b7b6a78 ]

IMPALA-13134: DDL hang with SYNC_DDL enabled when Catalogd is changed to 
standby status

Catalogd waits for SYNC_DDL version when it processes a DDL with
SYNC_DDL enabled. If the status of Catalogd is changed from active to
standby when CatalogServiceCatalog.waitForSyncDdlVersion() is called,
the standby catalogd does not receive catalog topic updates from
statestore, hence the catalogd thread waits indefinitely.

This patch fixes the issue by re-generating the service id when Catalogd
is changed to standby status and throwing an exception if the service id
has changed while waiting for the SYNC_DDL version.

Testing:
 - Added unit-test code for CatalogD HA to run DDL with SYNC_DDL enabled
   and an injected delay when waiting for the SYNC_DDL version, then verify
   that the DDL query fails due to catalogd failover.
 - Passed test_catalogd_ha.py.

Change-Id: I2dcd628cff3c10d2e7566ba2d9de0b5886a18fc1
Reviewed-on: http://gerrit.cloudera.org:8080/21480
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
> -
>
> Key: IMPALA-13134
> URL: https://issues.apache.org/jira/browse/IMPALA-13134
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Catalog
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>
> Catalogd waits for SYNC_DDL version when it processes a DDL with SYNC_DDL 
> enabled. If the status of Catalogd is changed from active to standby when 
> CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd 
> does not receive catalog topic updates from statestore. This causes the 
> catalogd thread to wait indefinitely and the DDL query to hang.
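
The fix, reduced to a schematic wait loop (hypothetical Python names; the real logic lives in CatalogServiceCatalog.waitForSyncDdlVersion() in Java):

{code:python}
import time

class CatalogFailoverError(Exception):
    pass

def wait_for_sync_ddl_version(catalog, target_version, poll_s=0.1):
    # A standby catalogd stops receiving topic updates, so the version can
    # never advance; re-checking the service id turns an otherwise
    # indefinite wait into a query failure, as in the patch above.
    original_service_id = catalog.service_id
    while catalog.last_sent_version < target_version:
        if catalog.service_id != original_service_id:
            raise CatalogFailoverError("catalogd is no longer active")
        time.sleep(poll_s)
{code}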



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12705) Add a page to show the catalog's HA information

2024-06-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852597#comment-17852597
 ] 

ASF subversion and git services commented on IMPALA-12705:
--

Commit f67f5f1815c60a4723887ea6fcdaa067b7fa4ca5 in impala's branch 
refs/heads/master from ttz
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f67f5f181 ]

IMPALA-12705: Add /catalog_ha_info page on Statestore to show catalog HA 
information

This patch adds /catalog_ha_info page on Statestore to show catalog HA
information. The page contains the following information: Active Node,
Standby Node, and Notified Subscribers table. In the Notified
Subscribers table, include the following information items:
  -- Id,
  -- Address,
  -- Registration ID,
  -- Subscriber Type,
  -- Catalogd Version,
  -- Catalogd Address,
  -- Last Update Catalogd Time

Change-Id: If85f6a827ae8180d13caac588b92af0511ac35e3
Reviewed-on: http://gerrit.cloudera.org:8080/21418
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add a page to show the catalog's HA information
> ---
>
> Key: IMPALA-12705
> URL: https://issues.apache.org/jira/browse/IMPALA-12705
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.3.0
>Reporter: Zhi Tang
>Assignee: Zhi Tang
>Priority: Major
> Attachments: image-2024-05-27-10-57-37-158.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13129) Hit DCHECK when skipping MIN_MAX runtime filter

2024-06-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852181#comment-17852181
 ] 

ASF subversion and git services commented on IMPALA-13129:
--

Commit e2e45401e2bead4090fd5c562709db521cbc6d38 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e2e45401e ]

IMPALA-13129: Move runtime filter skipping at registerRuntimeFilter

A DCHECK in hdfs-scanner.h was hit when skipping a MIN_MAX runtime
filter using the RUNTIME_FILTER_IDS_TO_SKIP query option. This is because
HdfsScanNode.tryToComputeOverlapPredicate() is called and registers a
TOverlapPredicateDesc during runtime filter generation, but the minmax
filter is then skipped later, causing the backend to hit the DCHECK.

This patch moves the runtime filter skipping to registerRuntimeFilter()
so that HdfsScanNode.tryToComputeOverlapPredicate() is not called
at all once a filter is skipped.

Testing:
- Add test in overlap_min_max_filters.test to explicitly skip a minmax
  runtime filter.
- Pass test_runtime_filters.py

Change-Id: I43c1c4abc88019aadaa85d2e3d0ecda417297bfc
Reviewed-on: http://gerrit.cloudera.org:8080/21477
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 


> Hit DCHECK when skipping MIN_MAX runtime filter
> ---
>
> Key: IMPALA-13129
> URL: https://issues.apache.org/jira/browse/IMPALA-13129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> A [DCHECK in 
> hdfs-scanner.h|https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/exec/hdfs-scanner.h#L199]
>  is hit when skipping a MIN_MAX runtime filter using 
> RUNTIME_FILTER_IDS_TO_SKIP query option. This is because during runtime 
> filter generation, HdfsScanNode.tryToComputeOverlapPredicate() is called and 
> registers a 
> TOverlapPredicateDesc, but the minmax filter is then skipped later, causing 
> the backend to hit the DCHECK.
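
Schematically, the fix moves the skip check ahead of any per-filter registration work (Python sketch with invented names; the real code is in the Java frontend):

{code:python}
def register_runtime_filters(scan_node, filters, runtime_filter_ids_to_skip):
    # Parse the query option, e.g. "1,3" -> {1, 3}.
    ids_to_skip = {int(x) for x in runtime_filter_ids_to_skip.split(",") if x}
    for f in filters:
        if f.filter_id in ids_to_skip:
            # Skip before any per-filter work, so nothing like
            # tryToComputeOverlapPredicate() ever runs for a skipped filter.
            continue
        scan_node.register(f)
{code}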



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids

2024-05-31 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851225#comment-17851225
 ] 

ASF subversion and git services commented on IMPALA-13111:
--

Commit ce8078204e5995277f79e226e26fe8b9eaca408b in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce8078204 ]

IMPALA-13111: Fix the calculation of fragment ids for impala-gdb.py

The gdb helpers in impala-gdb.py provide functions to look on
the stack for the information added in IMPALA-6416 and get the
fragment/query ids. Right now, it is incorrectly using a signed
integer, which leads to incorrect ids like this:
-3cbda1606b3ade7c:f170c4bd

This changes the logic to AND the integer with an 0xFF* sequence
of the right length. This forces the integer to be unsigned,
producing the right query id.

Testing:
 - Ran this on a minidump and verified that the listed query ids
   were valid (and existed in the profile log)

Change-Id: I59798407e99ee0e9100cac6b4b082cdb85ed43d1
Reviewed-on: http://gerrit.cloudera.org:8080/21472
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> impala-gdb.py's find-query-ids/find-fragment-instances return unusable query 
> ids
> 
>
> Key: IMPALA-13111
> URL: https://issues.apache.org/jira/browse/IMPALA-13111
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 4.5.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide 
> information about the queries / fragments running in a core file. However, 
> the query/fragment ids that it returns have issues with the signedness of the 
> integers:
> {noformat}
> (gdb) find-fragment-instances
> Fragment Instance Id    Thread IDs
> -23b76c1699a831a1:279358680036    [117120]
> -23b76c1699a831a1:279358680037    [117121]
> -23b76c1699a831a1:279358680038    [117122]
> ..
> (gdb) find-query-ids
> -3cbda1606b3ade7c:f170c4bd
> -23b76c1699a831a1:27935868
> 68435df1364aa90f:1752944f
> 3442ed6354c7355d:78c83d20{noformat}
> The low values for find-query-ids don't have this problem, because they are 
> ANDed with 0xffffffffffffffff:
> {noformat}
>             qid_low = format(int(qid_low, 16) & 0xffffffffffffffff, 
> 'x'){noformat}
> We can fix the other locations by ANDing with 0xffffffffffffffff.
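
A minimal, self-contained illustration of the masking fix (the id value below is hypothetical; the actual helper lives in lib/python/impala_py_lib/gdb/impala-gdb.py):

{code:python}
# A signed 64-bit value as gdb might hand it back (hypothetical id).
signed_hi = -0x23b76c1699a831a1

# ANDing with a 64-bit all-ones mask forces an unsigned interpretation,
# so the formatted hex matches the query id as Impala logs it.
unsigned_hi = signed_hi & 0xFFFFFFFFFFFFFFFF
print(format(unsigned_hi, 'x'))  # dc4893e96657ce5f, not a negative-looking id
{code}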



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6416) Extend Thread::Create to track fragment instance id automatically based on parent's fid

2024-05-31 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851226#comment-17851226
 ] 

ASF subversion and git services commented on IMPALA-6416:
-

Commit ce8078204e5995277f79e226e26fe8b9eaca408b in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce8078204 ]

IMPALA-13111: Fix the calculation of fragment ids for impala-gdb.py

The gdb helpers in impala-gdb.py provide functions to look on
the stack for the information added in IMPALA-6416 and get the
fragment/query ids. Right now, it is incorrectly using a signed
integer, which leads to incorrect ids like this:
-3cbda1606b3ade7c:f170c4bd

This changes the logic to AND the integer with an 0xFF* sequence
of the right length. This forces the integer to be unsigned,
producing the right query id.

Testing:
 - Ran this on a minidump and verified that the listed query ids
   were valid (and existed in the profile log)

Change-Id: I59798407e99ee0e9100cac6b4b082cdb85ed43d1
Reviewed-on: http://gerrit.cloudera.org:8080/21472
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Extend Thread::Create to track fragment instance id automatically based on 
> parent's fid
> ---
>
> Key: IMPALA-6416
> URL: https://issues.apache.org/jira/browse/IMPALA-6416
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 2.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key

2024-05-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850883#comment-17850883
 ] 

ASF subversion and git services commented on IMPALA-13057:
--

Commit 825900fa6c3a51941b7b90edb8af6f7dba5e5fe8 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=825900fa6 ]

IMPALA-13057: Incorporate tuple/slot information into tuple cache key

The tuple cache keys currently do not include information about
the tuples or slots, as that information is stored outside
the PlanNode thrift structures. The tuple/slot information is
critical to determining which columns are referenced and what
data layout the result tuple has. This adds code to incorporate
the TupleDescriptors and SlotDescriptors into the cache key.

Since the tuple and slot ids are indexes into a global structure
(the descriptor table), they hinder cache key matches across
different queries. If a query has an extra filter, it can shift
all the slot ids. If the query has an extra join, it can
shift all the tuple ids. To eliminate this effect, this adds the
ability to translate tuple and slot ids from global indices to
local indices. The translation only contains information from the
subtree below that point, so it is not influenced by unrelated
parts of the query.

When the code registers a tuple with the TupleCacheInfo, it also
registers a translation from the global index to a local index.
Any code that puts SlotIds or TupleIds into a Thrift data structure
can use the translateTupleId() and translateSlotId() functions to
get the local index. These are exposed on ThriftSerializationCtx
by functions of the same name, but those functions apply the
translation only when working for the tuple cache.

This passes the ThriftSerializationCtx into Exprs that have
TupleIds or SlotIds and applies the translation. It also passes
the ThriftSerializationCtx into PlanNode::toThrift(), which is
used to translate TupleIds in HdfsScanNode.

This also adds a way to register a table with the tuple cache
and incorporate information about it. This allows us to mask
out additional fields in PlanNode and enable a test case that
relies on matching with different table aliases.

Testing:
 - This fixes some commented out test cases in TupleCacheTest
   (specifically telling columns apart)
 - This adds new test cases that match due to id translation
   (extra filters, extra joins)
 - This adds a unit test for the id translation to
   TupleCacheInfoTest

Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Reviewed-on: http://gerrit.cloudera.org:8080/21398
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Incorporate tuple/slot information into the tuple cache key
> ---
>
> Key: IMPALA-13057
> URL: https://issues.apache.org/jira/browse/IMPALA-13057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> Since the tuple and slot information is kept separately in the descriptor 
> table, it does not get incorporated into the PlanNode thrift used for the 
> tuple cache key. This means that the tuple cache can't distinguish between 
> these two queries:
> {noformat}
> select int_col1 from table;
> select int_col2 from table;{noformat}
> To solve this, the tuple/slot information needs to be incorporated into the 
> cache key. PlanNode::initThrift() walks through each tuple, so this is a good 
> place to serialize the TupleDescriptor/SlotDescriptors and incorporate it 
> into the hash.
> The tuple ids and slot ids are global ids, so the value is influenced by the 
> entirety of the query. This is a problem for matching cache results across 
> different queries. As part of incorporating the tuple/slot information, we 
> should also add an ability to translate tuple/slot ids into ids local to a 
> subtree.
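
A small sketch of the global-to-local translation idea (invented names; the real code lives in TupleCacheInfo in the Java frontend):

{code:python}
class IdTranslator:
    """Maps global tuple/slot ids to dense local ids in first-seen order,
    so a cache key built from a subtree is independent of how many ids
    the rest of the query consumed."""
    def __init__(self):
        self._local = {}

    def translate(self, global_id):
        return self._local.setdefault(global_id, len(self._local))

# Two subtrees that saw different global ids produce the same local
# sequence, so their serialized cache keys can match:
a, b = IdTranslator(), IdTranslator()
assert [a.translate(i) for i in (7, 9, 7)] == \
       [b.translate(i) for i in (2, 5, 2)]
{code}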



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13108) Update Impala version to 4.5.0-SNAPSHOT

2024-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850526#comment-17850526
 ] 

ASF subversion and git services commented on IMPALA-13108:
--

Commit 1324a6e6c9589300424c84ad2a2aa7fd256068b2 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1324a6e6c ]

IMPALA-13108: Update version to 4.5.0-SNAPSHOT

Updated IMPALA_VERSION in impala-config.sh

Executed the following for Java:

  cd java
  mvn versions:set -DnewVersion=4.5.0-SNAPSHOT

Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244
Reviewed-on: http://gerrit.cloudera.org:8080/21460
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Update Impala version to 4.5.0-SNAPSHOT
> ---
>
> Key: IMPALA-13108
> URL: https://issues.apache.org/jira/browse/IMPALA-13108
> Project: IMPALA
>  Issue Type: Task
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> With the release of 4.4.0, we should update the master to version 4.5.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13107) Invalid TExecPlanFragmentInfo received by executor with instance number as 0

2024-05-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850525#comment-17850525
 ] 

ASF subversion and git services commented on IMPALA-13107:
--

Commit 3e1b10556bc83b0e697b7a2aac411ccad6094563 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3e1b10556 ]

IMPALA-13107: Don't start query on executor if instance number equals 0

In bad networking conditions, the TExecPlanFragmentInfo in KRPC messages
received by executors could be truncated due to KRPC failures, but
truncation may not cause a thrift deserialization error. The invalid
TExecPlanFragmentInfo causes the Impala daemon to crash.
To avoid the crash, this patch checks the number of instances in the
received TExecPlanFragment on the executor. The query will not be started
if the number of instances equals 0. It also adds a DCHECK on the
coordinator side to make sure it does not send a TExecPlanFragment
without any instance.

Testing:
 - Passed core tests.
 - Passed exhaustive tests in debug build. The new DCHECKs were not
   hit.

Change-Id: Ie92ee120f1e9369f8dc2512792a05b7f8be5f007
Reviewed-on: http://gerrit.cloudera.org:8080/21458
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 
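
The defensive check, in schematic form (a hypothetical Python stand-in; the real check is in the C++ backend):

{code:python}
def validate_fragment_info(fragment_info):
    """Reject a structurally empty payload before starting the query.

    A truncated KRPC message can deserialize without a thrift error, so
    the executor must not assume the coordinator's invariants hold.
    """
    if not getattr(fragment_info, "fragments", None):
        raise ValueError("TExecPlanFragmentInfo has no fragments/instances; "
                         "refusing to start the query")
{code}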


> Invalid TExecPlanFragmentInfo received by executor with instance number as 0
> 
>
> Key: IMPALA-13107
> URL: https://issues.apache.org/jira/browse/IMPALA-13107
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> In a customer-reported case, executors received a TExecPlanFragmentInfo with 
> the instance number equal to 0, which caused the Impala daemon to crash. Here 
> are log messages collected on the Impala executors:
> {code:java}
> impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523 
> 00:59:16.892853 199528 control-service.cc:148] 
> 624c47e9264ebb62:5aa89af3] ExecQueryFInstances(): 
> query_id=624c47e9264ebb62:5aa89af3 coord=coordinator.net:27000 
> #instances=0
> ..
> I0523 00:59:19.306522 199185 kMinidump in thread 
> [1890723]query-state-624c47e9264ebb62:5aa89af3 running query 
> 624c47e9264ebb62:5aa89af3, fragment instance 
> :
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x012ff9d9, pid=197583, tid=0x7eefc98a0700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # C  [impalad+0xeff9d9]  
> impala::FragmentState::FragmentState(impala::QueryState*, 
> impala::TPlanFragment const&, impala::PlanFragmentCtxPB const&)+0xf9
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> {code}
> From the collected profiles, there was no fragment with an instance count of 0 
> in the corresponding query plan, so the coordinator should not have sent 
> fragments with 0 instances to the executor. Executor log files showed that 
> there were lots of KRPC errors around the time the invalid 
> TExecPlanFragmentInfo was received. It seems the KRPC messages were truncated 
> due to KRPC failures, but the truncation might not cause a thrift 
> deserialization error. The invalid TExecPlanFragmentInfo caused the Impala 
> daemon to crash with the following stack trace when the query was started on 
> the executor.
> {code:java}
> #0  SubstituteArg (value=..., this=0x7f86cec79d30) at 
> ../gutil/strings/substitute.h:79
> #1  impala::FragmentState::FragmentState (this=0x35c78f40, 
> query_state=0x7972db00, fragment=..., 
> fragment_ctx= 0x35c78f88>) at fragment-state.cc:143
> #2  0x013019aa in impala::FragmentState::CreateFragmentStateMap 
> (fragment_info=..., exec_request=..., 
> state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47
> #3  0x01292d71 in impala::QueryState::StartFInstances 
> (this=this@entry=0x7972db00) at query-state.cc:820
> #4  0x01284810 in impala::QueryExecMgr::ExecuteQueryHelper 
> (this=0x11943b00, qs=0x7972db00)
> at query-exec-mgr.cc:162
> #5  0x01752915 in operator() (this=0x7f86cec7ab40)
> at 
> ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #6  impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, 
> std::__cxx11::basic_string, std::allocator 
> > const&, boost::function, impala::ThreadDebugInfo 

[jira] [Commented] (IMPALA-13085) Add warning and NULL out DECIMAL values in Iceberg metadata tables

2024-05-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850054#comment-17850054
 ] 

ASF subversion and git services commented on IMPALA-13085:
--

Commit 2e093bbc8ae06f89f17bbe57f41d5e91749572c4 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e093bbc8 ]

IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

DECIMAL values are not supported in Iceberg metadata tables, and Impala
hits a DCHECK and crashes if it encounters one.

Until this issue is properly fixed (see IMPALA-13080), this commit
introduces a temporary solution: DECIMAL values coming from Iceberg
metadata tables are NULLed out and a warning is issued.

Testing:
 - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table,
   so querying the `files` metadata table will include a DECIMAL in the
   'readable_metrics' struct.

Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Reviewed-on: http://gerrit.cloudera.org:8080/21429
Reviewed-by: Daniel Becker 
Tested-by: Impala Public Jenkins 


> Add warning and NULL out DECIMAL values in Iceberg metadata tables
> --
>
> Key: IMPALA-13085
> URL: https://issues.apache.org/jira/browse/IMPALA-13085
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> IMPALA-13080 is about adding support for DECIMAL values in Iceberg metadata 
> tables. Until it is done, we should NULL out the values and issue a warning 
> instead of hitting a DCHECK and crashing.
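
Schematically (a Python stand-in for the backend change; names invented):

{code:python}
import warnings

def read_metadata_value(col_type, raw_value):
    # Temporary behavior per IMPALA-13085: DECIMALs from Iceberg metadata
    # tables are NULLed out with a warning until IMPALA-13080 adds support.
    if col_type == "DECIMAL":
        warnings.warn("DECIMAL is not supported in Iceberg metadata tables; "
                      "returning NULL")
        return None
    return raw_value
{code}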



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13080) Add support for DECIMAL in Iceberg metadata tables

2024-05-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850055#comment-17850055
 ] 

ASF subversion and git services commented on IMPALA-13080:
--

Commit 2e093bbc8ae06f89f17bbe57f41d5e91749572c4 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e093bbc8 ]

IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables

DECIMAL values are not supported in Iceberg metadata tables, and Impala
hits a DCHECK and crashes if it encounters one.

Until this issue is properly fixed (see IMPALA-13080), this commit
introduces a temporary solution: DECIMAL values coming from Iceberg
metadata tables are NULLed out and a warning is issued.

Testing:
 - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table,
   so querying the `files` metadata table will include a DECIMAL in the
   'readable_metrics' struct.

Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22
Reviewed-on: http://gerrit.cloudera.org:8080/21429
Reviewed-by: Daniel Becker 
Tested-by: Impala Public Jenkins 


> Add support for DECIMAL in Iceberg metadata tables
> --
>
> Key: IMPALA-13080
> URL: https://issues.apache.org/jira/browse/IMPALA-13080
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg, ramp-up
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN

2024-05-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849437#comment-17849437
 ] 

ASF subversion and git services commented on IMPALA-8042:
-

Commit d0237fbe47eb5089ee19a0a201045b862d65ecaa in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d0237fbe4 ]

IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column

The Impala frontend cannot evaluate BETWEEN/NOT BETWEEN predicates directly.
It needs to transform a BetweenPredicate into a CompoundPredicate
consisting of upper bound and lower bound BinaryPredicate through
BetweenToCompoundRule.java. The BinaryPredicate can then be pushed down
or rewritten into other form by another expression rewrite rule.
However, the selectivity of the BetweenPredicate or its derivatives remains
unassigned and often collapses with other unknown-selectivity predicates
into a collective selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1).

This patch adds a narrow optimization of BetweenPredicate selectivity
when the following criteria are met:

1. The BetweenPredicate is bound to a slot reference of a single column
   of a table.
2. The column type is discrete, such as INTEGER or DATE.
3. The column stats are available.
4. The column is sufficiently unique based on available stats.
5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value
   <= upper bound value).
6. The final calculated selectivity is less than or equal to
   Expr.DEFAULT_SELECTIVITY.

If these criteria are not met, the Planner reverts to the old
behavior, leaving the selectivity unassigned.

Since this patch only targets BetweenPredicate over unique columns, the
following query will still have the default scan selectivity (0.1):

select count(*) from tpch.customer c
where c.c_custkey >= 1234 and c.c_custkey <= 2345;

While this equivalent query written with BETWEEN predicate will have
lower scan selectivity:

select count(*) from tpch.customer c
where c.c_custkey between 1234 and 2345;

This patch calculates the BetweenPredicate selectivity during
transformation at BetweenToCompoundRule.java. The selectivity is
piggy-backed into the resulting CompoundPredicate and BinaryPredicate as
betweenSelectivity_ field, separate from the selectivity_ field.
Analyzer.getBoundPredicates() is modified to prioritize the derived
BinaryPredicate over ordinary BinaryPredicate in its return value to
prevent the derived BinaryPredicate from being eliminated by a matching
ordinary BinaryPredicate.

Testing:
- Add table functional_parquet.unique_with_nulls.
- Add FE tests in ExprCardinalityTest#testBetweenSelectivity,
  ExprCardinalityTest#testNotBetweenSelectivity, and
  PlannerTest#testScanCardinality.
- Pass core tests.

Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10
Reviewed-on: http://gerrit.cloudera.org:8080/21377
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Better selectivity estimate for BETWEEN
> ---
>
> Key: IMPALA-8042
> URL: https://issues.apache.org/jira/browse/IMPALA-8042
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Paul Rogers
>Assignee: Riza Suminto
>Priority: Minor
>
> The analyzer rewrites a BETWEEN expression into a pair of inequalities.  
> IMPALA-8037 explains that the planner then groups all such non-equality 
> conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains 
> that the analyzer should handle inequalities better.
> BETWEEN is a special case and informs the final result. If we assume a 
> selectivity of s for inequality, then BETWEEN should be something like s/2. 
> The intuition is that if c >= x includes, say, ⅓ of values, and c <= y 
> includes a third of values, then c BETWEEN x AND y should be a narrower set 
> of values, say ⅙.
> [Ramakrishnan and 
> Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]
>  recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the 
> general expression x <= c AND c <= Y. Note the discrepancy between the 
> compound inequality case and the BETWEEN case, likely reflecting the 
> additional information we obtain when the user chooses to use BETWEEN.
> To implement a special BETWEEN selectivity in Impala, we must remember the 
> selectivity of BETWEEN during the rewrite to a compound inequality.
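
One way to make the arithmetic concrete, under the commit's criteria (discrete type, sufficiently unique column, sane bounds). The formula below is an assumption consistent with the description, not the exact Impala code:

{code:python}
DEFAULT_SELECTIVITY = 0.1  # Expr.DEFAULT_SELECTIVITY

def between_selectivity(lower, upper, ndv):
    """Assumed estimate for `col BETWEEN lower AND upper` on a discrete,
    sufficiently-unique column: matched distinct values over NDV."""
    if upper < lower or ndv <= 0:
        return None                      # malformed; leave unassigned
    sel = (upper - lower + 1) / ndv
    # Only used when it improves on the default, per the commit's criteria.
    return sel if sel <= DEFAULT_SELECTIVITY else None

# tpch.customer has 150000 unique c_custkey values:
print(between_selectivity(1234, 2345, 150000))  # ~0.0074, well under 0.1
{code}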



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once

2024-05-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849436#comment-17849436
 ] 

ASF subversion and git services commented on IMPALA-13105:
--

Commit 8a6f2824b8abf53ea022ca571da33619d564a14a in impala's branch 
refs/heads/master from Surya Hebbar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8a6f2824b ]

IMPALA-13105: Fix multiple imported query profiles fail to import/clear at once

On importing multiple query profiles, insertion of the last query in the
queue fails as no delay is provided for the insertion.

This has been fixed by providing a delay after inserting the final query.

On clearing all the imported queries, in some instances the page reloads
before the IndexedDB object store has been cleared.

This has been fixed by triggering the page reload after clearing
the object store succeeds.

Change-Id: I42470fecd0cff6e193f080102575e51d86a2d562
Reviewed-on: http://gerrit.cloudera.org:8080/21450
Reviewed-by: Wenzhe Zhou 
Reviewed-by: Riza Suminto 
Tested-by: Impala Public Jenkins 


> Multiple imported query profiles fail to import/clear at once
> -
>
> Key: IMPALA-13105
> URL: https://issues.apache.org/jira/browse/IMPALA-13105
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
>
> When multiple query profiles are chosen at once, the last query profile in 
> the insertion queue fails as the page reloads without providing a delay for 
> inserting it.
>  
> The same behavior is seen when clearing all the query profiles.
>  
> This is mostly seen in Chromium based browsers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849073#comment-17849073
 ] 

ASF subversion and git services commented on IMPALA-13034:
--

Commit b975165a0acfe37af302dd7c007360633df54917 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b975165a0 ]

IMPALA-13034: Add logs and counters for HTTP profile requests blocking client 
fetches

There are several endpoints in WebUI that can dump a query profile:
/query_profile, /query_profile_encoded, /query_profile_plain_text,
/query_profile_json. The HTTP handler thread goes into
ImpalaServer::GetRuntimeProfileOutput() which acquires the lock of the
ClientRequestState. This could block client requests fetching query
results.

To help identify this issue, this patch adds warning logs when such
profile dumping requests run slow and the query is still in-flight. Also
adds a profile counter, GetInFlightProfileTimeStats, for the summary
stats of this time. Dumping the profiles after the query is archived
(e.g. closed) won't be tracked.

Logs for slow http responses are also added. The thresholds are defined
by two new flags, slow_profile_dump_warning_threshold_ms, and
slow_http_response_warning_threshold_ms.

Note that dumping the profile in-flight won't always block the query,
e.g. if there are no client fetch requests or if the coordinator
fragment is idle waiting for executor fragment instances. So a long time
shown in GetInFlightProfileTimeStats doesn't mean it's hitting the
issue.

To better identify this issue, this patch adds another profile counter,
ClientFetchLockWaitTimer, as the cumulative time client fetch requests
waiting for locks.

Also fixes false positive logs for complaining invalid query handles.
Such logs are added in GetQueryHandle() when the query is not found in
the active query map, but it could still exist in the query log. This
removes the logs in GetQueryHandle() and lets the callers decide whether
to log the error.

Tests:
 - Added e2e test
 - Ran CORE tests

Change-Id: I538ebe914f70f460bc8412770a8f7a1cc8b505dc
Reviewed-on: http://gerrit.cloudera.org:8080/21412
Reviewed-by: Impala Public Jenkins 
Tested-by: Michael Smith 
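
A minimal sketch of the thresholded-warning pattern the commit describes, assuming a hypothetical Java stand-in (the flag name mirrors slow_profile_dump_warning_threshold_ms from the commit message; the rest is not Impala's C++ implementation):

{code:java}
public class SlowRequestLogger {
  static final long SLOW_PROFILE_DUMP_WARNING_THRESHOLD_MS = 1000;  // assumed default

  static void dumpProfileWithWarning(String queryId, Runnable dumpProfile,
      boolean queryInFlight) {
    long startNs = System.nanoTime();
    dumpProfile.run();  // stand-in for GetRuntimeProfileOutput()
    long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
    // Only warn for in-flight queries: dumping an archived profile cannot
    // block client fetches.
    if (queryInFlight && elapsedMs > SLOW_PROFILE_DUMP_WARNING_THRESHOLD_MS) {
      System.err.println("Slow profile dump for query " + queryId + " took "
          + elapsedMs + " ms");
    }
  }

  public static void main(String[] args) {
    dumpProfileWithWarning("q1", () -> {}, true);  // fast dump, no warning
  }
}
{code}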


> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput() 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849072#comment-17849072
 ] 

ASF subversion and git services commented on IMPALA-13102:
--

Commit e35f8183cb1ba069ae00ee93e71451eccd505d0a in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e35f8183c ]

IMPALA-13102: Normalize invalid column stats from HMS

Column stats like numDVs, numNulls in HMS could have arbitrary values.
Impala expects them to be non-negative or -1 for unknown. So loading
tables with invalid stats values (<-1) will fail.

This patch adds logic to normalize the stats values. If the value < -1,
use -1 for it and add corresponding warning logs. Also refactor some
redundant code in ColumnStats.

Tests:
 - Add e2e test

Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a
Reviewed-on: http://gerrit.cloudera.org:8080/21445
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
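
A minimal sketch of the normalization rule described above (hypothetical names, not the actual ColumnStats code):

{code:java}
public class StatsNormalizer {
  // Stats values from HMS must be non-negative, or -1 for unknown; anything
  // below -1 is normalized to -1 with a warning, as the commit describes.
  static long normalizeStat(String name, long value) {
    if (value < -1) {
      System.err.println("Invalid " + name + " value from HMS: " + value
          + ", normalizing to -1 (unknown)");
      return -1;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(normalizeStat("numDVs", -100));  // -1, with a warning
    System.out.println(normalizeStat("numNulls", 0));   // 0
  }
}
{code}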


> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table. So DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so the 
> user can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> 

[jira] [Commented] (IMPALA-11735) Handle CREATE_TABLE event when the db is invisible to the impala server user

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848916#comment-17848916
 ] 

ASF subversion and git services commented on IMPALA-11735:
--

Commit 9672312015be959360795a8af0843fdf386b557c in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=967231201 ]

IMPALA-11735: Handle CREATE_TABLE event when the db is invisible to the
impala server user

It's possible that some dbs are invisible to the Impala cluster due to
authorization restrictions. However, the CREATE_TABLE events in such
dbs will lead the event-processor into the ERROR state. The event
processor should ignore such CREATE_TABLE events when the database is
not found.

note: This is an incorrect setup, where the 'impala' super user is denied
access to the metadata object (the database) but is given access to fetch
events from the metastore's notification log table.

Testing:
- Manually verified this on local cluster.
- Added automated unit test to verify the same.

Change-Id: I90275bb8c065fc5af61186901ac7e9839a68c43b
Reviewed-on: http://gerrit.cloudera.org:8080/21188
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
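
A minimal sketch of the "skip when the database is not found" handling described above (all names are hypothetical stand-ins, not Impala's MetastoreEvents classes):

{code:java}
import java.util.Set;

public class CreateTableEventSketch {
  private final Set<String> visibleDbs;  // stand-in for the catalog's db map

  CreateTableEventSketch(Set<String> visibleDbs) { this.visibleDbs = visibleDbs; }

  void processCreateTable(String dbName, String tblName) {
    if (!visibleDbs.contains(dbName)) {
      // The db may be invisible to the impala service user due to
      // authorization restrictions; skip the event instead of driving the
      // event processor into the ERROR state.
      System.err.println("Ignoring CREATE_TABLE event for " + dbName + "."
          + tblName + ": database not found in catalog");
      return;
    }
    System.out.println("Adding table " + dbName + "." + tblName);
  }

  public static void main(String[] args) {
    new CreateTableEventSketch(Set.of("visible_db"))
        .processCreateTable("hidden_db", "t1");  // logged and skipped
  }
}
{code}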


> Handle CREATE_TABLE event when the db is invisible to the impala server user
> 
>
> Key: IMPALA-11735
> URL: https://issues.apache.org/jira/browse/IMPALA-11735
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>
> It's possible that some dbs are invisible to the Impala cluster due to 
> authorization restrictions. However, the CREATE_TABLE events in such dbs will 
> lead the event-processor into the ERROR state:
> {noformat}
> E1026 03:02:30.650302 116774 MetastoreEventsProcessor.java:684] Unexpected 
> exception received while processing event
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationException: EventId: 
> 184240416 EventType: CREATE_TABLE Unable to process event
> at 
> org.apache.impala.catalog.events.MetastoreEvents$CreateTableEvent.process(MetastoreEvents.java:735)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:772)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:670)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> E1026 03:02:30.650447 116774 MetastoreEventsProcessor.java:795] Notification 
> event is null
> {noformat}
> It should be handled (e.g. ignored) and reported to the admin (e.g. in logs).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13083) Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848917#comment-17848917
 ] 

ASF subversion and git services commented on IMPALA-13083:
--

Commit 98739a84557a209e05694abd79f62f7f7daf8777 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=98739a845 ]

IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION

This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error
message by stating the specific configuration that must be adjusted so
that the query can pass Admission Control. New fields
'per_backend_mem_to_admit_source' and
'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added
into QuerySchedulePB. These fields explain what limiting factor drives
final numbers at 'per_backend_mem_to_admit' and
'coord_backend_mem_to_admit' respectively. In turn, Admission Control
will use this information to compose a more informative error message
that the user can act upon. The new error message pattern also
explicitly mentions "Per Host Min Memory Reservation" as a place to look
at to investigate memory reservations scheduled for each backend node.

Updated documentation with examples of query rejection by Admission
Control and how to read the error message.

Testing:
- Add BE tests at admission-controller-test.cc
- Adjust and pass affected EE tests

Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66
Reviewed-on: http://gerrit.cloudera.org:8080/21436
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
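
A minimal sketch of composing an actionable message from the limiting factor, in the spirit of the MemLimitSourcePB fields described above (the enum values and wording here are assumptions, not Impala's actual messages):

{code:java}
public class MemLimitHint {
  enum MemLimitSource {
    MEM_LIMIT_QUERY_OPTION, POOL_MAX_QUERY_MEM_LIMIT, POOL_MIN_QUERY_MEM_LIMIT
  }

  static String hint(MemLimitSource source) {
    switch (source) {
      case MEM_LIMIT_QUERY_OPTION:   return "increase the MEM_LIMIT query option";
      case POOL_MAX_QUERY_MEM_LIMIT: return "increase the pool's max-query-mem-limit";
      case POOL_MIN_QUERY_MEM_LIMIT: return "increase the pool's min-query-mem-limit";
      default: throw new AssertionError(source);
    }
  }

  public static void main(String[] args) {
    System.out.println("Rejected: minimum memory reservation is greater than "
        + "memory available; see 'Per Host Min Memory Reservation' in the "
        + "profile and " + hint(MemLimitSource.POOL_MAX_QUERY_MEM_LIMIT) + ".");
  }
}
{code}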


> Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message
> --
>
> Key: IMPALA-13083
> URL: https://issues.apache.org/jira/browse/IMPALA-13083
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> The REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message is too vague for a 
> user/administrator to make the adjustments necessary to run a query that is 
> rejected by the admission controller.
> {code:java}
> const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION =
>     "minimum memory reservation is greater than memory available to the query "
>     "for buffer reservations. Memory reservation needed given the current plan: "
>     "$0. Adjust either the mem_limit or the pool config (max-query-mem-limit, "
>     "min-query-mem-limit) for the query to allow the query memory limit to be "
>     "at least $1. Note that changing the mem_limit may also change the plan. "
>     "See the query profile for more information about the per-node memory "
>     "requirements.";
> {code}
> There are many configs and options that directly and indirectly clamp 
> schedule.per_backend_mem_limit() and schedule.per_backend_mem_to_admit().
> [https://github.com/apache/impala/blob/3b35ddc8ca7b0e540fc16c413a170a25e164462b/be/src/scheduling/schedule-state.cc#L262-L361]
> Ideally, this error message should clearly mention which query option / llama 
> config / backend flag influences the per_backend_mem_limit decision so that 
> the user can directly adjust that config. It should also clearly 
> mention the 'Per Host Min Memory Reservation' info string in the query profile 
> instead of just 'per-node memory requirements'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13040) SIGSEGV in QueryState::UpdateFilterFromRemote

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848461#comment-17848461
 ] 

ASF subversion and git services commented on IMPALA-13040:
--

Commit aa01079478773aed28c9a4d8b07c062202de698d in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aa0107947 ]

IMPALA-13040: (addendum) Inject larger delay for sanitized build

TestLateQueryStateInit has been flaky in sanitized builds because the
largest delay injection time was fixed at 3 seconds. This patch fixes
the issue by setting the largest delay injection time equal to
RUNTIME_FILTER_WAIT_TIME_MS, which is 3 seconds for regular builds and 10
seconds for sanitized builds.

Testing:
- Loop and pass test_runtime_filter_aggregation.py 10 times in ASAN
  build and 50 times in UBSAN build.

Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19
Reviewed-on: http://gerrit.cloudera.org:8080/21439
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> SIGSEGV in  QueryState::UpdateFilterFromRemote
> --
>
> Key: IMPALA-13040
> URL: https://issues.apache.org/jira/browse/IMPALA-13040
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Assignee: Riza Suminto
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> {code}
> Crash reason:  SIGSEGV /SEGV_MAPERR
> Crash address: 0x48
> Process uptime: not available
> Thread 114 (crashed)
>  0  libpthread.so.0 + 0x9d00
> rax = 0x00019e57ad00   rdx = 0x2a656720
> rcx = 0x059a9860   rbx = 0x
> rsi = 0x00019e57ad00   rdi = 0x0038
> rbp = 0x7f6233d544e0   rsp = 0x7f6233d544a8
>  r8 = 0x06a53540r9 = 0x0039
> r10 = 0x   r11 = 0x000a
> r12 = 0x00019e57ad00   r13 = 0x7f62a2f997d0
> r14 = 0x7f6233d544f8   r15 = 0x1632c0f0
> rip = 0x7f62a2f96d00
> Found by: given as instruction pointer in context
>  1  
> impalad!impala::QueryState::UpdateFilterFromRemote(impala::UpdateFilterParamsPB
>  const&, kudu::rpc::RpcContext*) [query-state.cc : 1033 + 0x5]
> rbp = 0x7f6233d54520   rsp = 0x7f6233d544f0
> rip = 0x015c0837
> Found by: previous frame's frame pointer
>  2  
> impalad!impala::DataStreamService::UpdateFilterFromRemote(impala::UpdateFilterParamsPB
>  const*, impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) 
> [data-stream-service.cc : 134 + 0xb]
> rbp = 0x7f6233d54640   rsp = 0x7f6233d54530
> rip = 0x017c05de
> Found by: previous frame's frame pointer
> {code}
> The line that crashes is 
> https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/runtime/query-state.cc#L1033
> My guess is that the actual segfault is within WaitForPrepare(), but it 
> was inlined. Not sure if a remote filter can arrive even before 
> QueryState::Init is finished - that would explain the issue, as 
> instances_prepared_barrier_ is not yet created at that point.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848462#comment-17848462
 ] 

ASF subversion and git services commented on IMPALA-12800:
--

Commit ae6846b1cd039b2cd6f8753ce3ff810c5b2d3ce3 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ae6846b1c ]

IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds

ExprSubstitutionMap::compose() and combine() call verify() to
check the new ExprSubstitutionMap for duplicates. This algorithm
is O(n^2) and can add significant overhead to SQLs with a large
number of expressions or inline views. This changes verify() to
skip the check for release builds (keeping it for debug builds).

In a query with 20+ layers of inline views and thousands of
expressions, turning off the verify() call cuts the execution
time from 51 minutes to 18 minutes.

This doesn't fully solve slowness in ExprSubstitutionMap.
Further improvement would require Expr to support hash-based
algorithms, which is a much larger change.

Testing:
 - Manual performance comparison with/without the verify() call

Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0
Reviewed-on: http://gerrit.cloudera.org:8080/21444
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
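
A minimal sketch of the debug-only duplicate check described above (hypothetical names; the real gating mechanism and Expr comparison differ):

{code:java}
import java.util.List;

public class SubstitutionMapSketch {
  // Stand-in for "is this a debug build"; here it simply reflects whether
  // assertions (-ea) are enabled.
  static final boolean DEBUG_BUILD =
      SubstitutionMapSketch.class.desiredAssertionStatus();

  static void verify(List<String> lhs) {
    if (!DEBUG_BUILD) return;  // skip the O(n^2) scan in release builds
    for (int i = 0; i < lhs.size(); ++i) {
      for (int j = i + 1; j < lhs.size(); ++j) {
        if (lhs.get(i).equals(lhs.get(j))) {
          throw new IllegalStateException("duplicate lhs expr: " + lhs.get(i));
        }
      }
    }
  }

  public static void main(String[] args) {
    verify(List.of("a", "b", "a"));  // throws only when run with -ea
  }
}
{code}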


> Queries with many nested inline views see performance issues with 
> ExprSubstitutionMap
> -
>
> Key: IMPALA-12800
> URL: https://issues.apache.org/jira/browse/IMPALA-12800
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Critical
> Attachments: impala12800repro.sql, impala12800schema.sql, 
> long_query_jstacks.tar.gz
>
>
> A user running a query with many layers of inline views saw a large amount of 
> time spent in analysis. 
>  
> {noformat}
> - Authorization finished (ranger): 7s518ms (13.134ms)
> - Value transfer graph computed: 7s760ms (241.953ms)
> - Single node plan created: 2m47s (2m39s)
> - Distributed plan created: 2m47s (7.430ms)
> - Lineage info computed: 2m47s (39.017ms)
> - Planning finished: 2m47s (672.518ms){noformat}
> In reproducing it locally, we found that most of the stacks end up in 
> ExprSubstitutionMap.
>  
> Here are the main stacks seen while running jstack every 3 seconds during a 
> 75 second execution:
> Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr 
> equals) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at java.util.ArrayList.indexOf(ArrayList.java:323)
>     at java.util.ArrayList.contains(ArrayList.java:306)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat}
> Location 2:  (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat}
> Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 
> samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.impala.analysis.Expr.equals(Expr.java:1008)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173)
>     at 
> org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat}
> Location 4:  (TupleIsNullPredicate.wrapExprs ->  Analyzer.isTrueWithNullSlots 
> -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples)
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at java.lang.StringCoding.encode(StringCoding.java:364)
>     at java.lang.String.getBytes(String.java:941)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532)
>     at 
> org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467)
>     at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034)
>     at 
> org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709)
>     at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400)
>     at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>     at 
> 

[jira] [Commented] (IMPALA-13079) Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848164#comment-17848164
 ] 

ASF subversion and git services commented on IMPALA-13079:
--

Commit e5fdcb4f4b7e2f37e5f7bb357eede8092de8f429 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5fdcb4f4 ]

IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables 
fails on an expected constant

IMPALA-13079 added a test in iceberg-metadata-tables.test that included
assertions about values that can change across builds, e.g. file sizes,
which caused test failures.

This commit fixes it by doing two things:
1. narrowing down the result set of the query to the column that the
   test is really about - this removes some of the problematic values
2. using regexes for the remaining problematic values.

Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
Reviewed-on: http://gerrit.cloudera.org:8080/21440
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 


> Add support for FLOAT/DOUBLE in Iceberg metadata tables
> ---
>
> Key: IMPALA-13079
> URL: https://issues.apache.org/jira/browse/IMPALA-13079
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13091) query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant

2024-05-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848163#comment-17848163
 ] 

ASF subversion and git services commented on IMPALA-13091:
--

Commit e5fdcb4f4b7e2f37e5f7bb357eede8092de8f429 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5fdcb4f4 ]

IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables 
fails on an expected constant

IMPALA-13079 added a test in iceberg-metadata-tables.test that included
assertions about values that can change across builds, e.g. file sizes,
which caused test failures.

This commit fixes it by doing two things:
1. narrowing down the result set of the query to the column that the
   test is really about - this removes some of the problematic values
2. using regexes for the remaining problematic values.

Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620
Reviewed-on: http://gerrit.cloudera.org:8080/21440
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 


> query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an 
> expected constant
> -
>
> Key: IMPALA-13091
> URL: https://issues.apache.org/jira/browse/IMPALA-13091
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.5.0
>Reporter: Laszlo Gaal
>Assignee: Daniel Becker
>Priority: Critical
>  Labels: impala-iceberg
>
> This fails in various sanitizer builds (ASAN, UBSAN):
> Failure report:{code}
> query_test/test_iceberg.py:1527: in test_metadata_tables
> '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])})
> common/impala_test_suite.py:820: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:627: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:520: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:313: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 
> 0,regex:'.*\.parquet','PARQUET',0,3,3648,'{1:32,2:63,3:71,4:43,5:55,6:47,7:39,8:58,9:47,13:63,14:96,15:75,16:78}','{1:3,2:3,3:3,4:3,5:3,6:3,7:3,8:3,9:3,13:3,14:6,15:6,16:6}','{1:1,2:0,3:0,4:0,5:0,6:1,7:1,8:1,9:1,13:0,14:0,15:0,16:0}','{16:0,4:1,5:1,14:0}','{1:"AA==",2:"AQ==",3:"9v8=",4:"/+ZbLw==",5:"MAWO5C7/O6s=",6:"AFgLImsYBgA=",7:"kU0AAA==",8:"QSBzdHJpbmc=",9:"YmluMQ==",13:"av///w==",14:"fcOUJa1JwtQ=",16:"Pw=="}','{1:"AQ==",2:"BQ==",3:"lgA=",4:"qV/jWA==",5:"fcOUJa1JwlQ=",6:"AMhZw6A3BgA=",7:"Hk8AAA==",8:"U29tZSBzdHJpbmc=",9:"YmluMg==",13:"Cg==",14:"NEA=",16:"AAB6RA=="}','NULL','[4]','NULL',0,'{"arr.element":{"column_size":96,"value_count":6,"null_value_count":0,"nan_value_count":0,"lower_bound":-2e+100,"upper_bound":20},"b":{"column_size":32,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":false,"upper_bound":true},"bn":{"column_size":47,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"YmluMQ==","upper_bound":"YmluMg=="},"d":{"column_size":55,"value_count":3,"null_value_count":0,"nan_value_count":1,"lower_bound":-2e-100,"upper_bound":2e+100},"dt":{"column_size":39,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"2024-05-14","upper_bound":"2025-06-15"},"f":{"column_size":43,"value_count":3,"null_value_count":0,"nan_value_count":1,"lower_bound":2.00026702864e-10,"upper_bound":199973982208},"i":{"column_size":63,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":1,"upper_bound":5},"l":{"column_size":71,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":-10,"upper_bound":150},"mp.key":{"column_size":75,"value_count":6,"null_value_count":0,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"mp.value":{"column_size":78,"value_count":6,"null_value_count":0,"nan_value_count":0,"lower_bound":0.5,"upper_bound":1000},"s":{"column_size":58,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"A
>  string","upper_bound":"Some 
> string"},"strct.i":{"column_size":63,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":-150,"upper_bound":10},"ts":{"column_size":47,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"2024-05-14
>  14:51:12","upper_bound":"2025-06-15 18:51:12"}}' != 
> 

[jira] [Commented] (IMPALA-12362) Improve Linux packaging support.

2024-05-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848162#comment-17848162
 ] 

ASF subversion and git services commented on IMPALA-12362:
--

Commit a5e5aa16d887faedee4eea1bc809fba41d758f5b in impala's branch 
refs/heads/branch-3.4.2 from Xiang Yang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a5e5aa16d ]

IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files.

Move Linux packaging related content into package/CMakeLists.txt
to make it clearer.

This patch also adds the LICENSE and NOTICE files to the final package.

Testing:
 - Manually deploy package on Ubuntu22.04 and verify it.

Backport note for 3.4.x:
 - Resolved conflicts in CMakeLists.txt and modified
   package/CMakeLists.txt accordingly.

Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b
Reviewed-on: http://gerrit.cloudera.org:8080/20263
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/21410
Reviewed-by: Xiang Yang 
Reviewed-by: Zihao Ye 
Tested-by: Quanlong Huang 


> Improve Linux packaging support.
> 
>
> Key: IMPALA-12362
> URL: https://issues.apache.org/jira/browse/IMPALA-12362
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: XiangYang
>Assignee: XiangYang
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> including:
> (part-1/4) Refactor service management scripts.
> (part-2/4) Optimize default configurations for packaging module.
> (part-3/4) Add admissiond service and impala-profile-tool to packaging module.
> (part-4/4) Refactor linux packaging related cmake files, add LICENSE and 
> NOTICE files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847651#comment-17847651
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit c8415513158842e2ddb1d64891298d76fb0b367f in impala's branch 
refs/heads/branch-4.4.0 from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c84155131 ]

IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t

Thrift 0.16.0 introduced a max message size to protect
receivers against a malicious message allocating large
amounts of memory. That limit is a 32-bit signed integer,
so the max value is 2GB. Impala introduced the
thrift_rpc_max_message_size startup option to set that
for Impala's thrift servers.

There are times when Impala wants to send a message that
is larger than 2GB. In particular, the catalog-update
topic for the statestore can exceed 2GBs when there is
a lot of metadata loaded using the old v1 catalog. When
there is a 2GB max message size, the statestore can create
and send a >2GB message, but the impalads will reject
it. This can lead to impalads having stale metadata.

This switches to a patched Thrift that uses an int64_t
for the max message size for C++ code. It does not modify
the limit.

The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp,
so this fixes that location to always print MaxMessageSize
exceptions.

This is only patching the Thrift C++ library. It does not
patch the Thrift Java library. There are a few reasons for
that:
 - This specific issue involves C++ to C++ communication and
   will be solved by patching the C++ library.
 - C++ is easy to patch as it is built via the native-toolchain.
   There is no corresponding mechanism for patching our Java
   dependencies (though one could be developed).
 - Java modifications have implications for other dependencies
   like Hive which use Thrift to communicate with HMS.
For the Java code that uses max message size, this converts
the 64-bit value to 32-bit value by capping the value at
Integer.MAX_VALUE.

Testing:
 - Added enough tables to produce a >2GB catalog-topic and
   restarted an impalad with a higher limit specified. Without
   the patch, the catalog-topic update would be rejected by the
   impalad. With the patch, it succeeds.

Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Reviewed-on: http://gerrit.cloudera.org:8080/21367
Reviewed-by: Michael Smith 
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 
(cherry picked from commit 13df8239d82a61afc3196295a7878ca2ffe91873)
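
A minimal sketch of the 64-bit-to-32-bit capping mentioned for the Java side (the class and method names are hypothetical, not Impala's actual code):

{code:java}
public class MessageSizeCap {
  // Caps a 64-bit max-message-size to the 32-bit range the Thrift Java
  // library expects, as the commit message describes.
  static int capToInt(long maxMessageSize) {
    return (int) Math.min(maxMessageSize, Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(capToInt(64L << 30));  // 64 GB caps to 2147483647
  }
}
{code}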


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847652#comment-17847652
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit c9745fd5b941f52b3cd3496c425722fcbbffe894 in impala's branch 
refs/heads/branch-4.4.0 from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9745fd5b ]

IMPALA-13020 (part 2): Split out external vs internal Thrift max message size

The Thrift max message size is designed to protect against malicious
messages that consume a lot of memory on the receiver. This is an
important security measure for externally facing services, but it
can interfere with internal communication within the cluster.
Currently, the max message size is controlled by a single startup
flag for both. This creates tension between having a low value to
protect against malicious messages versus having a high value to
avoid issues with internal communication (e.g. large statestore
updates).

This introduces a new flag thrift_external_rpc_max_message_size to
specify the limit for externally-facing services. The current
thrift_rpc_max_message_size now applies only for internal services.
Splitting them apart allows setting a much higher value for
internal services (64GB) while leaving the externally facing services
using the current 2GB limit.

This modifies various code locations that wrap a Thrift transport to
pass in the original transport's TConfiguration. This also adds DCHECKs
to make sure that the new transport inherits the max message size. This
limits the locations where we actually need to set max message size.

ThriftServer/ThriftServerBuilder have a setting "is_external_facing"
which can be specified on each ThriftServer. This modifies statestore
and catalog to set is_external_facing to false. All other servers stay
with the default of true.

Testing:
 - This adds a test case to verify that is_external_facing uses the
   higher limit.
 - Ran through the steps in testdata/scale_test_metadata/README.md
   and updated the value in that doc.
 - Created many tables to push the catalog-update topic to be >2GB
   and verified that statestore successfully sends it when an impalad
   restarts.

Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311
Reviewed-on: http://gerrit.cloudera.org:8080/21420
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 
(cherry picked from commit bcff4df6194b2f192d937bb9c031721feccb69df)
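
A minimal sketch of selecting the limit by whether a server is externally facing, as described above (the flag names mirror the commit message; the defaults and structure are assumptions, not Impala's actual code):

{code:java}
public class MaxMessageSizeSelector {
  static final long THRIFT_RPC_MAX_MESSAGE_SIZE = 64L << 30;          // internal: 64 GB
  static final long THRIFT_EXTERNAL_RPC_MAX_MESSAGE_SIZE = 2L << 30;  // external: 2 GB

  static long maxMessageSize(boolean isExternalFacing) {
    // Statestore and catalog set is_external_facing to false and get the
    // higher internal limit; all other servers keep the default of true.
    return isExternalFacing ? THRIFT_EXTERNAL_RPC_MAX_MESSAGE_SIZE
                            : THRIFT_RPC_MAX_MESSAGE_SIZE;
  }

  public static void main(String[] args) {
    System.out.println(maxMessageSize(true));   // 2147483648
    System.out.println(maxMessageSize(false));  // 68719476736
  }
}
{code}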


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847419#comment-17847419
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit 13df8239d82a61afc3196295a7878ca2ffe91873 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=13df8239d ]

IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t

Thrift 0.16.0 introduced a max message size to protect
receivers against a malicious message allocating large
amounts of memory. That limit is a 32-bit signed integer,
so the max value is 2GB. Impala introduced the
thrift_rpc_max_message_size startup option to set that
for Impala's thrift servers.

There are times when Impala wants to send a message that
is larger than 2GB. In particular, the catalog-update
topic for the statestore can exceed 2GBs when there is
a lot of metadata loaded using the old v1 catalog. When
there is a 2GB max message size, the statestore can create
and send a >2GB message, but the impalads will reject
it. This can lead to impalads having stale metadata.

This switches to a patched Thrift that uses an int64_t
for the max message size for C++ code. It does not modify
the limit.

The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp,
so this fixes that location to always print MaxMessageSize
exceptions.

This is only patching the Thrift C++ library. It does not
patch the Thrift Java library. There are a few reasons for
that:
 - This specific issue involves C++ to C++ communication and
   will be solved by patching the C++ library.
 - C++ is easy to patch as it is built via the native-toolchain.
   There is no corresponding mechanism for patching our Java
   dependencies (though one could be developed).
 - Java modifications have implications for other dependencies
   like Hive which use Thrift to communicate with HMS.
For the Java code that uses max message size, this converts
the 64-bit value to 32-bit value by capping the value at
Integer.MAX_VALUE.

Testing:
 - Added enough tables to produce a >2GB catalog-topic and
   restarted an impalad with a higher limit specified. Without
   the patch, the catalog-topic update would be rejected by the
   impalad. With the patch, it succeeds.

Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Reviewed-on: http://gerrit.cloudera.org:8080/21367
Reviewed-by: Michael Smith 
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for 

[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847420#comment-17847420
 ] 

ASF subversion and git services commented on IMPALA-13020:
--

Commit bcff4df6194b2f192d937bb9c031721feccb69df in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bcff4df61 ]

IMPALA-13020 (part 2): Split out external vs internal Thrift max message size

The Thrift max message size is designed to protect against malicious
messages that consume a lot of memory on the receiver. This is an
important security measure for externally facing services, but it
can interfere with internal communication within the cluster.
Currently, the max message size is controlled by a single startup
flag for both. This creates tension between having a low value to
protect against malicious messages versus having a high value to
avoid issues with internal communication (e.g. large statestore
updates).

This introduces a new flag thrift_external_rpc_max_message_size to
specify the limit for externally-facing services. The current
thrift_rpc_max_message_size now applies only for internal services.
Splitting them apart allows setting a much higher value for
internal services (64GB) while leaving the externally facing services
using the current 2GB limit.

This modifies various code locations that wrap a Thrift transport to
pass in the original transport's TConfiguration. This also adds DCHECKs
to make sure that the new transport inherits the max message size. This
limits the locations where we actually need to set max message size.

ThriftServer/ThriftServerBuilder have a setting "is_external_facing"
which can be specified on each ThriftServer. This modifies statestore
and catalog to set is_external_facing to false. All other servers stay
with the default of true.

Testing:
 - This adds a test case to verify that is_external_facing uses the
   higher limit.
 - Ran through the steps in testdata/scale_test_metadata/README.md
   and updated the value in that doc.
 - Created many tables to push the catalog-update topic to be >2GB
   and verified that statestore successfully sends it when an impalad
   restarts.

Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311
Reviewed-on: http://gerrit.cloudera.org:8080/21420
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
Reviewed-by: Michael Smith 


> catalog-topic updates >2GB do not work due to Thrift's max message size
> ---
>
> Key: IMPALA-13020
> URL: https://issues.apache.org/jira/browse/IMPALA-13020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0, Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> Thrift 0.16.0 added a max message size to protect against malicious packets 
> that can consume a large amount of memory on the receiver side. This max 
> message size is a signed 32-bit integer, so it maxes out at 2GB (which we set 
> via thrift_rpc_max_message_size).
> In catalog v1, the catalog-update statestore topic can become larger than 2GB 
> when there are a large number of tables / partitions / files. If this happens 
> and an Impala coordinator needs to start up (or needs a full topic update for 
> any other reason), it is expecting the statestore to send it the full topic 
> update, but the coordinator actually can't process the message. The 
> deserialization of the message hits the 2GB max message size limit and fails.
> On the statestore side, it shows this message:
> {noformat}
> I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial 
> catalog-update topic update for 
> impa...@mcdonnellthrift.vpc.cloudera.com:27000. Size = 2.27 GB
> I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating 
> client for mcdonnellthrift.vpc.cloudera.com:23000
> I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection 
> to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken 
> pipe)
> I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() 
> send() : Broken pipe
> I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for 
> mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() 
> send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, 
> rpc: N6impala20TUpdateStateResponseE, send: not done
> I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy 
> client for mcdonnellthrift.vpc.cloudera.com:23000{noformat}
> On the Impala side, it doesn't 

[jira] [Commented] (IMPALA-13055) Some Iceberg metadata table tests doesn't assert

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847380#comment-17847380
 ] 

ASF subversion and git services commented on IMPALA-13055:
--

Commit 3a8eb999cbc746c055708425e071c30e3c00422e in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a8eb999c ]

IMPALA-13055: Some Iceberg metadata table tests don't assert

Some tests in the Iceberg metadata table suite use the following regex
to verify numbers in the output: [1-9]\d*|0
However, if this format is given, the test unconditionally passes.

This patch changes this format to \d+ and fixes the test results that
incorrectly passed before due to the test not asserting.

Opened IMPALA-13067 to investigate why the test framework works like
this for |0 in the regexes.

Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad
Reviewed-on: http://gerrit.cloudera.org:8080/21394
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Some Iceberg metadata table tests doesn't assert
> 
>
> Key: IMPALA-13055
> URL: https://issues.apache.org/jira/browse/IMPALA-13055
> Project: IMPALA
>  Issue Type: Test
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
>
> Some tests in the Iceberg metadata table suite use the following regex to 
> verify numbers in the output: [1-9]\d*|0
> However, if this format is given, the test unconditionally passes. One could 
> put the expression within parentheses, or simply verify with \d+.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13067) Some regex make the tests unconditionally pass

2024-05-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847381#comment-17847381
 ] 

ASF subversion and git services commented on IMPALA-13067:
--

Commit 3a8eb999cbc746c055708425e071c30e3c00422e in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a8eb999c ]

IMPALA-13055: Some Iceberg metadata table tests don't assert

Some tests in the Iceberg metadata table suite use the following regex
to verify numbers in the output: [1-9]\d*|0
However, if this format is given, the test unconditionally passes.

This patch changes this format to \d+ and fixes the test results that
incorrectly passed before due to the test not asserting.

Opened IMPALA-13067 to investigate why the test framework works like
this for |0 in the regexes.

Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad
Reviewed-on: http://gerrit.cloudera.org:8080/21394
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Some regex make the tests unconditionally pass
> --
>
> Key: IMPALA-13067
> URL: https://issues.apache.org/jira/browse/IMPALA-13067
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Gabor Kaszab
>Priority: Major
>  Labels: test-framework
>
> This issue came out in the Iceberg metadata table tests where this regex was 
> used:
> [1-9]\d*|0
>  
> The "|0" part for some reason made the test framework confused and then 
> regardless of what you provide as an expected result the tests passed. One 
> workaround was to put the regex expression between parentheses. Or simply use 
> "d+". https://issues.apache.org/jira/browse/IMPALA-13055 applied this second 
> workaround on the tests.
> Some analysis would be great why this is the behavior of the test framework, 
> and if it's indeed the issue of the framnework, we should fix it.
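
A small demonstration of the suspected mechanism (a hedged sketch: it assumes the framework embeds the user-supplied regex into a larger pattern, which IMPALA-13067 has not yet confirmed):

{code:java}
import java.util.regex.Pattern;

// An unparenthesized alternation escapes a composed pattern: the '|0'
// branch no longer requires the surrounding context to match.
public class AlternationDemo {
  public static void main(String[] args) {
    String bare = "row_count=" + "[1-9]\\d*|0";      // row_count=[1-9]\d*  OR  0
    String grouped = "row_count=" + "([1-9]\\d*|0)";
    System.out.println(Pattern.matches(bare, "0"));     // true: '|0' matches alone
    System.out.println(Pattern.matches(grouped, "0"));  // false: alternation contained
  }
}
{code}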



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12559) Support x5c Parameter in JSON Web Keys (JWK)

2024-05-16 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846970#comment-17846970
 ] 

ASF subversion and git services commented on IMPALA-12559:
--

Commit 7550eb607c2b92b1367dc5cf5667b681d59a8915 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7550eb607 ]

IMPALA-12559 (part 2): Fix build issue for different versions of openssl

The previous patch called the OpenSSL API X509_get0_tbs_sigalg(), which
is not available in the version of OpenSSL in the toolchain, causing
build failures.
This patch fixes the issue by calling X509_get_signature_nid() instead.

Testing:
 - Passed the jwt-test unit tests and end-to-end tests.

Change-Id: I62b9f0c00f91c2b13be30c415e3f1ebd0e1bd2bc
Reviewed-on: http://gerrit.cloudera.org:8080/21432
Reviewed-by: gaurav singh 
Tested-by: Impala Public Jenkins 
Reviewed-by: Abhishek Rawat 


> Support x5c Parameter in JSON Web Keys (JWK)
> 
>
> Key: IMPALA-12559
> URL: https://issues.apache.org/jira/browse/IMPALA-12559
> Project: IMPALA
>  Issue Type: Bug
>  Components: be, Security
>Reporter: Jason Fehr
>Assignee: gaurav singh
>Priority: Critical
>  Labels: JWT, jwt, security
>
> The ["x5u"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.6], 
> ["x5c"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.7], 
> ["x5t"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.8], and 
> ["x5t#S256|https://datatracker.ietf.org/doc/html/rfc7517#section-4.9] 
> parameters in JWKs is not supported by Impala.  Implement support for this 
> parameter using the available methods in the [Thalhammer/jwt-cpp 
> library|https://github.com/Thalhammer/jwt-cpp/blob/ce1f9df3a9f861d136d6f0c93a6f811c364d1d3d/example/jwks-verify.cpp].
> Note:  If the "alg" property is specified and so is "x5u" or "x5c", then the 
> value of the "alg" property must match the algorithm on the certificate from 
> the "x5u" or "x5c" property.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12559) Support x5c Parameter in JSON Web Keys (JWK)

2024-05-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846760#comment-17846760
 ] 

ASF subversion and git services commented on IMPALA-12559:
--

Commit 34c084cebb2f52a6ee11d3d93609b3e4e238816f in impala's branch 
refs/heads/master from gaurav1086
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=34c084ceb ]

IMPALA-12559: Support x5c Parameter for RSA JSON Web Keys

This enables jwt verification using the x5c certificate(s) in RSA jwks
keys. The x5c claim can be part of the jwks either as a string or an
array. This patch only supports a single x5c certificate per jwk.

If "x5c" is present and "alg" is not, then "alg" is extracted from the
"x5c" certificate using its signature algorithm. However, if "x5c" is
not present, then "alg" is a mandatory field on the jwk.

Current mapping of signature algorithm string => algorithm:

sha256WithRSAEncryption => rs256
sha384WithRSAEncryption => rs384
sha512WithRSAEncryption => rs512

If "x5c" is present, then it is given priority over other
mandatory fields like "n", "e" to construct the public key.

Testing:
* added unit test VerifyJwtTokenWithx5cCertificate to
verify jwt with x5c certificate.
* added unit test VerifyJwtTokenWithx5cCertificateWithoutAlg
to verify jwt with x5c certificate without "alg".
* added e2e test testJwtAuthWithJwksX5cHttpUrl to verify
jwt with x5c certificate.

Change-Id: I70be6f9f54190544aa005b2644e2ed8db6f6bb74
Reviewed-on: http://gerrit.cloudera.org:8080/21382
Reviewed-by: Jason Fehr 
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 
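
As a rough illustration of the "alg" derivation described above (the
actual implementation is C++; the Python helper below and its use of
the third-party 'cryptography' package are assumptions made for the
sketch):

{code:python}
import base64
from cryptography import x509

# Mirrors the mapping in the commit message: the certificate's
# signature hash determines the jwk "alg" when "alg" is absent.
HASH_TO_ALG = {"sha256": "rs256", "sha384": "rs384", "sha512": "rs512"}

def alg_from_jwk(jwk):
    """Return the jwk's "alg", deriving it from "x5c" when absent."""
    if "alg" in jwk:
        return jwk["alg"]          # an explicit "alg" is used as-is
    x5c = jwk["x5c"]               # mandatory when "alg" is missing
    # "x5c" may be a single base64-DER string or an array; only the
    # first (leaf) certificate is considered, as in the patch.
    leaf = x5c[0] if isinstance(x5c, list) else x5c
    cert = x509.load_der_x509_certificate(base64.b64decode(leaf))
    # e.g. sha256WithRSAEncryption -> hash name 'sha256' -> 'rs256'
    return HASH_TO_ALG[cert.signature_hash_algorithm.name]
{code}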


> Support x5c Parameter in JSON Web Keys (JWK)
> 
>
> Key: IMPALA-12559
> URL: https://issues.apache.org/jira/browse/IMPALA-12559
> Project: IMPALA
>  Issue Type: Bug
>  Components: be, Security
>Reporter: Jason Fehr
>Assignee: gaurav singh
>Priority: Critical
>  Labels: JWT, jwt, security
>
> The ["x5u"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.6], 
> ["x5c"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.7], 
> ["x5t"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.8], and 
> ["x5t#S256|https://datatracker.ietf.org/doc/html/rfc7517#section-4.9] 
> parameters in JWKs is not supported by Impala.  Implement support for this 
> parameter using the available methods in the [Thalhammer/jwt-cpp 
> library|https://github.com/Thalhammer/jwt-cpp/blob/ce1f9df3a9f861d136d6f0c93a6f811c364d1d3d/example/jwks-verify.cpp].
> Note:  If the "alg" property is specified and so is "x5u" or "x5c", then the 
> value of the "alg" property must match the algorithm on the certificate from 
> the "x5u" or "x5c" property.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13079) Add support for FLOAT/DOUBLE in Iceberg metadata tables

2024-05-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846761#comment-17846761
 ] 

ASF subversion and git services commented on IMPALA-13079:
--

Commit bbfba13ed4d084681b542d7c5e1b5156576a603b in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bbfba13ed ]

IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables

Until now, the float and double data types were not supported in Iceberg
metadata tables. This commit adds support for them.

Testing:
 - added a test table that contains all primitive types (except for
   decimal, which is still not supported), a struct, an array and a map
 - added a test query that queries the `files` metadata table of the
   above table - the 'readable_metrics' struct contains lower and upper
   bounds for all columns in the original table, with the original type

Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29
Reviewed-on: http://gerrit.cloudera.org:8080/21425
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add support for FLOAT/DOUBLE in Iceberg metadata tables
> ---
>
> Key: IMPALA-13079
> URL: https://issues.apache.org/jira/browse/IMPALA-13079
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters

2024-05-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846448#comment-17846448
 ] 

ASF subversion and git services commented on IMPALA-9577:
-

Commit f507a02b60e905c51e80e6139eef00946cf6d453 in impala's branch 
refs/heads/branch-3.4.2 from Grant Henke
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f507a02b6 ]

IMPALA-9577: [test] Use `system_unsync` time for Kudu test clusters

Recently Kudu made enhancements to time source configuration and
adjusted the time source for local clusters/tests to `system_unsync`.

This patch mirrors that behavior in Impala test clusters, given there is
no need to require an NTP-synchronized clock for a test where all the
participating Kudu masters and tablet servers run on the same node
using the same local wallclock.

See the Kudu commit here for details:
https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be

While making this change, I removed all NTP-related packages and special
handling, as they should no longer be needed in a development
environment. I also added curl and gawk, which were missing in my
Docker Ubuntu environment and broke my testing.

Testing:
I tested with the steps below using Docker for Mac:

  docker rm impala-dev
  docker volume rm impala
  docker run --privileged --interactive --tty --name impala-dev -v impala:/home 
-p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash

  apt-get update
  apt-get install sudo
  adduser --disabled-password --gecos '' impdev
  echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
  su - impdev
  cd ~

  sudo apt-get --yes install git
  git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala
  cd ~/Impala
  export IMPALA_HOME=`pwd`
  git remote add fork https://github.com/granthenke/impala.git
  git fetch fork
  git checkout kudu-system-time

  $IMPALA_HOME/bin/bootstrap_development.sh

  source $IMPALA_HOME/bin/impala-config.sh
  (pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest)
  (pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest)

  $IMPALA_HOME/bin/start-impala-cluster.py
  ./tests/run-tests.py query_test/test_kudu.py

Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99
Reviewed-on: http://gerrit.cloudera.org:8080/15568
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Alexey Serbin 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/21422
Reviewed-by: Alexey Serbin 
Reviewed-by: Zihao Ye 
Tested-by: Quanlong Huang 


> Use `system_unsync` time for Kudu test clusters
> ---
>
> Key: IMPALA-9577
> URL: https://issues.apache.org/jira/browse/IMPALA-9577
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> Recently Kudu made enhancements to time source configuration and adjusted the 
> time source for local clusters/tests to `system_unsync`. Impala should mirror 
> that behavior in Impala test clusters, given there is no need to require an 
> NTP-synchronized clock for a test where all the participating Kudu masters 
> and tablet servers run on the same node using the same local wallclock.
>  
> See the Kudu commit here for details: 
> [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13051) Speed up test_query_log test runs

2024-05-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846267#comment-17846267
 ] 

ASF subversion and git services commented on IMPALA-13051:
--

Commit 3b35ddc8ca7b0e540fc16c413a170a25e164462b in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3b35ddc8c ]

IMPALA-13051: Speed up, refactor query log tests

Sets faster default shutdown_grace_period_s and shutdown_deadline_s when
impalad_graceful_shutdown=True in tests. Impala waits until grace period
has passed and all queries are stopped (or deadline is exceeded) before
flushing the query log, so a grace period of 0 is sufficient. Adds them in
setup_method to reduce duplication in test declarations.

Re-uses TQueryTableColumn Thrift definitions for testing.

Moves waiting for query log table to exist to setup_method rather than
as a side-effect of get_client.

Refactors workload management code to reduce if-clause nesting.

Adds functional query workload tests for both the sys.impala_query_log
and the sys.impala_query_live tables to assert the names and order of
the individual columns within each table.

Renames the python tests for the sys.impala_query_log table removing the
unnecessary "_query_log_table_" string from the name of each test.

Change-Id: I1127ef041a3e024bf2b262767d56ec5f29bf3855
Reviewed-on: http://gerrit.cloudera.org:8080/21358
Tested-by: Impala Public Jenkins 
Reviewed-by: Riza Suminto 
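
A hedged sketch of the setup_method pattern described above; the class
and attribute names are illustrative, and the real tests route these
flags through Impala's test framework:

{code:python}
# Hypothetical distillation: set the graceful-shutdown flags once in
# setup_method instead of repeating them in every test declaration.
class TestQueryLog:
    def setup_method(self, method):
        # A grace period of 0 is sufficient: Impala still waits for
        # running queries to stop (or for the deadline) before
        # flushing the query log.
        self.impalad_args = ["--shutdown_grace_period_s=0",
                             "--shutdown_deadline_s=60"]
{code}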


> Speed up test_query_log test runs
> -
>
> Key: IMPALA-13051
> URL: https://issues.apache.org/jira/browse/IMPALA-13051
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 4.4.0
>Reporter: Michael Smith
>Assignee: Jason Fehr
>Priority: Minor
>
> test_query_log.py takes 11 minutes to run. Most of them use graceful 
> shutdown, and provide an unnecessary grace period. Optimize test_query_log 
> test runs, and do some other code cleanup around workload management.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13061) Query Live table fails to load if default_transactional_type=insert_only set globally

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845906#comment-17845906
 ] 

ASF subversion and git services commented on IMPALA-13061:
--

Commit 338fedb44703646664e2e22c6e2f35336924db22 in impala's branch 
refs/heads/branch-4.4.0 from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=338fedb44 ]

IMPALA-13061: Create query live as external table

Impala determines whether a managed table is transactional based on the
'transactional' table property. It assumes any managed table with
transactional=true returns non-null getValidWriteIds.

When 'default_transactional_type=insert_only' is set at startup (via
default_query_options), impala_query_live is created as a managed table
with transactional=true, but SystemTables don't implement
getValidWriteIds and are not meant to be transactional.

DataSourceTable has a similar problem, and when a JDBC table is
created, setJdbcDataSourceProperties sets transactional=false. This
patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not
created as a managed table and 'transactional' is not set. That avoids
creating a SystemTable that Impala can't read (it encounters an
IllegalStateException).

Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65
Reviewed-on: http://gerrit.cloudera.org:8080/21401
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 1233ac3c579b5929866dba23debae63e5d2aae90)


> Query Live table fails to load if default_transactional_type=insert_only set 
> globally
> -
>
> Key: IMPALA-13061
> URL: https://issues.apache.org/jira/browse/IMPALA-13061
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> If transactional type defaults to insert_only for all queries via
> {code}
> --default_query_options=default_transactional_type=insert_only
> {code}
> the table definition for {{sys.impala_query_live}} is set to transactional, 
> which causes an exception in catalogd
> {code}
> I0506 22:07:42.808758  3972 jni-util.cc:302] 
> 4547b965aeebc5f0:8ba96c58] java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:496)
> at org.apache.impala.catalog.Table.getPartialInfo(Table.java:851)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3818)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3714)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3681)
> at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$10(JniCatalog.java:431)
> at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
> at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:253)
> at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:430)
> {code}
> We need to override that setting while creating {{sys.impala_query_live}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13045) Fix intermittent failure in TestQueryLive.test_local_catalog

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845905#comment-17845905
 ] 

ASF subversion and git services commented on IMPALA-13045:
--

Commit 39233ba3d134b8c18f6f208a7d85c3fadf8ee371 in impala's branch 
refs/heads/branch-4.4.0 from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=39233ba3d ]

IMPALA-13045: Wait for impala_query_live to exist

Waits for creation of 'sys.impala_query_live' in tests to ensure it has
been registered with HMS.

Change-Id: I5cc3fa3c43be7af9a5f097359a0d4f20d057a207
Reviewed-on: http://gerrit.cloudera.org:8080/21372
Reviewed-by: Impala Public Jenkins 
Tested-by: Michael Smith 
(cherry picked from commit b35aa819653dce062109e61d8f30171234dce5f9)


> Fix intermittent failure in TestQueryLive.test_local_catalog
> 
>
> Key: IMPALA-13045
> URL: https://issues.apache.org/jira/browse/IMPALA-13045
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> IMPALA-13005 introduced {{drop table sys.impala_query_live}}. In some test 
> environments (notably testing with Ozone), recreating that table in the 
> following test - test_local_catalog - does not occur before running the test 
> case portion that attempts to query that table.
> Update the test to wait for the table to be available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845902#comment-17845902
 ] 

ASF subversion and git services commented on IMPALA-12910:
--

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch 
refs/heads/branch-4.4.0 from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests to run TPCH and TPCDS queries
for external JDBC tables with Impala-Impala federation. Note that JDBC
tables are mapping tables, so they don't take additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference
count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object will not be
closed when its reference count reaches 0; instead it is kept in the
cache until it has been idle for more than 5 minutes. The flag
'dbcp_data_source_idle_timeout_s' is added to make the duration
configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections from the pool. The
workaround is to call the BasicDataSource.invalidateConnection() API
to close a connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties are renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user could be lower case or a mix of upper/lower case, but the code
compared the types against an upper-case string.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres and MySQL.
The following sample commands create TPCDS JDBC tables for Impala-Impala
federation with a remote coordinator running at 10.19.10.86, and a
Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=POSTGRES --database_host=10.19.10.86 \
    --database_name=tpcds --clean

TPCDS tests for JDBC tables run only for release/exhaustive builds.
TPCH tests for JDBC tables run for core and exhaustive builds, except
Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed with returned rows not matching expected
   results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since query option
   'clean_dbcp_ds_cache' was set as false, and the SQL DataSource object
   was closed by cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03)
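
A rough Python sketch of the cache behavior described above (the real
DataSourceObjectCache is Java; the class and names below are
illustrative only). Entries are closed only when their reference count
is zero and they have been idle past the timeout, mirroring
'clean_dbcp_ds_cache=false' with 'dbcp_data_source_idle_timeout_s':

{code:python}
import time

class DataSourceCache:
    def __init__(self, idle_timeout_s=300):
        self.idle_timeout_s = idle_timeout_s
        self.entries = {}  # key -> [datasource, refcount, last_used]

    def acquire(self, key, factory):
        entry = self.entries.get(key)
        if entry is None:
            entry = [factory(), 0, time.time()]
            self.entries[key] = entry
        entry[1] += 1                      # count the new reference
        entry[2] = time.time()
        return entry[0]

    def release(self, key):
        self.entries[key][1] -= 1          # do NOT close at zero

    def cleanup(self):
        # Periodic sweep: close only unreferenced, long-idle entries.
        now = time.time()
        for key, (ds, refs, last) in list(self.entries.items()):
            if refs == 0 and now - last > self.idle_timeout_s:
                ds.close()
                del self.entries[key]
{code}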


> Run TPCH/TPCDS queries for external JDBC tables
> ---
>
> Key: IMPALA-12910
> URL: https://issues.apache.org/jira/browse/IMPALA-12910
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Need performance data for queries on external JDBC tables to be documented in 
> the design doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845907#comment-17845907
 ] 

ASF subversion and git services commented on IMPALA-11499:
--

Commit b8a66b0e104f8e25e70fce0326d36c9b48672dbb in impala's branch 
refs/heads/branch-4.4.0 from pranavyl
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b8a66b0e1 ]

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded because bytes of their UTF-8
representation matched characters in the set of characters to escape.
For example, the string '运', which consists of the three bytes
0xe8 0xbf 0x90, was wrongly encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning, and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Reviewed-on: http://gerrit.cloudera.org:8080/21131
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 85cd07a11e876f3d8773f2638f699c61a6b0dd4c)
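
A small Python analogue of the fixed approach, assuming an abbreviated
escape set (the real implementation is C++ and uses Hive's full
character list):

{code:python}
# Escape decisions are made per raw UTF-8 byte against a set of
# single-byte values, so a multi-byte character such as '运' can never
# partially collide with the escape list. '\x7f' stands in for the
# erroneous '\xff' of the old code.
TO_ESCAPE = set(b' "#%\'*/:=?\\{[]^\x7f')  # abbreviated, illustrative

def url_encode(value: str) -> bytes:
    out = bytearray()
    for b in value.encode('utf-8'):
        if b in TO_ESCAPE:
            out += b'%%%02X' % b      # percent-encode escaped bytes
        else:
            out.append(b)             # pass all other bytes through
    return bytes(out)

print(url_encode('a b'))  # b'a%20b'
print(url_encode('运'))    # b'\xe8\xbf\x90' -- left intact
{code}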


> Refactor UrlEncode function to handle special characters
> 
>
> Key: IMPALA-11499
> URL: https://issues.apache.org/jira/browse/IMPALA-11499
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Pranav Yogi Lodha
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Partition values are incorrectly URL-encoded in the backend for Unicode 
> characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据', which is wrong.
> To reproduce the issue, first create a partition table:
> {code:sql}
> create table my_part_tbl (id int) partitioned by (p string) stored as parquet;
> {code}
> Then insert data into it using partition values containing '运'. They will 
> fail:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='运营业务数据') values (0)
> Query submitted at: 2022-08-16 10:03:56 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
> Error(5): Input/output error
> [localhost:21050] default> insert into my_part_tbl partition(p='运') values 
> (0);
> Query: insert into my_part_tbl partition(p='运') values (0)
> Query submitted at: 2022-08-16 10:04:22 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
> 

[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure

2024-05-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845903#comment-17845903
 ] 

ASF subversion and git services commented on IMPALA-13018:
--

Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch 
refs/heads/branch-4.4.0 from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests to run TPCH and TPCDS queries
for external JDBC tables with Impala-Impala federation. Note that JDBC
tables are mapping tables, so they don't take additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference
count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object will not be
closed when its reference count reaches 0; instead it is kept in the
cache until it has been idle for more than 5 minutes. The flag
'dbcp_data_source_idle_timeout_s' is added to make the duration
configurable.
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for available connections from the pool. The
workaround is to call the BasicDataSource.invalidateConnection() API
to close a connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties are renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
Fixes a bug in database type comparison: the type strings specified by
the user could be lower case or a mix of upper/lower case, but the code
compared the types against an upper-case string.
Fixes an issue where the SQL DataSource object was not closed in
JdbcDataSource.open() and JdbcDataSource.getNext() when errors were
returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres and MySQL.
The following sample commands create TPCDS JDBC tables for Impala-Impala
federation with a remote coordinator running at 10.19.10.86, and a
Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=POSTGRES --database_host=10.19.10.86 \
    --database_name=tpcds --clean

TPCDS tests for JDBC tables run only for release/exhaustive builds.
TPCH tests for JDBC tables run for core and exhaustive builds, except
Dockerized builds.

Remaining Issues:
 - tpcds-decimal_v2-q80a failed with returned rows not matching expected
   results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
 - Passed core tests.
 - Passed query_test/test_tpcds_queries.py in release/exhaustive build.
 - Manually verified that only one SQL DataSource object was created for
   test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since query option
   'clean_dbcp_ds_cache' was set as false, and the SQL DataSource object
   was closed by cleanup thread.

Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Reviewed-on: http://gerrit.cloudera.org:8080/21304
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 
(cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03)


> Fix 
> test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a 
> failure
> 
>
> Key: IMPALA-13018
> URL: https://issues.apache.org/jira/browse/IMPALA-13018
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> The returned rows are not matching expected results for some decimal type of 
> columns. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13038) Support profile tab for imported query profiles

2024-05-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845519#comment-17845519
 ] 

ASF subversion and git services commented on IMPALA-13038:
--

Commit 0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0 in impala's branch 
refs/heads/master from Surya Hebbar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d215da8d ]

IMPALA-13038: Support profile tab for imported query profiles

For query profile imports, the following tabs are currently supported.
 - Query Statement
 - Query Timeline
 - Query Text Plan

With the current patch, the "Query Profile" tab will also be supported.

In the "QueryProfileHandler", "query_id" is now added before verifying
its existence in the query log as in "QuerySummaryHandler" and others.

"getQueryID" function has been added to "util.js", as it is helpful
across multiple query pages for retrieving the query ID into JS scripts,
before the page loads up.

On loading the imported "Query Profile" page, query profile download
section and server's non-existing query ID alerts are removed.
All unsupported navbar tabs are removed and current tab is set to active.

The query profile is retrieved from the indexedDB's "imported_queries"
database. Then query profile is passed onto "profileToString" function,
which converts the profile into indented text for displaying on the
profile page.

Each profile and its child profiles are printed in the following order
with the right indentation(fields are skipped, if they do not exist).

Profile name:
  - Info strings:
  - Event sequences:
- Offset:
- Events:
  - Child profile(recursive):
  - Counters:

Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc
Reviewed-on: http://gerrit.cloudera.org:8080/21400
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
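
A hedged Python sketch of that ordering (the real "profileToString" is
JavaScript in the Impala webUI; the field names below are
illustrative):

{code:python}
def profile_to_string(node, depth=0):
    """Render a profile and its children as indented text."""
    pad = '  ' * depth
    lines = [pad + node['profile_name'] + ':']
    for key, val in node.get('info_strings', {}).items():
        lines.append('%s  %s: %s' % (pad, key, val))
    for seq in node.get('event_sequences', []):
        lines.append('%s  Offset: %s' % (pad, seq['offset']))
        for ts, label in seq.get('events', []):
            lines.append('%s    %s: %s' % (pad, ts, label))
    for child in node.get('child_profiles', []):  # recurse, indented
        lines.append(profile_to_string(child, depth + 1))
    for name, value in node.get('counters', {}).items():
        lines.append('%s  %s: %s' % (pad, name, value))
    return '\n'.join(lines)

print(profile_to_string({
    'profile_name': 'Query', 'info_strings': {'Query State': 'FINISHED'},
    'child_profiles': [{'profile_name': 'Planner',
                        'counters': {'PlanTime': '12ms'}}],
}))
{code}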


> Support profile tab for imported query profiles
> ---
>
> Key: IMPALA-13038
> URL: https://issues.apache.org/jira/browse/IMPALA-13038
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Attachments: json_profile_a34485359bfdfe1f_3ca8177b.json, 
> json_profile_a34485359bfdfe1f_3ca8177b.txt
>
>
> Query profile imports currently support the following tabs.
>  - Query Statement
>  - Query Timeline
>  - Query Text Plan
> It would be helpful to support the "Query Profile" tab for these imports.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13036) Document Iceberg metadata tables

2024-05-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845335#comment-17845335
 ] 

ASF subversion and git services commented on IMPALA-13036:
--

Commit aba27edc3338765a6b5133be095989f83cce4747 in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aba27edc3 ]

IMPALA-13036: Document Iceberg metadata tables

This change adds documentation on how Iceberg metadata tables can be
used.

Testing:
 - built docs locally

Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1
Reviewed-on: http://gerrit.cloudera.org:8080/21387
Tested-by: Impala Public Jenkins 
Reviewed-by: Peter Rozsa 


> Document Iceberg metadata tables
> 
>
> Key: IMPALA-13036
> URL: https://issues.apache.org/jira/browse/IMPALA-13036
> Project: IMPALA
>  Issue Type: Documentation
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>  Labels: impala-iceberg
>
> Impala now supports displaying Iceberg metadata tables, we should document 
> this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


