[jira] [Commented] (IMPALA-13252) Filter update log message prints TUniqueId in non-standard format
[ https://issues.apache.org/jira/browse/IMPALA-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17869283#comment-17869283 ]

ASF subversion and git services commented on IMPALA-13252:
----------------------------------------------------------

Commit 8d4497be0947e7552e0e9e2c15b9b08566aad148 in impala's branch refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d4497be0 ]

IMPALA-13252: Consistently use PrintId to print TUniqueId

Some logging formats TUniqueId inconsistently by relying on the generated Thrift to_string/toString printers. This makes it difficult to track a specific query through logs.

Adds operator<<(ostream, TUniqueId) to simplify logging TUniqueId correctly, uses PrintId instead of toString in Java, and adds a verifier to test_banned_log_messages to ensure TUniqueId is not printed in logs.

Change-Id: If01bf20a240debbbd4c0a22798045ea03f17b28e
Reviewed-on: http://gerrit.cloudera.org:8080/21606
Reviewed-by: Yida Wu
Tested-by: Impala Public Jenkins

> Filter update log message prints TUniqueId in non-standard format
> -----------------------------------------------------------------
>
>          Key: IMPALA-13252
>          URL: https://issues.apache.org/jira/browse/IMPALA-13252
>      Project: IMPALA
>   Issue Type: Bug
>   Components: Backend
> Affects Versions: Impala 4.4.0
>     Reporter: Michael Smith
>     Assignee: Michael Smith
>     Priority: Major
>
> Some error messages, such as
> {code}
> Filter update received for non-executing query with id:
> TUniqueId(hi=-8482965541048796556, lo=3501357296473079808)
> {code}
> print the query id as the raw Thrift type rather than our colon-delimited
> format, e.g. "8a4673c8fbe83a74:309751e9". This makes it difficult to trace
> queries through logs.
> Normalize on the colon-delimited format.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
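The colon-delimited form the commit standardizes on can be sketched in a few lines; `print_id` here is a hypothetical Python stand-in for Impala's C++ `PrintId`, and the exact zero-padding of the two hex halves is an assumption:

```python
def print_id(hi, lo):
    # The Thrift struct stores hi/lo as signed 64-bit ints; mask to recover
    # the unsigned bit pattern, then render as colon-delimited hex.
    # (Illustrative stand-in for PrintId; padding behavior is an assumption.)
    mask = (1 << 64) - 1
    return "{:x}:{:x}".format(hi & mask, lo & mask)

# The raw Thrift rendering from the log message above: the hi half
# renders as "8a4673c8fbe83a74", matching the Jira example.
print(print_id(-8482965541048796556, 3501357296473079808))
```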
[jira] [Commented] (IMPALA-13214) test_shell_commandline..test_removed_query_option failed with assertion failure
[ https://issues.apache.org/jira/browse/IMPALA-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17868479#comment-17868479 ] ASF subversion and git services commented on IMPALA-13214: -- Commit e1098a6a02c417ddc63904259fa0abbcc64fcdb7 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e1098a6a0 ] IMPALA-13214: Skip wait_until_connected when shell exits The ImpalaShell class expects to start impala-shell and interact with it by sending instructions over stdin and reading the results. This assumption was incorrect when used for impala-shell batch sessions, where the process exits on its own. If there's a delay in ImpalaShell.__init__ - between starting the process and polling to see that it's running - for a batch process, ImpalaShell will fail the assertion that process_status is None. This can be easily reproduced by adding a small (0.1s) sleep after starting the new process. Most batch runs of impala-shell happen through `run_impala_shell_cmd`. Updated that function to only wait for a successful connection when stdin input is supplied. Otherwise the command is assumed to be a batch function and any failures will be detected during `get_result`. Removed explicit use of `wait_until_connected` as redundant. Fixed cases in test_config_file that previously ignored WARNING before the connection string because they did not specify `wait_until_connected`. Tested by running shell/test_shell_commandline.py with a 0.1s delay before ImpalaShell polls. 
Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Reviewed-on: http://gerrit.cloudera.org:8080/21598
Tested-by: Impala Public Jenkins
Reviewed-by: Jason Fehr
Reviewed-by: Kurt Deschler

> test_shell_commandline..test_removed_query_option failed with assertion failure
> -------------------------------------------------------------------------------
>
>          Key: IMPALA-13214
>          URL: https://issues.apache.org/jira/browse/IMPALA-13214
>      Project: IMPALA
>   Issue Type: Bug
>   Components: Test
> Affects Versions: Impala 4.5.0
>     Reporter: Laszlo Gaal
>     Assignee: Michael Smith
>     Priority: Blocker
>      Fix For: Impala 4.5.0
>
> Happened during a recent s3-arm-data-cache build.
> Python backtrace:
> {code}
> /data/jenkins/workspace/impala-asf-master-core-s3-arm-data-cache/repos/Impala/tests/shell/test_shell_commandline.py:305: in test_removed_query_option
>     expect_success=True)
> shell/util.py:135: in run_impala_shell_cmd
>     stderr_file=stderr_file)
> shell/util.py:155: in run_impala_shell_cmd_no_expect
>     stdout_file=stdout_file, stderr_file=stderr_file)
> shell/util.py:271: in __init__
>     "Impala shell exited with return code {0}".format(process_status)
> E   AssertionError: Impala shell exited with return code 0
> {code}
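The race the patch fixes can be illustrated with plain `subprocess`; this is a hedged sketch, not Impala's test harness: a batch-style child may already have exited by the time the parent checks liveness, so asserting `poll() is None` is only safe when the caller keeps stdin open.

```python
import subprocess
import sys

# Batch-style invocation: the child exits on its own, like
# `impala-shell -q '...'` with no stdin interaction.
batch = subprocess.Popen([sys.executable, "-c", "print('done')"],
                         stdout=subprocess.PIPE)
batch.wait(timeout=30)
# Any delay between Popen() and a liveness check can land here, where
# poll() no longer returns None -- the assertion the test tripped over.
assert batch.poll() is not None
assert batch.returncode == 0  # failures are detected from the result instead

# Interactive-style invocation: stdin is held open, so the child cannot
# exit on its own and a liveness check is valid.
inter = subprocess.Popen([sys.executable, "-c", "import sys; sys.stdin.read()"],
                         stdin=subprocess.PIPE)
assert inter.poll() is None   # still running, waiting on stdin
inter.communicate(b"")        # close stdin; child sees EOF and exits
assert inter.wait() == 0
```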
[jira] [Commented] (IMPALA-13243) Update Dropwizard Metrics to supported version
[ https://issues.apache.org/jira/browse/IMPALA-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867966#comment-17867966 ]

ASF subversion and git services commented on IMPALA-13243:
----------------------------------------------------------

Commit 22b59d27d0be25999eba4c839ea157c279939d76 in impala's branch refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=22b59d27d ]

IMPALA-13243: Update Dropwizard Metrics to 4.2.x

Updates Dropwizard Metrics components to the latest 4.2.x release, 4.2.26. We directly use metrics-core, while metrics-jvm/metrics-json are imported via Hive (via https://github.com/joshelser/dropwizard-hadoop-metrics2). Dropwizard Metrics manually tested with these versions on https://github.com/joshelser/dropwizard-hadoop-metrics2/pull/8.

Change-Id: Ie9bec7a7c23194604430531bd83b25c5969e888e
Reviewed-on: http://gerrit.cloudera.org:8080/21599
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Update Dropwizard Metrics to supported version
> ----------------------------------------------
>
>          Key: IMPALA-13243
>          URL: https://issues.apache.org/jira/browse/IMPALA-13243
>      Project: IMPALA
>   Issue Type: Task
>   Components: Frontend
> Affects Versions: Impala 4.4.0
>     Reporter: Michael Smith
>     Assignee: Michael Smith
>     Priority: Major
>
> [Dropwizard Metrics|https://metrics.dropwizard.io] 4.1.x was [EOL in 2023|https://github.com/dropwizard/metrics/discussions/3029]. Impala is still on 3.x. Update to 4.2.x to keep up with bug and security fixes.
[jira] [Commented] (IMPALA-12857) Add flag to enable merge-on-read even if tables are configured with copy-on-write
[ https://issues.apache.org/jira/browse/IMPALA-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867965#comment-17867965 ]

ASF subversion and git services commented on IMPALA-12857:
----------------------------------------------------------

Commit 05585c19bfcc235ab9d7574c970db04125fb9743 in impala's branch refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=05585c19b ]

IMPALA-12857: Add flag to enable merge-on-read even if tables are configured with copy-on-write

Impala can only modify an Iceberg table via 'merge-on-read'. The 'iceberg_always_allow_merge_on_read_operations' backend flag makes it possible to execute 'merge-on-read' operations (DELETE, UPDATE, MERGE) even if the table property is 'copy-on-write'.

Testing:
- custom cluster test
- negative E2E test

Change-Id: I3800043e135beeedfb655a238c0644aaa0ef11f4
Reviewed-on: http://gerrit.cloudera.org:8080/21578
Reviewed-by: Daniel Becker
Tested-by: Impala Public Jenkins

> Add flag to enable merge-on-read even if tables are configured with copy-on-write
> ---------------------------------------------------------------------------------
>
>          Key: IMPALA-12857
>          URL: https://issues.apache.org/jira/browse/IMPALA-12857
>      Project: IMPALA
>   Issue Type: Bug
>   Components: Frontend
>     Reporter: Zoltán Borók-Nagy
>     Assignee: Noemi Pap-Takacs
>     Priority: Major
>       Labels: impala-iceberg
>
> Impala can only modify a table via 'merge-on-read'. It raises an error if users want to modify a table that is configured with 'copy-on-write'.
> We could add a backend flag to relax this restriction, i.e. enable 'merge-on-read' operations (DELETE, UPDATE, MERGE) even if the table property is 'copy-on-write'.
[jira] [Commented] (IMPALA-13226) TupleCacheInfo unintentionally overwrites Object.finalize()
[ https://issues.apache.org/jira/browse/IMPALA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866936#comment-17866936 ]

ASF subversion and git services commented on IMPALA-13226:
----------------------------------------------------------

Commit 04608452d37edc9256e368bec69b23c9e989b443 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=04608452d ]

IMPALA-13226: Rename TupleCacheInfo.finalize() to finalizeHash()

TupleCacheInfo.finalize() unintentionally overrides Object.finalize(), which is called by the JVM garbage collector when garbage collection determines that there are no more references to the object. Usually the finalize method is overridden to dispose of system resources or to perform other cleanup. TupleCacheInfo.finalize() is not meant to be used during GC. We'd better use another method name to avoid confusion. This patch renames it to finalizeHash(). Also fixed some stale comments.

Change-Id: I657c4f14b074b7c16dc7d126b0c8b5083b8f19c6
Reviewed-on: http://gerrit.cloudera.org:8080/21588
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> TupleCacheInfo unintentionally overwrites Object.finalize()
> -----------------------------------------------------------
>
>          Key: IMPALA-13226
>          URL: https://issues.apache.org/jira/browse/IMPALA-13226
>      Project: IMPALA
>   Issue Type: Bug
>   Components: Frontend
>     Reporter: Quanlong Huang
>     Assignee: Quanlong Huang
>     Priority: Major
>
> Object.finalize() is called by the JVM garbage collector on an object when garbage collection determines that there are no more references to the object. A subclass overrides the finalize method to dispose of system resources or to perform other cleanup.
> TupleCacheInfo.finalize() is not meant to be used during GC. We'd better use another method name to avoid confusion.
> {code:java}
> public void finalize() {
>   finalizedHashString_ = hasher_.hash().toString();
>   hasher_ = null;
>   finalizedHashTrace_ = hashTraceBuilder_.toString();
>   hashTraceBuilder_ = null;
>   finalized_ = true;
> }
> {code}
> https://github.com/apache/impala/blob/d83b48cf72fa94ec7f6e55da409b4dff3350543b/fe/src/main/java/org/apache/impala/planner/TupleCacheInfo.java#L157-L163
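The renamed API can be modeled with a short sketch; the class and method names below mirror the Java fields but are illustrative Python, not Impala code. The point is that an explicitly named `finalize_hash()` cannot be mistaken for a runtime lifecycle hook the way `finalize()` can in Java:

```python
import hashlib

class TupleCacheInfo:
    """Sketch of the renamed API: finalize_hash() is called deliberately,
    never implicitly by the runtime (unlike Java's Object.finalize)."""
    def __init__(self):
        self._hasher = hashlib.sha256()
        self.finalized_hash = None

    def hash_bytes(self, data):
        assert self.finalized_hash is None, "already finalized"
        self._hasher.update(data)

    def finalize_hash(self):
        # Freeze the accumulated hash and drop the hasher, as the Java
        # code drops hasher_ after computing finalizedHashString_.
        self.finalized_hash = self._hasher.hexdigest()
        self._hasher = None
        return self.finalized_hash
```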
[jira] [Commented] (IMPALA-13208) Add cluster id to the membership and request-queue topic names
[ https://issues.apache.org/jira/browse/IMPALA-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866908#comment-17866908 ]

ASF subversion and git services commented on IMPALA-13208:
----------------------------------------------------------

Commit fcee022e6033afe8c8c072fef1274640336b8770 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fcee022e6 ]

IMPALA-13208: Add cluster id to the membership and request-queue topic names

To share catalogd and statestore across Impala clusters, this adds the cluster id to the membership and request-queue topic names. So impalads are only visible to each other inside the same cluster, i.e. when using the same cluster id. Note that impalads still subscribe to the same catalog-update topic so they can share the same catalog service. If the cluster id is empty, the original topic names are used. This also adds the non-empty cluster id as the prefix of the statestore subscriber id for impalad and admissiond.

Tests:
- Add custom cluster test
- Ran exhaustive tests

Change-Id: I2ff41539f568ef03c0ee2284762b4116b313d90f
Reviewed-on: http://gerrit.cloudera.org:8080/21573
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Add cluster id to the membership and request-queue topic names
> --------------------------------------------------------------
>
>          Key: IMPALA-13208
>          URL: https://issues.apache.org/jira/browse/IMPALA-13208
>      Project: IMPALA
>   Issue Type: New Feature
>   Components: Backend
>     Reporter: Quanlong Huang
>     Assignee: Quanlong Huang
>     Priority: Critical
>
> Coordinators subscribe to 3 statestore topics: catalog-update, impala-membership and impala-request-queue. The last two topics are about query scheduling. To separate the clusters or share catalogd and statestore across Impala clusters, we can add the cluster id to these two topic names. Impalads are only visible to each other inside the same cluster (i.e. using the same cluster id). Queries won't be scheduled across clusters.
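The naming scheme described above can be sketched as follows. The separator and exact prefix format are assumptions for illustration; only the behaviors stated in the commit (empty cluster id keeps the original names, catalog-update stays shared, non-empty cluster id prefixes the subscriber id) are taken from the source:

```python
# Topics that stay shared across clusters so impalads can use one catalogd.
SHARED_TOPICS = {"catalog-update"}

def topic_name(base, cluster_id):
    # Empty cluster id keeps the original topic name for compatibility.
    # The '-' separator and prefix position are assumptions.
    return "{}-{}".format(cluster_id, base) if cluster_id else base

def effective_topic(base, cluster_id):
    # Scheduling topics get the cluster id; catalog-update stays shared.
    return base if base in SHARED_TOPICS else topic_name(base, cluster_id)

def subscriber_id(base, cluster_id):
    # A non-empty cluster id becomes a prefix of the subscriber id.
    return "{}-{}".format(cluster_id, base) if cluster_id else base
```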
[jira] [Commented] (IMPALA-13194) Fast-serialize position delete records
[ https://issues.apache.org/jira/browse/IMPALA-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866883#comment-17866883 ]

ASF subversion and git services commented on IMPALA-13194:
----------------------------------------------------------

Commit daa4d6e916f80d8b929dcf4873668accceb33b0b in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=daa4d6e91 ]

IMPALA-13194: Fast-serialize position delete records

Currently the serialization of position delete records is very wasteful. The records contain slots 'file_path' and 'pos', and what we do during serialization is the following:

1. Write fixed-size tuples that have a StringValue and a BigInt slot
2. Copy the StringValue's contents after the tuple.
3. Convert the StringValue ptr to be an offset to the string data

So we end up having something like this:

+-------------+--------+----------------+-------------+--------+----------------+-----+
| StringValue | BigInt | File path      | StringValue | BigInt | File path      | ... |
+-------------+--------+----------------+-------------+--------+----------------+-----+
| ptr, len    | 42     | /.../a.parquet | ptr, len    | 43     | /.../a.parquet | ... |
+-------------+--------+----------------+-------------+--------+----------------+-----+

It is very redundant to store the file paths this way, and in the end we have a huge buffer that we need to compress and send over the network. Moreover, we copy the file paths in memory twice:

1. From the input row batch to the KrpcDataStreamSender::Channel's temporary row batch
2. From the temporary row batch to the outbound row batch (during serialization)

The position delete files store the delete records in ascending order. This means adjacent records mostly have the same file path. So we can just buffer the position delete records up to the Channel's capacity, then serialize the data in a more efficient way. With this patch, serialized data looks like this:

+----------------+-------------+--------+-------------+--------+-----+
| File path      | StringValue | BigInt | StringValue | BigInt | ... |
+----------------+-------------+--------+-------------+--------+-----+
| /.../a.parquet | ptr, len    | 42     | ptr, len    | 43     | ... |
+----------------+-------------+--------+-------------+--------+-----+

File path, then tuples with the same file path; after that comes the next file path and the tuples associated with that one, and so on.

Measurements:
07:EXCHANGE: 1m ==> 52s
F02:EXCHANGE SENDER: 1m2s ==> 16s

Change-Id: I6095f318e3d06dedb4197681156b40dd2a326c6f
Reviewed-on: http://gerrit.cloudera.org:8080/21563
Reviewed-by: Csaba Ringhofer
Tested-by: Impala Public Jenkins

> Fast-serialize position delete records
> --------------------------------------
>
>          Key: IMPALA-13194
>          URL: https://issues.apache.org/jira/browse/IMPALA-13194
>      Project: IMPALA
>   Issue Type: Improvement
>   Components: Backend
>     Reporter: Zoltán Borók-Nagy
>     Assignee: Zoltán Borók-Nagy
>     Priority: Major
>       Labels: impala-iceberg
>
> Currently the serialization of position delete records is very wasteful. The records contain slots 'file_path' and 'pos', and what we do during serialization is the following.
> # Write a fixed-size tuple that has a StringValue and a BigInt slot (20 bytes in total)
> # Copy the StringValue's contents after the tuple.
> # Convert the StringValue slot to be an offset to the string data
> So we end up having something like this:
> {noformat}
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> | StringValue | BigInt | File path      | StringValue | BigInt | File path      | ... |
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> | ptr, len    | 42     | /.../a.parquet | ptr, len    | 43     | /.../a.parquet | ... |
> +-------------+--------+----------------+-------------+--------+----------------+-----+
> {noformat}
> It is very redundant to store the file paths this way, and in the end we have a huge buffer that we need to compress and send over the network. Moreover, we copy the file paths in memory twice:
> # From the input row batch to the KrpcDataStreamSender::Channel's temporary row batch
> # From the temporary row batch to the outbound row batch (during serialization)
> The position delete files store the delete records in ascending order. This means adjacent records mostly have the same file path. So we could just buffer the position delete records up to the Channel's capacity, then serialize the data in a more efficient way.
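The grouping idea behind the new layout can be sketched in a few lines of Python; this is an illustrative model of the layout, not Impala's actual serializer, and the output structure (path followed by its positions) is a simplification:

```python
from itertools import groupby

def serialize_deletes(records):
    """records: iterable of (file_path, pos) pairs in ascending order, as
    stored in Iceberg position delete files. Emits each file path once,
    followed by the positions sharing it, instead of repeating the path
    for every row."""
    out = []
    for path, group in groupby(records, key=lambda r: r[0]):
        out.append(path)
        out.append([pos for _, pos in group])
    return out

# Adjacent records mostly share a path, so the path is emitted once per run.
deletes = [("/.../a.parquet", 42), ("/.../a.parquet", 43), ("/.../b.parquet", 7)]
```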
[jira] [Commented] (IMPALA-13231) Some auto-generated files for ranger are not ignored by Git
[ https://issues.apache.org/jira/browse/IMPALA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866884#comment-17866884 ]

ASF subversion and git services commented on IMPALA-13231:
----------------------------------------------------------

Commit 8d16858f29f5c0ef0d5c03c48db693bbdae64c0f in impala's branch refs/heads/master from Xuebin Su
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d16858f2 ]

IMPALA-13231: Gitignore auto-generated files for ranger

Previously, some files generated from templates by `setup-ranger.sh` were not ignored by Git. This patch fixes the issue by adding those files to `.gitignore`.

Change-Id: I3057b136643412f686352f3188bf7e2b801626bd
Reviewed-on: http://gerrit.cloudera.org:8080/21590
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Some auto-generated files for ranger are not ignored by Git
> -----------------------------------------------------------
>
>          Key: IMPALA-13231
>     Reporter: Xuebin Su
>     Assignee: Xuebin Su
>     Priority: Major
>
> When {{bin/bootstrap_development.sh}} runs, some files generated by {{testdata/bin/setup-ranger.sh}} from the templates are not ignored by Git, including
> * {{testdata/cluster/ranger/setup/impala_user_non_owner_2.json}}, and
> * {{testdata/cluster/ranger/setup/all_database_policy_revised.json}}
[jira] [Commented] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866608#comment-17866608 ]

ASF subversion and git services commented on IMPALA-13161:
----------------------------------------------------------

Commit 5c4e771241a7f847d2349ae248bc268243e071ed in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5c4e77124 ]

IMPALA-13161: Fix column index overflow in DelimitedTextParser

DelimitedTextParser tracks the current column index inside the row that is being parsed. The row could have an arbitrary number of fields. The index, 'column_idx_', is defined as int type, which could overflow when there are more than 2^31 fields in the row. This index is only used to check whether the current column should be materialized. It doesn't make sense to track the index once it's larger than the number of columns of the table. This patch fixes the overflow by only bumping 'column_idx_' when it's smaller than the number of columns of the table.

Tests:
- Add e2e test

Change-Id: I527a8971e92e270d5576c2155e4622dd6d43d745
Reviewed-on: http://gerrit.cloudera.org:8080/21559
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
> -----------------------------------------------------------------
>
>          Key: IMPALA-13161
>          URL: https://issues.apache.org/jira/browse/IMPALA-13161
>      Project: IMPALA
>   Issue Type: Bug
>   Components: Backend
> Affects Versions: Impala 4.0.0, Impala 4.4.0
>     Reporter: nyq
>     Assignee: Quanlong Huang
>     Priority: Critical
>
> Impala version: 4.0.0
> Problem: impalad crashes when querying a text table that has a 3GB data file containing only the '\x00' char.
> Steps:
> python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp="\x00"*1024*1024*3; [f.write(tmp) for i in range(1024)] ;f.close()'
> create table impala_0_3gb (id int)
> hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/
> refresh impala_0_3gb
> select count(1) from impala_0_3gb
> Errors:
> Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0)
> # Java VM: OpenJDK 64-Bit Server VM
> # Problematic frame:
> # C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid956182.log
> #
> #
> C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> C [impalad+0x136fe11] impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1
> C [impalad+0x137100e] impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be
> C [impalad+0x13721ac] impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c
> C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c
> C [impalad+0x1443e17] impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, impala::io::ScanRange*, long*)+0x7e7
> C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541
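The overflow and its fix can be modeled with a short sketch. The clamping rule follows the commit message ("only bump 'column_idx_' when it's smaller than the number of columns of the table"); the function name and Python types are illustrative, not the C++ code:

```python
INT32_MAX = 2**31 - 1  # 'column_idx_' is a signed 32-bit int in the C++ code

def bump_column_idx(column_idx, num_table_columns):
    """Advance the parser's current column index for the next field.
    Only indices below the table's column count matter (they decide whether
    a column is materialized), so the fixed parser stops bumping beyond
    that, preventing overflow on rows with more than 2^31 fields."""
    if column_idx < num_table_columns:
        return column_idx + 1
    return column_idx  # clamp: further fields can never be materialized

# A row with billions of fields never pushes the index past the table width:
idx = 0
for _ in range(10):           # stands in for billions of '\x00' fields
    idx = bump_column_idx(idx, num_table_columns=1)
```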
[jira] [Commented] (IMPALA-13209) ExchangeNode's ConvertRowBatchTime can be high
[ https://issues.apache.org/jira/browse/IMPALA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866607#comment-17866607 ]

ASF subversion and git services commented on IMPALA-13209:
----------------------------------------------------------

Commit a486305a922d672f77ff23b5f42e604a720597fd in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a486305a9 ]

IMPALA-13209: Optimize ConvertRowBatchTime in ExchangeNode

The patch optimizes the most common case, when the src and dst RowBatches have the same number of tuples per row. ConvertRowBatchTime decreased from >600ms to <100ms in a query with a busy exchange node:

set mt_dop=8;
select straight_join count(*) from tpcds_parquet.store_sales s1 join /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk = s2.ss_customer_sk;

TPCDS-20 showed a minor improvement (0.77%). The effect is likely to be larger if more nodes are involved.

Testing:
- passed core tests

Change-Id: Iab94315364e8886da1ae01cf6af623812a2da9cb
Reviewed-on: http://gerrit.cloudera.org:8080/21571
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> ExchangeNode's ConvertRowBatchTime can be high
> ----------------------------------------------
>
>          Key: IMPALA-13209
>          URL: https://issues.apache.org/jira/browse/IMPALA-13209
>      Project: IMPALA
>   Issue Type: Improvement
>   Components: Backend
>     Reporter: Csaba Ringhofer
>     Assignee: Csaba Ringhofer
>     Priority: Major
>       Labels: performance
>
> ConvertRowBatchTime can be surprisingly high - the only thing done during this timer is copying tuple pointers from one RowBatch to another.
> https://github.com/apache/impala/blob/c53987480726b114e0c3537c71297df2834a4962/be/src/exec/exchange-node.cc#L217
> {code}
> set mt_dop=8;
> select straight_join count(*) from tpcds_parquet.store_sales s1 join
> /*+broadcast*/ tpcds_parquet.store_sales16 s2 on s1.ss_customer_sk =
> s2.ss_customer_sk;
>
> ConvertRowBatchTime dominates the busy exchange node's exec time in the profile:
>    - ConvertRowBatchTime: 640.072ms
>    - InactiveTotalTime: 243.783ms
>    - PeakMemoryUsage: 12.53 MB (13142368)
>    - RowsReturned: 46.09M (46086464)
>    - RowsReturnedRate: 46.93 M/sec
>    - TotalTime: 981.968ms
> {code}
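The fast path can be illustrated with a toy model in which a row batch is a flat array of tuple "pointers"; the function name, layout, and slow-path padding below are simplifications of the C++ code, not its actual interface:

```python
def convert_row_batch(src, src_tuples_per_row, dst_tuples_per_row):
    """src: flat list of tuple pointers, src_tuples_per_row per row.
    When the src and dst layouts match -- the most common case -- one bulk
    copy replaces the per-row/per-tuple loop where ConvertRowBatchTime
    was being spent."""
    if src_tuples_per_row == dst_tuples_per_row:
        return src[:]  # fast path: single contiguous copy
    out = []
    num_rows = len(src) // src_tuples_per_row
    for r in range(num_rows):  # slow path: remap row by row
        row = src[r * src_tuples_per_row:(r + 1) * src_tuples_per_row]
        out.extend(row + [None] * (dst_tuples_per_row - src_tuples_per_row))
    return out
```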
[jira] [Commented] (IMPALA-13227) test_spilling_hash_join should be marked for serial execution
[ https://issues.apache.org/jira/browse/IMPALA-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866606#comment-17866606 ]

ASF subversion and git services commented on IMPALA-13227:
----------------------------------------------------------

Commit 3de8c2ab9c755b1adfc35ea2176a2ac193899ca6 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3de8c2ab9 ]

IMPALA-13227: test_spilling_hash_join should be marked for serial execution

test_spilling_hash_join consumes too many resources, and parallel tests can fail because of it. We should mark it for serial execution.

Testing:
* had a green exhaustive run, and we also know that before test_spilling_hash_join was added, the exhaustive runs were much more stable

Change-Id: I7b50376db9dde5b33a02fde55880f49a7db4b7c1
Reviewed-on: http://gerrit.cloudera.org:8080/21589
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> test_spilling_hash_join should be marked for serial execution
> -------------------------------------------------------------
>
>          Key: IMPALA-13227
>     Reporter: Zoltán Borók-Nagy
>     Assignee: Zoltán Borók-Nagy
>     Priority: Major
>
> test_spilling_hash_join consumes too many resources, and parallel tests can fail because of it.
> We should mark it for serial execution.
[jira] [Commented] (IMPALA-13088) Speedup IcebergDeleteBuilder
[ https://issues.apache.org/jira/browse/IMPALA-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866482#comment-17866482 ]

ASF subversion and git services commented on IMPALA-13088:
----------------------------------------------------------

Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ]

IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the CRoaring library (version 4.0.0). CRoaring also offers C++ classes, but this patch adds its own thin C++ wrapper class around the C functions to get the best performance.

Toolchain Clang 5.0.1 was not able to compile CRoaring due to a bug, which is tracked by IMPALA-13190; this patch also fixes it with a new toolchain.

Performance:
I used an extended version of the "One Trillion Row" challenge. This means after inserting 1 trillion records into a table I also deleted / updated lots of records (see statements at the end). So at the end I had 1 trillion data records and ~68.5 billion delete records in the table. For the measurements I used clusters with 10 and 40 executors, and executed the following query:

SELECT station, min(measure), max(measure), avg(measure)
FROM measurements_extra_1trc_partitioned
GROUP BY 1 ORDER BY 1;

JOIN BUILD times:

+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 4m15s        |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query because executors crashed due to out-of-memory.

Memory usage (VmRSS) for 10 executors:

+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

The resource estimations were wrong when MT_DOP was greater than 1. This has also been fixed.

Testing:
* added tests for RoaringBitmap64
* added tests for resource estimations

Statements I used to delete / update the records for the One Trillion Row challenge:

create table measurements_extra_1trc_partitioned(
  station string,
  ts timestamp,
  sensor_type int,
  measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts), truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns 'ts' and 'sensor_type' are new:
'ts': timestamps that span a year
'sensor_type': integer between 0 and 100
Both 'ts' and 'sensor_type' have a uniform distribution.

Ingested data with the help of the original One Trillion Row challenge table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki', 'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb', 'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00';

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107
Reviewed-on: http://gerrit.cloudera.org:8080/21557
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Speedup IcebergDeleteBuilder
> ----------------------------
>
>          Key: IMPALA-13088
>          URL: https://issues.apache.org/jira/browse/IMPALA-13088
>      Project: IMPALA
>   Issue Type: Improvement
>     Reporter: Zoltán Borók-Nagy
>     Assignee: Zoltán Borók-Nagy
>     Priority: Major
>       Labels: impala-iceberg
>
> When there are lots of delete records IcebergDeleteBuilder can become a bottleneck. Since the left side of the JOIN
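The data-structure swap can be sketched with stand-ins: `bisect` models the sorted-vector baseline, and a Python set plays the role of the compressed CRoaring bitmap (it shares the membership-test interface, not the compression). All names here are illustrative:

```python
from bisect import bisect_left

class SortedVectorDeletes:
    """Baseline: sorted vector of deleted row positions for one data file
    (8 bytes per position in the C++ code, hence the memory blow-up)."""
    def __init__(self, positions):
        self.positions = sorted(positions)

    def is_deleted(self, pos):
        i = bisect_left(self.positions, pos)
        return i < len(self.positions) and self.positions[i] == pos

class BitmapDeletes:
    """Roaring-bitmap role: dense runs of deleted positions compress well.
    A plain set stands in for CRoaring's 64-bit bitmap here."""
    def __init__(self, positions):
        self.bits = set(positions)

    def is_deleted(self, pos):
        return pos in self.bits

# Both structures answer the same question during the join probe:
deleted = [42, 43, 44, 1_000_000]
```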
[jira] [Commented] (IMPALA-13109) Use RoaringBitmap in IcebergDeleteNode
[ https://issues.apache.org/jira/browse/IMPALA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866483#comment-17866483 ] ASF subversion and git services commented on IMPALA-13109: -- Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ] IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s This patch substitutes the sorted 64-bit integer vectors that we use in IcebergDeleteNode to 64-bit roaring bitmaps. We use the CRoaring library (version 4.0.0). CRoaring also offers C++ classes, but this patch adds its own thin C++ wrapper class around the C functions to get the best performance. Toolchain Clang 5.0.1 was not able to compile CRoaring due to a bug which is tracked by IMPALA-13190, this patch also fixes it with a new toolchain. Performance I used an extended version of the "One Trillion Row" challenge. This means after inserting 1 Trillion records to a table I also deleted / updated lots of records (see statements at the end). So at the end I had 1 Trillion data records and ~68.5 Billion delete records in the table. For the measurements I used clusters with 10 and 40 executors, and executed the following query: SELECT station, min(measure), max(measure), avg(measure) FROM measurements_extra_1trc_partitioned GROUP BY 1 ORDER BY 1; JOIN BUILD times: ++--+--+ | Implementation | 10 executors | 40 executors | ++--+--+ | Sorted vectors | CRASH| 4m15s| | Roaring bitmap | 6m35s| 1m51s| ++--+--+ 10 executors cluster with sorted vectors failed to run the query because executors crashed due to out-of-memory. Memory usage (VmRSS) for 10 executors: +++ | Implementation | 10 executors | +++ | Sorted vectors | 54.4 GB (before CRASH) | | Roaring bitmap | 7.4 GB | +++ The resource estimations were wrong when MT_DOP was greater than 1. This has been also fixed. 
Testing:
* added tests for RoaringBitmap64
* added tests for resource estimations

Statements I used to delete / update the records for the One Trillion Row challenge:

create table measurements_extra_1trc_partitioned(
    station string,
    ts timestamp,
    sensor_type int,
    measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts), truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns 'ts' and 'sensor_type' are new:
'ts': timestamps that span a year
'sensor_type': integer between 0 and 100
Both 'ts' and 'sensor_type' have a uniform distribution. Ingested data with the help of the original One Trillion Row challenge table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki', 'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb', 'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00';

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107 Reviewed-on: 
http://gerrit.cloudera.org:8080/21557 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Use RoaringBitmap in IcebergDeleteNode > -- > > Key: IMPALA-13109 > URL: https://issues.apache.org/jira/browse/IMPALA-13109 > Project: IMPALA > Issue Type: Improvement >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > IcebergDeleteNode currently uses an ordered int64_t array for each data file > to hold the deleted
[jira] [Commented] (IMPALA-13190) Backport Clang compiler fix to Toolchain Clang 5.0.1
[ https://issues.apache.org/jira/browse/IMPALA-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866484#comment-17866484 ] ASF subversion and git services commented on IMPALA-13190: -- Commit f1133acc2a038a97426087675286ca1dcd863767 in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f1133acc2 ]

IMPALA-13088, IMPALA-13109: Use RoaringBitmap instead of sorted vector of int64s

This patch replaces the sorted 64-bit integer vectors that we use in IcebergDeleteNode with 64-bit roaring bitmaps. We use the CRoaring library (version 4.0.0). CRoaring also offers C++ classes, but this patch adds its own thin C++ wrapper class around the C functions to get the best performance.

Toolchain Clang 5.0.1 was not able to compile CRoaring due to a bug that is tracked by IMPALA-13190; this patch also fixes that with a new toolchain.

Performance: I used an extended version of the "One Trillion Row" challenge. This means after inserting 1 Trillion records into a table I also deleted / updated lots of records (see statements at the end). So at the end I had 1 Trillion data records and ~68.5 Billion delete records in the table. For the measurements I used clusters with 10 and 40 executors, and executed the following query:

SELECT station, min(measure), max(measure), avg(measure)
FROM measurements_extra_1trc_partitioned
GROUP BY 1 ORDER BY 1;

JOIN BUILD times:

+----------------+--------------+--------------+
| Implementation | 10 executors | 40 executors |
+----------------+--------------+--------------+
| Sorted vectors | CRASH        | 4m15s        |
| Roaring bitmap | 6m35s        | 1m51s        |
+----------------+--------------+--------------+

The 10-executor cluster with sorted vectors failed to run the query because executors crashed due to out-of-memory.

Memory usage (VmRSS) for 10 executors:

+----------------+------------------------+
| Implementation | 10 executors           |
+----------------+------------------------+
| Sorted vectors | 54.4 GB (before CRASH) |
| Roaring bitmap | 7.4 GB                 |
+----------------+------------------------+

The resource estimations were wrong when MT_DOP was greater than 1. This has also been fixed.
Testing:
* added tests for RoaringBitmap64
* added tests for resource estimations

Statements I used to delete / update the records for the One Trillion Row challenge:

create table measurements_extra_1trc_partitioned(
    station string,
    ts timestamp,
    sensor_type int,
    measure decimal(5,2))
partitioned by spec (bucket(11, station), day(ts), truncate(10, sensor_type))
stored as iceberg;

The original challenge didn't have any row-level modifications; columns 'ts' and 'sensor_type' are new:
'ts': timestamps that span a year
'sensor_type': integer between 0 and 100
Both 'ts' and 'sensor_type' have a uniform distribution. Ingested data with the help of the original One Trillion Row challenge table, then issued the following DML statements:

-- DELETE ~10 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type = 13;

-- UPDATE ~220 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure - 2 as decimal(5,2))
where station in ('Budapest', 'Paris', 'Zurich', 'Kuala Lumpur')
  and sensor_type in (7, 17, 77);

-- DELETE ~7.1 Billion
delete from measurements_extra_1trc_partitioned
where ts between '2024-01-15 11:30:00' and '2024-09-10 11:30:00'
  and sensor_type between 45 and 51
  and station regexp '[ATZ].*';

-- UPDATE ~334 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 5 as decimal(5,2))
where station in ('Accra', 'Addis Ababa', 'Entebbe', 'Helsinki', 'Hong Kong', 'Nairobi', 'Ottawa', 'Tauranga', 'Yaounde', 'Zagreb', 'Zurich')
  and ts > '2024-11-05 22:30:00'
  and sensor_type > 90;

-- DELETE 50.6 Billion
delete from measurements_extra_1trc_partitioned
where sensor_type between 65 and 77
  and ts > '2024-08-11 12:00:00';

-- UPDATE ~200 Million
update measurements_extra_1trc_partitioned
set measure = cast(measure + 3.5 as decimal(5,2))
where sensor_type in (56, 66, 76, 86, 96)
  and ts < '2024-03-17 01:00:00'
  and (station like 'Z%' or station like 'Y%');

Change-Id: Ib769965d094149e99c43e0044914d9e76107 Reviewed-on: 
http://gerrit.cloudera.org:8080/21557 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Backport Clang compiler fix to Toolchain Clang 5.0.1 > > > Key: IMPALA-13190 > URL: https://issues.apache.org/jira/browse/IMPALA-13190 > Project: IMPALA > Issue Type: Sub-task > Components: Toolchain >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > > Toolchain Clang 5.0.1 fails to compile the CRoaring library. > There was an
[jira] [Commented] (IMPALA-13001) Add graceful and force shutdown for packaging script.
[ https://issues.apache.org/jira/browse/IMPALA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866454#comment-17866454 ] ASF subversion and git services commented on IMPALA-13001: -- Commit 8af0ce8ed6659fdda9b81847d4871c14036e173c in impala's branch refs/heads/master from Xiang Yang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8af0ce8ed ] IMPALA-13001: Support graceful and force shutdown for impala.sh This patch adds graceful and force shutdown support to impala.sh. It also keeps the stdout and stderr logs on startup, and fixes some bugs in impala.sh, including: - a missing empty service name check. - the restart command not working. Testing: - Manually deployed the package on Ubuntu 22.04 and verified it. Change-Id: Ib7743234952ba6b12694ecc68a920d59fea0d4ba Reviewed-on: http://gerrit.cloudera.org:8080/21297 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add graceful and force shutdown for packaging script. > - > > Key: IMPALA-13001 > URL: https://issues.apache.org/jira/browse/IMPALA-13001 > Project: IMPALA > Issue Type: Improvement >Reporter: XiangYang >Assignee: XiangYang >Priority: Major > > Add graceful and force shutdown for packaging script to finish the TODO in > https://github.com/apache/impala/blob/4.3.0/package/bin/impala-env.sh#L33. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
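The graceful-then-forced pattern the commit describes can be sketched as follows. This is a generic Python sketch, not the actual impala.sh (which is shell and uses Impala-specific shutdown commands); the 5-second grace period is an assumption for illustration.

```python
import subprocess

def stop_service(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Attempt a graceful shutdown first; escalate to a forced kill if the
    process is still alive after the grace period."""
    if proc.poll() is not None:
        return "not-running"
    proc.terminate()                     # graceful: SIGTERM
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()                      # force: SIGKILL
        proc.wait()                      # reap so no zombie is left behind
        return "forced"
```

The empty-service-name check the commit mentions would correspond here to validating the argument before ever sending a signal; restart is then stop-followed-by-start.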
[jira] [Commented] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values
[ https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864837#comment-17864837 ] ASF subversion and git services commented on IMPALA-13193: -- Commit c53987480726b114e0c3537c71297df2834a4962 in impala's branch refs/heads/master from ttz [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c53987480 ] IMPALA-13193: RuntimeFilter on parquet dictionary should evaluate NULL values NULL values are not included in the parquet dictionary. If the column contains NULL values, also evaluate the filter for NULL values. Testing: - Added a test case in parquet-dictionary-runtime-filter.test Change-Id: I0f69405c0c08feb47141d080a828847e5094163f Reviewed-on: http://gerrit.cloudera.org:8080/21566 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > RuntimeFilter on parquet dictionary should evaluate null values > --- > > Key: IMPALA-13193 > URL: https://issues.apache.org/jira/browse/IMPALA-13193 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, > Impala 4.3.0, Impala 4.4.0 >Reporter: Quanlong Huang >Assignee: Zhi Tang >Priority: Critical > Labels: correctness > > IMPALA-10910 and IMPALA-5509 introduced an optimization to evaluate runtime > filters on parquet dictionary values. If none of the values can pass the check, > the whole row group will be skipped. However, NULL values are not included in > the parquet dictionary. Runtime filters that accept NULL values might > incorrectly reject the row group if none of the dictionary values can pass > the check. 
> Here are steps to reproduce the bug: > {code:sql} > create table parq_tbl (id bigint, name string) stored as parquet; > insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc"); > create table dim_tbl (name string); > insert into dim_tbl values (NULL); > select * from parq_tbl p join dim_tbl d > on COALESCE(p.name, '') = COALESCE(d.name, '');{code} > The SELECT query should return 2 rows but now it returns 0 rows. > A workaround is to disable this optimization: > {code:sql} > set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
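The fix above can be modeled as one extra check in the row-group pruning decision. This is a sketch, not Impala's actual C++ scanner code; `filter_accepts` stands in for runtime filter evaluation.

```python
def can_skip_row_group(dictionary_values, has_nulls, filter_accepts):
    """A row group may be skipped only if the runtime filter rejects every
    dictionary value AND, when the column contains NULLs, also rejects NULL.
    The NULL check is the part that was missing before IMPALA-13193."""
    if any(filter_accepts(v) for v in dictionary_values):
        return False          # some dictionary value survives the filter
    if has_nulls and filter_accepts(None):
        return False          # NULL rows might survive the filter
    return True
```

In the reproduction, the join predicate COALESCE(p.name, '') = COALESCE(d.name, '') produces a filter that accepts NULL, so a row group whose dictionary contains only "abc" must not be skipped when the column has NULLs.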
[jira] [Commented] (IMPALA-12786) Optimize count(*) for JSON scans
[ https://issues.apache.org/jira/browse/IMPALA-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864738#comment-17864738 ] ASF subversion and git services commented on IMPALA-12786: -- Commit ec59578106b9d9adcdc4d4ea2223d3531eac9cbc in impala's branch refs/heads/master from Eyizoha [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ec5957810 ]

IMPALA-12786: Optimize count(*) for JSON scans

When performing zero-slot scans on a JSON table for operations like count(*), we don't require specific data from the JSON; we only need the number of top-level JSON objects. However, the current JSON parser based on rapidjson still decodes and copies specific data from the JSON, even in zero-slot scans. Skipping these steps can significantly improve scan performance.

This patch introduces a JSON skipper to conduct zero-slot scans on JSON data. Essentially, it is a simplified version of a rapidjson parser, removing the specific data decoding and copying operations, resulting in faster counting of JSON objects. The skipper retains the ability to recognize malformed JSON and provides the same specific error codes as the rapidjson parser. Nevertheless, as it bypasses specific data parsing, it cannot identify string encoding errors or numeric overflow errors. Despite this, these data errors do not impact the counting of JSON objects, so it is acceptable to ignore them. The TEXT scanner exhibits similar behavior.

Additionally, a new query option, disable_optimized_json_count_star, has been added to disable this optimization and revert to the old behavior.
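The skipper's core idea — counting top-level objects without decoding any values — can be sketched as a brace-depth scan. This is illustrative only; the real skipper is a simplified C++ rapidjson-style parser that also reports rapidjson-compatible error codes, which this sketch omits.

```python
def count_top_level_objects(data: str) -> int:
    """Count top-level JSON objects without materializing values:
    track brace depth, ignoring braces that appear inside strings."""
    count = depth = 0
    in_string = escaped = False
    for ch in data:
        if in_string:
            if escaped:
                escaped = False            # char after backslash: consumed
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                count += 1                 # a top-level object just closed
    return count
```

Because no string is copied and no number is parsed, a scan like this touches each byte once, which is the source of the speedup reported below.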
In the performance test of TPC-DS with a format of json/none and a scale of 10GB, the performance optimization is shown in the following tables:

+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload  | Query                     | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCDS(10) | TPCDS-Q_COUNT_UNOPTIMIZED | json / none / none | 6.78   | 6.88        | -1.46%     | 4.93%     | 3.63%          | 9     | -1.51%         | -0.74   | -0.72  |
| TPCDS(10) | TPCDS-Q_COUNT_ZERO_SLOT   | json / none / none | 2.42   | 6.75        | I -64.20%  | 6.44%     | 4.58%          | 9     | I -177.75%     | -3.36   | -37.55 |
| TPCDS(10) | TPCDS-Q_COUNT_OPTIMIZED   | json / none / none | 2.42   | 7.03        | I -65.63%  | 3.93%     | 4.39%          | 9     | I -194.13%     | -3.36   | -42.82 |
+-----------+---------------------------+--------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

(I) Improvement: TPCDS(10) TPCDS-Q_COUNT_ZERO_SLOT [json / none / none] (6.75s -> 2.42s [-64.20%])

+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg     | Base Avg | Delta(Avg) | StdDev(%)  | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+
| 01:AGGREGATE | 2.58%      | 54.85ms | 58.88ms  | -6.85%     | * 14.43% * | 115.82ms | 133.11ms | -12.99%    | 3      | 3     | 3      | 1         |
| 00:SCAN HDFS | 97.41%     | 2.07s   | 6.07s    | -65.84%    | 5.87%      | 2.43s    | 6.95s    | -65.01%    | 3      | 3     | 28.80M | 143.83M   |
+--------------+------------+---------+----------+------------+------------+----------+----------+------------+--------+-------+--------+-----------+

(I) Improvement: TPCDS(10) TPCDS-Q_COUNT_OPTIMIZED [json / none / none] (7.03s -> 2.42s [-65.63%])

+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg   | Base Avg | Delta(Avg) | StdDev(%) | Max   | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+
| 00:SCAN HDFS | 99.35%     | 2.07s | 6.49s    | -68.15%    | 4.83%     | 2.37s | 7.49s    | -68.32%    | 3      | 3     | 28.80M | 143.83M   |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+

Testing:
- Added new test cases in TestQueriesJsonTables to verify that query results are consistent before and after optimization.
-
[jira] [Commented] (IMPALA-13203) ExprRewriter did not rewrite 'id = 0 OR false' as expected
[ https://issues.apache.org/jira/browse/IMPALA-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864737#comment-17864737 ] ASF subversion and git services commented on IMPALA-13203: -- Commit 018f5980884fcf34f78ee3898b06531c05826c8a in impala's branch refs/heads/master from Eyizoha [ https://gitbox.apache.org/repos/asf?p=impala.git;h=018f59808 ] IMPALA-13203: Rewrite 'id = 0 OR false' as expected Currently, ExprRewriter cannot rewrite 'id = 0 OR false' to 'id = 0' as expected. More precisely, it fails to rewrite any cases where a boolean literal follows 'AND/OR' as expected. The issue is that the CompoundPredicate generated by NormalizeExprsRule is not analyzed, causing SimplifyConditionalsRule to skip the rewrite. This patch fixes the issue by adding analysis of the rewritten CompoundPredicate in NormalizeExprsRule. Testing: - Modified and passed FE test case ExprRewriteRulesTest#testCompoundPredicate - Modified and passed related test case Change-Id: I9d9fffdd1cc644cc2b48f08c2509f22a72362d22 Reviewed-on: http://gerrit.cloudera.org:8080/21568 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > ExprRewriter did not rewrite 'id = 0 OR false' as expected > --- > > Key: IMPALA-13203 > URL: https://issues.apache.org/jira/browse/IMPALA-13203 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.4.0 >Reporter: Zihao Ye >Assignee: Zihao Ye >Priority: Minor > > The comments in the SimplifyConditionalsRule class mention that 'id = 0 OR > false' would be rewritten to 'id = 0', but in reality, it does not perform > this rewrite as expected. After executing such SQL, we can see in the text > plan that: > {code:sql} > Analyzed query: SELECT * FROM functional.alltypestiny WHERE FALSE OR id = > CAST(0 > AS INT) {code} > The issue appears to be that the CompoundPredicate generated by > NormalizeExprsRule was not analyzed, causing the SimplifyConditionalsRule to > skip the rewrite. 
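The intended rewrite behaves like this toy simplifier — a Python model of SimplifyConditionalsRule's OR handling, not the actual Java, which operates on analyzed Expr trees (the fix is precisely about ensuring those trees are analyzed before this rule sees them).

```python
def simplify_or(operands):
    """Simplify OR over operands, where Python True/False stand in for
    analyzed BoolLiterals: TRUE short-circuits the whole predicate,
    FALSE operands drop out."""
    kept = []
    for op in operands:
        if op is True:
            return True       # 'x OR true'  -> true
        if op is False:
            continue          # 'x OR false' -> x
        kept.append(op)
    if not kept:
        return False          # every operand was FALSE
    return kept[0] if len(kept) == 1 else kept
```

With the fix, 'id = 0 OR false' reduces to 'id = 0' as the rule's comments always promised.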
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13196) Query timeline page can not display normally when Knox proxying is being used
[ https://issues.apache.org/jira/browse/IMPALA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864328#comment-17864328 ] ASF subversion and git services commented on IMPALA-13196: -- Commit e419545c1c2b6fa0b6d23643d3b2eb0faa4dce52 in impala's branch refs/heads/master from zhangyifan27 [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e419545c1 ] IMPALA-13196: Fully qualify urls in www/query_timeline This patch fixes some links in www/query_timeline with adding {{ __common__.host-url }} prefix to fully qualify urls when Knox proxying is being used. Testing: - Ran in a cluster and manually checked the query_timeline page works as expected. Change-Id: I4a701f37cf257a0b11a027c9c598645ca0c997f3 Reviewed-on: http://gerrit.cloudera.org:8080/21564 Reviewed-by: Quanlong Huang Tested-by: Impala Public Jenkins > Query timeline page can not display normally when Knox proxying is being used > - > > Key: IMPALA-13196 > URL: https://issues.apache.org/jira/browse/IMPALA-13196 > Project: IMPALA > Issue Type: Bug >Reporter: YifanZhang >Priority: Major > > We should use absolute url in query_timeline.tmpl. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
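The pattern behind this fix — prefixing site-relative links with the proxy's base URL — looks like this sketch, where `host_url` plays the role of the {{ __common__.host-url }} template variable; the function name and the example host in the test are hypothetical, not Impala's actual code.

```python
def qualify_url(path: str, host_url: str = "") -> str:
    """Return a fully qualified URL when a proxy base URL (e.g. a Knox
    gateway prefix) is configured; leave already-absolute URLs untouched."""
    if path.startswith(("http://", "https://")) or not host_url:
        return path
    return host_url.rstrip("/") + "/" + path.lstrip("/")
```

Without the prefix, a link like /events resolves against the proxy host's root and misses the gateway path, which is why the timeline page broke behind Knox.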
[jira] [Commented] (IMPALA-9441) TestHS2.test_get_schemas is flaky in local catalog mode
[ https://issues.apache.org/jira/browse/IMPALA-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861098#comment-17861098 ] ASF subversion and git services commented on IMPALA-9441: - Commit 00d0b0dda1e215d8e91ff52688fe6654bee52282 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=00d0b0dda ] IMPALA-9441,IMPALA-13170: Ops listing dbs/tables should handle db not exists We have some operations listing the dbs/tables in the following steps: 1. Get the db list 2. Do something on the db, which could fail if the db no longer exists For instance, when authorization is enabled, SHOW DATABASES would need a step-2 to get the owner of each db. This is fine in the legacy catalog mode since the whole Db object is cached on the coordinator side. However, in the local catalog mode, the msDb could be missing in the local cache. The coordinator then triggers a getPartialCatalogObject RPC to load it from catalogd. If the db no longer exists in catalogd, this step will fail. The same happens in GetTables HS2 requests when listing all tables in all dbs. In step-2 we list the table names for a db. Though it exists when we get the db list, it could be dropped by the time we start listing the table names in it. This patch adds code to handle the exceptions caused by a db no longer existing. It also improves GetSchemas to not list the table names, to avoid the same issue. 
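The race the commit describes — a db listed in step 1 vanishing before step 2 — is handled by tolerating the not-found error per db rather than failing the whole listing. A Python sketch with hypothetical names (the real code is Java in Frontend/CatalogdMetaProvider):

```python
class DatabaseNotFoundError(Exception):
    """Raised when a db listed earlier has since been dropped."""

def list_db_owners(db_names, get_owner):
    """Step 2 of SHOW DATABASES: look up each db's owner, skipping dbs
    dropped between the listing (step 1) and the lookup (step 2)."""
    owners = {}
    for name in db_names:
        try:
            owners[name] = get_owner(name)
        except DatabaseNotFoundError:
            continue          # db dropped concurrently: omit it, don't fail
    return owners
```

The alternative fix used for GetSchemas — not performing step 2 at all — avoids the race entirely when the extra data isn't needed.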
Tests: - Add e2e tests Change-Id: I2bd40d33859feca2bbd2e5f1158f3894a91c2929 Reviewed-on: http://gerrit.cloudera.org:8080/21546 Reviewed-by: Yida Wu Tested-by: Impala Public Jenkins > TestHS2.test_get_schemas is flaky in local catalog mode > --- > > Key: IMPALA-9441 > URL: https://issues.apache.org/jira/browse/IMPALA-9441 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Sahil Takiar >Assignee: Quanlong Huang >Priority: Critical > > Saw this once on a ubuntu-16.04-dockerised-tests job: > {code:java} > Error Message > hs2/hs2_test_suite.py:63: in add_session lambda: fn(self)) > hs2/hs2_test_suite.py:44: in add_session_helper fn() > hs2/hs2_test_suite.py:63: in lambda: fn(self)) > hs2/test_hs2.py:423: in test_get_schemas > TestHS2.check_response(get_schemas_resp) hs2/hs2_test_suite.py:131: in > check_response assert response.status.statusCode == expected_status_code > E assert 3 == 0 E+ where 3 = 3 E+where 3 = > TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3).statusCode E+ where > TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) = TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) E+where TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) = > TGetSchemasResp(status=TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_i...nHandle(hasResultSet=False, modifiedRowCount=None, > operationType=3, operationId=THandleIdentifier(secret='', guid=''))).status > Stacktrace > 
hs2/hs2_test_suite.py:63: in add_session > lambda: fn(self)) > hs2/hs2_test_suite.py:44: in add_session_helper > fn() > hs2/hs2_test_suite.py:63: in > lambda: fn(self)) > hs2/test_hs2.py:423: in test_get_schemas > TestHS2.check_response(get_schemas_resp) > hs2/hs2_test_suite.py:131: in check_response > assert response.status.statusCode == expected_status_code > E assert 3 == 0 > E+ where 3 = 3 > E+where 3 = TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3).statusCode > E+ where TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) = TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) > E+where TStatus(errorCode=None, > errorMessage="DatabaseNotFoundException: Database > 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', > infoMessages=None, statusCode=3) = >
[jira] [Commented] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861099#comment-17861099 ] ASF subversion and git services commented on IMPALA-13170: -- Commit 00d0b0dda1e215d8e91ff52688fe6654bee52282 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=00d0b0dda ] IMPALA-9441,IMPALA-13170: Ops listing dbs/tables should handle db not exists We have some operations listing the dbs/tables in the following steps: 1. Get the db list 2. Do something on the db, which could fail if the db no longer exists For instance, when authorization is enabled, SHOW DATABASES would need a step-2 to get the owner of each db. This is fine in the legacy catalog mode since the whole Db object is cached on the coordinator side. However, in the local catalog mode, the msDb could be missing in the local cache. The coordinator then triggers a getPartialCatalogObject RPC to load it from catalogd. If the db no longer exists in catalogd, this step will fail. The same happens in GetTables HS2 requests when listing all tables in all dbs. In step-2 we list the table names for a db. Though it exists when we get the db list, it could be dropped by the time we start listing the table names in it. This patch adds code to handle the exceptions caused by a db no longer existing. It also improves GetSchemas to not list the table names, to avoid the same issue. 
Tests: - Add e2e tests Change-Id: I2bd40d33859feca2bbd2e5f1158f3894a91c2929 Reviewed-on: http://gerrit.cloudera.org:8080/21546 Reviewed-by: Yida Wu Tested-by: Impala Public Jenkins > InconsistentMetadataFetchException due to database dropped when showing > databases > - > > Key: IMPALA-13170 > URL: https://issues.apache.org/jira/browse/IMPALA-13170 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.4.0 >Reporter: Yida Wu >Assignee: Quanlong Huang >Priority: Major > > Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when > running "show databases" in Impala while simultaneously executing "drop > database" to drop the newly created database in Hive. > Step is: > 1, Creates database (Hive) > 2, Creates tables (Hive) > 3, Drops tables (Hive) > 4, Run show databases (Impala) Drop database (Hive) > Logs in Impalad: > {code:java} > I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] > Invalidated objects in cache: [list of database names, HMS_METADATA for DB > test_hive] > I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] > org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching > DATABASE failed. 
Could not find TCatalogObject(type:DATABASE, > catalog_version:0, db:TDatabase(db_name:test_hive)) > > > > at > org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635) > at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91) > at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294) > at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066) > at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301) > I0610 02:18:32.436257 278475 status.cc:129] 1:2] > InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find > TCatalogObject(type:DATABASE, catalog_version:0, > {code} > Logs in Catalog: > {code:java} > I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 > EventType: CREATE_DATABASE Successfully added database test_hive > ... > I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 > EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on > database test_hive > I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events > received: 6 Total number of events filtered out: 0 > I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped > metric to 2564 > I0610 02:18:32.279537 222885
[jira] [Commented] (IMPALA-13120) Failed table loads are not tried to load again even though hive metastore is UP
[ https://issues.apache.org/jira/browse/IMPALA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860737#comment-17860737 ] ASF subversion and git services commented on IMPALA-13120: -- Commit ab4e62d3c3dd623a8b5ad896641db07782cbb939 in impala's branch refs/heads/master from Venu Reddy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ab4e62d3c ] IMPALA-13120: Load failed table without need for manual invalidate If the metastore is down when the table load is triggered, catalogd publishes a new version of the incomplete table with the cause set to TableLoadingException. On the coordinator/impalad, StmtMetadataLoader's loadTables(), which has been waiting for the table load to complete, considers the table loaded. Then, during the analyzer's table resolve step, for the incomplete table, a TableLoadingException (i.e., could not connect to metastore, failed to load metadata for table, and running invalidate metadata for the table may resolve this problem) is thrown. From then on, no query on the table triggers the load, since the table is incomplete with a TableLoadingException cause. Even when the metastore is UP again later, queries continue to throw the same exception. Manually invalidating all such tables is both misleading and unnecessary. Note: an incomplete table with a cause is considered loaded. This patch retries loading a table that previously failed due to a metastore connection error (i.e., a recoverable error) when a query involving the table is fired. The idea is to keep track of the table object present in the db that requires a load. On a successful/failed load, the table object in the db is updated. Therefore the tracked table object reference can be compared to the table object in the db to detect the completion of the load. 
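The retry logic can be modeled as: when the cached entry is an incomplete table whose failure cause is recoverable (a metastore connection error), trigger a fresh load instead of rethrowing the stale failure. A sketch with hypothetical names, not the actual catalogd code:

```python
class IncompleteTable:
    """Cached placeholder for a table whose metadata load failed."""
    def __init__(self, name, cause, recoverable):
        self.name, self.cause, self.recoverable = name, cause, recoverable

def resolve_table(catalog, name, load):
    """Retry the load for recoverable failures (the metastore may be back
    up); unrecoverable failures keep surfacing until an explicit
    INVALIDATE METADATA, as before this patch."""
    entry = catalog.get(name)
    if isinstance(entry, IncompleteTable) and entry.recoverable:
        entry = catalog[name] = load(name)
    return entry
```

The key behavioral change is in the condition: before the patch, any IncompleteTable with a non-null cause was treated as terminally loaded, so the `load(name)` branch was never reached.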
Testing: - Added end-to-end tests Change-Id: Ia882fdd865ef716351be7f1eaf203a9fb04c1c15 Reviewed-on: http://gerrit.cloudera.org:8080/21478 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Failed table loads are not tried to load again even though hive metastore is > UP > --- > > Key: IMPALA-13120 > URL: https://issues.apache.org/jira/browse/IMPALA-13120 > Project: IMPALA > Issue Type: Improvement >Reporter: Venugopal Reddy K >Priority: Major > > *Description:* > If the metastore is down at the time when the table load is triggered, > catalogd creates a new IncompleteTable instance with > cause=TableLoadingException and updates catalog with a new version. And on > coordinator/impalad, StmtMetadataLoader loadTables() that has been waiting > for table load to complete, considers table as loaded/failed load. Then > during the analyzer’s table resolve step, if the table is incomplete, > TableLoadingException is thrown to user. > Note: IncompleteTable with cause not being null is considered as loaded. > *Henceforth, queries on the table doesn’t trigger the table load(at > StmtMetadataLoader) since the table is IncompleteTable with non-null > cause(i.e.,TableLoadingException). Even though metastore is UP later at some > time, queries continue to fail with same TableLoadingException:* > {{CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.t1. Running 'invalidate metadata default.t1' may resolve this > problem.}} > {{CAUSED BY: MetaException: Could not connect to meta store using any of the > URIs provided. 
Most recent failure: > org.apache.thrift.transport.TTransportException: java.net.ConnectException: > Connection refused (Connection refused)}} > *At present, explicit Invalidate metadata is the only way to recover table > from this state.* *Queries executed after metastore is up should succeed > without the need for explicit invalidate metadata.* > *Steps to Reproduce:* > # create a table from hive and insert some data into it. > # Bring down the hive metastore process > # Run a query on impala that triggers the table load. Query fails with > TableLoadingException. > # Bring up the hive metastore process > # Run the query on impala again. It still fails with same > TableLoadingException. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12093) impala-shell should preserve all cookies by default
[ https://issues.apache.org/jira/browse/IMPALA-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860659#comment-17860659 ] ASF subversion and git services commented on IMPALA-12093: -- Commit 100693d5adce3a5db38bb171cae4e9c0dec5e20e in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=100693d5a ] IMPALA-12093: impala-shell to preserve all cookies Updates impala-shell to preserve all cookies by default, defined as setting 'http_cookie_names=*'. Prior behavior of restricting cookies to a user-specified list is preserved when 'http_cookie_names' is given any value besides '*'. Setting 'http_cookie_names=' prevents any cookies from being preserved. Adds verbose output that prints all cookies that are preserved by the HTTP client. Existing cookie tests with LDAP still work. Adds a test where Impala returns an extra cookie, and test verifies that verbose mode prints all expected cookies. Change-Id: Ic81f790288460b086ab218e6701e8115a996dfa7 Reviewed-on: http://gerrit.cloudera.org:8080/19827 Reviewed-by: Impala Public Jenkins Tested-by: Michael Smith > impala-shell should preserve all cookies by default > --- > > Key: IMPALA-12093 > URL: https://issues.apache.org/jira/browse/IMPALA-12093 > Project: IMPALA > Issue Type: Improvement > Components: Clients >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.5.0 > > > Currently, impala-shell's http_cookie_names parameter specifies which cookies > should be preserved and sent back with subsequent requests. This defaults to > a couple well-known cookie names that Impala uses. > In general, we don't know what proxies are between impala-shell and Impala, > and we don't know what cookie name they rely on being preserved. As an > example, Apache Knox can rely on a cookie it sets to route requests to the > appropriate Impala coordinator. 
Limiting our cookie preservation to a small > allow list makes this much more brittle and hard to use. Clients need to know > the right list of cookies to put in http_cookie_names, and that is not > obvious. > It seems like the default behavior should be to preserve all cookies. We can > keep a way to disallow or limit the cookies for unusual cases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
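As a quick illustration of the selection rule described above, here is a minimal Python sketch (not impala-shell's actual code; the function name and cookie names are invented) of the three cases of 'http_cookie_names' — '*' keeps everything, the empty string keeps nothing, anything else is an allow list:

```python
def preserved_cookies(response_cookies, http_cookie_names):
    # '*'  -> preserve every cookie (the new default)
    # ''   -> preserve none
    # else -> comma-separated allow list, the prior behavior
    if http_cookie_names == '*':
        return dict(response_cookies)
    allowed = {n.strip() for n in http_cookie_names.split(',') if n.strip()}
    return {k: v for k, v in response_cookies.items() if k in allowed}
```

With an unknown proxy cookie such as one set by Apache Knox, only the '*' mode retains it without the client knowing its name in advance.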
[jira] [Commented] (IMPALA-13028) libkudu_client.so is not stripped in the DEB/RPM packages
[ https://issues.apache.org/jira/browse/IMPALA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860445#comment-17860445 ] ASF subversion and git services commented on IMPALA-13028: -- Commit aea057f095fecb331bc0c58687c3f0ac4f6affa8 in impala's branch refs/heads/master from Xiang Yang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=aea057f09 ] IMPALA-13028: Strip dynamic link libraries in Linux DEB/RPM packages This optimization can reduce the DEB package size from 611MB to 554MB, and reduce the kudu client library size from 188MB to 10.5MB at the same time. Testing: - Manually make a DEB package and check whether the dynamic link libraries are stripped. Change-Id: Ie7bee0b4ef904db3706a350f17bcd68d769aa5ad Reviewed-on: http://gerrit.cloudera.org:8080/21542 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > libkudu_client.so is not stripped in the DEB/RPM packages > - > > Key: IMPALA-13028 > URL: https://issues.apache.org/jira/browse/IMPALA-13028 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: XiangYang >Priority: Major > > The current DEB package is 611M on ubuntu18.04. Here are the top-10 largest > files: > {noformat} > 14 MB > ./opt/impala/lib/jars/hive-standalone-metastore-3.1.3000.7.2.18.0-369.jar > 15 MB ./opt/impala/lib/jars/kudu-client-e742f86f6d.jar > 20 MB ./opt/impala/lib/native/libstdc++.so.6.0.28 > 22 MB ./opt/impala/lib/jars/js-22.3.0.jar > 29 MB ./opt/impala/lib/jars/iceberg-hive-runtime-1.3.1.7.2.18.0-369.jar > 60 MB ./opt/impala/lib/jars/ozone-filesystem-hadoop3-1.3.0.7.2.18.0-369.jar > 84 MB ./opt/impala/util/impala-profile-tool > 85 MB ./opt/impala/sbin/impalad > 175 MB ./opt/impala/lib/jars/impala-minimal-s3a-aws-sdk-4.4.0-SNAPSHOT.jar > 188 MB ./opt/impala/lib/native/libkudu_client.so.0.1.0{noformat} > It appears that we just strip binaries built by Impala, e.g. impalad and > impala-profile-tool. 
> libkudu_client.so.0.1.0 remains the same as the one in the toolchain folder. > {code:bash} > $ ll -th > toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 > -rw-r--r-- 1 quanlong quanlong 189M 10月 18 2023 > toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 > $ file > toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0 > toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0: > ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, > with debug_info, not stripped{code} > CC [~yx91490] [~boroknagyz] [~rizaon]
[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN
[ https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860283#comment-17860283 ] ASF subversion and git services commented on IMPALA-8042: - Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ] IMPALA-6311: Lower max_filter_error_rate to 10% Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV estimates more accurate in some cases. However, the more accurate (smaller) NDV estimates after these changes have exacerbated the problem with the 75% default FPP, which causes more cases of badly undersized filters. This patch lowers the default value of the max_filter_error_rate flag from 75% to 10%. A lower target FPP will result in doubling the runtime filter size most of the time when the previous FPP is greater than 10%. Testing: - Pass exhaustive tests. - Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of 10% improves q94 by 2x and q95 by 5x, improves total query time and geomean time by a few percent, and doesn't cause a significant (> 10%) regression in any individual query. Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04 Reviewed-on: http://gerrit.cloudera.org:8080/21552 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Better selectivity estimate for BETWEEN > --- > > Key: IMPALA-8042 > URL: https://issues.apache.org/jira/browse/IMPALA-8042 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 3.1.0 >Reporter: Paul Rogers >Assignee: Riza Suminto >Priority: Minor > Fix For: Impala 4.5.0 > > > The analyzer rewrites a BETWEEN expression into a pair of inequalities. > IMPALA-8037 explains that the planner then groups all such non-equality > conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains > that the analyzer should handle inequalities better. 
> BETWEEN is a special case and informs the final result. If we assume a > selectivity of s for inequality, then BETWEEN should be something like s/2. > The intuition is that if c >= x includes, say, ⅓ of values, and c <= y > includes a third of values, then c BETWEEN x AND y should be a narrower set > of values, say ⅙. > [Ramakrishnan and > Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] > recommend 0.4 for between, 0.3 for inequality, and 0.3^2 = 0.09 for the > general expression x <= c AND c <= y. Note the discrepancy between the > compound inequality case and the BETWEEN case, likely reflecting the > additional information we obtain when the user chooses to use BETWEEN. > To implement a special BETWEEN selectivity in Impala, we must remember the > selectivity of BETWEEN during the rewrite to a compound inequality.
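The textbook numbers quoted above are easy to sanity-check. A small Python sketch (illustrative only; the constants are the Ramakrishnan & Gehrke defaults, not Impala's planner values):

```python
# Textbook default selectivities; illustrative only.
INEQUALITY_SEL = 0.3   # c <= x
BETWEEN_SEL = 0.4      # c BETWEEN x AND y

def compound_inequality_sel(ineq_sel=INEQUALITY_SEL):
    # After the rewrite BETWEEN -> (c >= x AND c <= y), treating the two
    # inequalities as independent multiplies their selectivities: 0.09.
    return ineq_sel * ineq_sel

def between_sel_intuition(ineq_sel):
    # The s/2 intuition from the issue: if each inequality keeps about a
    # third of the rows, BETWEEN should keep a narrower set, about a sixth.
    return ineq_sel / 2
```

The gap between 0.4 (BETWEEN) and 0.09 (compound inequality) is exactly the discrepancy the issue notes: the rewrite loses the information that the user asked for a single contiguous range.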
[jira] [Commented] (IMPALA-13128) disk-file-test hangs on ARM + UBSAN test jobs
[ https://issues.apache.org/jira/browse/IMPALA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860284#comment-17860284 ] ASF subversion and git services commented on IMPALA-13128: -- Commit 8d05f5134cc95f53e4e4bbd8ceb9de88b845fda1 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d05f5134 ] IMPALA-13128: disk-file-test Hangs on ARM + UBSAN Test Jobs The Jenkins jobs that run the UBSAN tests on ARM were occasionally hanging on the disk-file-test. This commit fixes these hangs by upgrading Google Test and implementing the Death Test handling functionality which safely runs tests that expect the process to die. See https://github.com/google/googletest/blob/main/docs/advanced.md#death-tests for details on known problems with running death tests and threads at the same time causing tests to hang. Testing was accomplished by running the disk-file-test repeatedly in a loop on a RHEL 8.9 ARM machine. Before this fix was implemented, this test would run up to 70 times before it hung. After the fix was implemented, the test ran 2,490 times and was still running when it was stopped. These test runs had durations between 18.7 and 19.9 seconds, which means disk-file-test now takes about 15 seconds longer than its previous duration of about 4.4 seconds. Change-Id: Ie01f7781f24644a66e9ec52652450116f5cb4297 Reviewed-on: http://gerrit.cloudera.org:8080/21544 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > disk-file-test hangs on ARM + UBSAN test jobs > - > > Key: IMPALA-13128 > URL: https://issues.apache.org/jira/browse/IMPALA-13128 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > The UBSAN ARM job (running on Redhat 8) has been hanging then timing out with > this being the last output: > {noformat} > 23:06:47 63/147 Test #63: disk-io-mgr-test . 
Passed > 43.42 sec > 23:07:30 Start 64: disk-file-test > 23:07:30 > 18:47:00 > 18:47:00 run-all-tests.sh TIMED OUT! {noformat} > This has happened multiple times, but it looks limited to ARM + UBSAN. The > jobs take stack traces, but only of the running impalads / HMS.
[jira] [Commented] (IMPALA-11924) Bloom filter size is unaffected by column NDV
[ https://issues.apache.org/jira/browse/IMPALA-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860282#comment-17860282 ] ASF subversion and git services commented on IMPALA-11924: -- Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ] IMPALA-6311: Lower max_filter_error_rate to 10% Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV estimates more accurate in some cases. However, the more accurate (smaller) NDV estimates after these changes have exacerbated the problem with the 75% default FPP, which causes more cases of badly undersized filters. This patch lowers the default value of the max_filter_error_rate flag from 75% to 10%. A lower target FPP will result in doubling the runtime filter size most of the time when the previous FPP is greater than 10%. Testing: - Pass exhaustive tests. - Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of 10% improves q94 by 2x and q95 by 5x, improves total query time and geomean time by a few percent, and doesn't cause a significant (> 10%) regression in any individual query. 
Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04 Reviewed-on: http://gerrit.cloudera.org:8080/21552 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Bloom filter size is unaffected by column NDV > - > > Key: IMPALA-11924 > URL: https://issues.apache.org/jira/browse/IMPALA-11924 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > Labels: bloom-filter, runtime-filters > Fix For: Impala 4.3.0 > > > For bloom filter sizing Impala simply uses the cardinality of the build > side while it could be clearly capped by NDV: > https://github.com/apache/impala/blob/feb4a76ed4cb5b688143eb21370f78ec93133c56/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#L661 > https://github.com/apache/impala/blob/feb4a76ed4cb5b688143eb21370f78ec93133c56/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java#L698 > E.g.: > {code} > use tpch_parquet; > set RUNTIME_FILTER_MIN_SIZE=8192 > RUNTIME_FILTER_MIN_SIZE > explain select count(*) from orders join customer on o_comment = c_mktsegment > PLAN-ROOT SINK > | > 06:AGGREGATE [FINALIZE] > | output: count:merge(*) > | row-size=8B cardinality=1 > | > 05:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE > | output: count(*) > | row-size=8B cardinality=1 > | > 02:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: o_comment = c_mktsegment > | runtime filters: RF000 <- c_mktsegment > | row-size=82B cardinality=162.03K > | > |--04:EXCHANGE [BROADCAST] > | | > | 01:SCAN HDFS [tpch_parquet.customer] > | HDFS partitions=1/1 files=1 size=12.34MB > | row-size=21B cardinality=150.00K > | > 00:SCAN HDFS [tpch_parquet.orders] >HDFS partitions=1/1 files=2 size=54.21MB >runtime filters: RF000 -> o_comment >row-size=61B cardinality=1.50M > {code} > The query above sets RF000's size to 65536, while the minimum 8192 would be > more than enough, as the ndv of c_mktsegment is 5 > The current logic should work well for FK/PK 
joins where the build side's > cardinality is close to the PK's ndv, but can massively overestimate for large > tables with small-ndv keys.
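For intuition, the proposed NDV cap can be plugged into the standard Bloom-filter sizing formula. A Python sketch (assumed sizing logic with a minimum size and power-of-two rounding; not Impala's exact RuntimeFilterGenerator code):

```python
import math

def bloom_filter_bytes(build_cardinality, ndv, fpp, min_bytes=8192):
    # Cap the insertion count by the column NDV, as the issue proposes.
    n = max(1, min(build_cardinality, ndv))
    # Standard Bloom-filter sizing: m = -n * ln(p) / ln(2)^2 bits for n
    # distinct insertions at false-positive probability p.
    bits = -n * math.log(fpp) / (math.log(2) ** 2)
    nbytes = max(min_bytes, int(bits / 8) + 1)
    # Round up to a power of two, as runtime filter sizes are.
    return 2 ** math.ceil(math.log2(nbytes))
```

With the c_mktsegment example (build cardinality 150K but ndv 5), the capped size collapses to the 8192-byte minimum, while the uncapped size is several times larger.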
[jira] [Commented] (IMPALA-6311) Evaluate smaller FPP for Bloom filters
[ https://issues.apache.org/jira/browse/IMPALA-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860281#comment-17860281 ] ASF subversion and git services commented on IMPALA-6311: - Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ] IMPALA-6311: Lower max_filter_error_rate to 10% Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV estimates more accurate in some cases. However, the more accurate (smaller) NDV estimates after these changes have exacerbated the problem with the 75% default FPP, which causes more cases of badly undersized filters. This patch lowers the default value of the max_filter_error_rate flag from 75% to 10%. A lower target FPP will result in doubling the runtime filter size most of the time when the previous FPP is greater than 10%. Testing: - Pass exhaustive tests. - Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of 10% improves q94 by 2x and q95 by 5x, improves total query time and geomean time by a few percent, and doesn't cause a significant (> 10%) regression in any individual query. Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04 Reviewed-on: http://gerrit.cloudera.org:8080/21552 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Evaluate smaller FPP for Bloom filters > -- > > Key: IMPALA-6311 > URL: https://issues.apache.org/jira/browse/IMPALA-6311 > Project: IMPALA > Issue Type: Task > Components: Perf Investigation >Reporter: Jim Apple >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.5.0 > > > The Bloom filters are created by estimating the NDV and then using the FPP of > 75% to get the right size for the filter. This may be too high to be very > useful - if our filters are currently filtering more than 75% out, then it is > only because we are overestimating NDV. 
[jira] [Commented] (IMPALA-13168) Add README file for setting up Trino
[ https://issues.apache.org/jira/browse/IMPALA-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859989#comment-17859989 ] ASF subversion and git services commented on IMPALA-13168: -- Commit a6f285cdd5c9e94d720cbbb3d517482768ec00bb in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a6f285cdd ] IMPALA-13168: Add README file for setting up Trino The Impala repository contains scripts that make it easy to set up Trino in the development environment. This commit adds the TRINO-README.md file that describes how they can be used. Change-Id: Ic9fea891074223475a57c8f49f788924a0929b12 Reviewed-on: http://gerrit.cloudera.org:8080/21538 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add README file for setting up Trino > > > Key: IMPALA-13168 > URL: https://issues.apache.org/jira/browse/IMPALA-13168 > Project: IMPALA > Issue Type: Improvement >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > The Impala repository contains scripts that make it easy to set up Trino in > the development environment. We should add a README file that describes how > they can be used.
[jira] [Commented] (IMPALA-12754) Update Impala document to cover external jdbc table
[ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856579#comment-17856579 ] ASF subversion and git services commented on IMPALA-12754: -- Commit 6632fd00e17867c9f8f40d6905feafa049368a98 in impala's branch refs/heads/master from jankiram84 [ https://gitbox.apache.org/repos/asf?p=impala.git;h=6632fd00e ] IMPALA-12754: [DOCS] External JDBC table support Created the docs for Impala external JDBC table support Change-Id: I5360389037ae9ee675ab406d87617d55d476bf8f Reviewed-on: http://gerrit.cloudera.org:8080/21539 Tested-by: Impala Public Jenkins Reviewed-by: gaurav singh Reviewed-by: Wenzhe Zhou > Update Impala document to cover external jdbc table > --- > > Key: IMPALA-12754 > URL: https://issues.apache.org/jira/browse/IMPALA-12754 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Wenzhe Zhou >Assignee: Jankiram Balakrishnan >Priority: Major > > We need to document the SQL syntax to create external JDBC table and alter > external JDBC table, including the table properties to be set for JDBC and > DBCP (Database Connection Pool). >
[jira] [Commented] (IMPALA-13136) Refactor AnalyzedFunctionCallExpr
[ https://issues.apache.org/jira/browse/IMPALA-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856578#comment-17856578 ] ASF subversion and git services commented on IMPALA-13136: -- Commit 4c00cbff7ee82f9a100746a97b07ce22b3fed5ae in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4c00cbff7 ] IMPALA-13136: Refactor AnalyzedFunctionCallExpr (for Calcite) The analyze method is now called after the Expr is constructed. This code is more in line with the existing way that Impala constructs the Expr object. Change-Id: Ideb662d9c7536659cb558bf62baec29c82217aa2 Reviewed-on: http://gerrit.cloudera.org:8080/21525 Tested-by: Impala Public Jenkins Reviewed-by: Joe McDonnell > Refactor AnalyzedFunctionCallExpr > - > > Key: IMPALA-13136 > URL: https://issues.apache.org/jira/browse/IMPALA-13136 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Steve Carlin >Priority: Major > > Copied from code review: > The part where we immediately analyze as part of the constructor makes for > complicated exception handling. RexVisitor doesn't support exceptions, so it > adds complication to handle them under those circumstances. I can't really > explain why it is necessary. > Let me sketch out an alternative: > 1. Construct the whole Expr tree without analyzing it > 2. Any errors that happen during this process are not usually actionable by > the end user. It's good to have a descriptive error message, but it doesn't > mean there is something wrong with the SQL. I think that it is ok for this > code to throw subclasses of RuntimeException or use > Preconditions.checkState() with a good explanation. > 3. When we get the Expr tree back in CreateExprVisitor::getExpr(), we call > analyze() on the root node, which does a recursive analysis of the whole tree. > 4. 
The special Expr classes don't run analyze() in the constructor, don't > keep a reference to the Analyzer, and don't override resetAnalysisState(). > They override analyzeImpl() and they should be idempotent. The clone > constructor should not need to do anything special, just do a deep copy. > I don't want to bog down this review. If we want to address this as a > followup, I can live with that, but I don't want us to go too far down this > road. (Or if we have a good explanation for why it is necessary, then we can > write a good comment and move on.)
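The construct-then-analyze flow sketched in the review's steps 1-4 can be illustrated in a few lines. A toy Python model (hypothetical class; Impala's real Expr API is Java and far richer):

```python
class Expr:
    """Toy model of the proposed flow: constructors never analyze."""
    def __init__(self, *children):
        self.children = list(children)
        self.analyzed = False

    def analyze(self, analyzer):
        # Step 3: one call on the root analyzes the whole tree recursively.
        for child in self.children:
            child.analyze(analyzer)
        self.analyze_impl(analyzer)   # stand-in for analyzeImpl()
        self.analyzed = True

    def analyze_impl(self, analyzer):
        pass  # subclasses override; should be idempotent (step 4)

# Step 1: build the whole tree without analyzing, then analyze once
# from the root (as CreateExprVisitor::getExpr() would).
root = Expr(Expr(), Expr(Expr()))
root.analyze(analyzer=None)
```

Keeping analysis out of the constructors means tree construction cannot throw analysis errors, which is what makes the pattern workable under Calcite's exception-free RexVisitor.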
[jira] [Commented] (IMPALA-13169) Specify cluster id before starting HiveServer2 after HIVE-28324
[ https://issues.apache.org/jira/browse/IMPALA-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856441#comment-17856441 ] ASF subversion and git services commented on IMPALA-13169: -- Commit 1ecc43f8c2171475950c37682973b8cd660bfd0c in impala's branch refs/heads/master from Fang-Yu Rao [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1ecc43f8c ] IMPALA-13169: Specify cluster id before starting HiveServer2 After HIVE-28324, in order to start HiveServer2, the cluster id must be passed to HiveServer2, either via the environment variable 'HIVE_CLUSTER_ID', or the command line Java property 'hive.cluster.id'. This patch exports HIVE_CLUSTER_ID before starting HiveServer2. Testing: - Manually verified that a HiveServer2 including HIVE-28324 could be started after this patch. - Verified that this patch passed the core tests. Change-Id: I9d07ec01a04f8123b7ccca676ce744ac485f167c Reviewed-on: http://gerrit.cloudera.org:8080/21540 Tested-by: Impala Public Jenkins Reviewed-by: Quanlong Huang > Specify cluster id before starting HiveServer2 after HIVE-28324 > --- > > Key: IMPALA-13169 > URL: https://issues.apache.org/jira/browse/IMPALA-13169 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > After HIVE-28324, in order to start HiveServer2, the > cluster id must be passed to HiveServer2, either via the environment > variable, or the command line Java property. We should provide HiveServer2 > with the cluster id before we bump up CDP_BUILD_NUMBER to have a CDP Hive > dependency that includes this Hive change.
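The export step amounts to building the child-process environment before launching HiveServer2. Illustrative Python only (the function name is invented; the actual change lives in the shell scripts that start HiveServer2):

```python
import os

def hiveserver2_env(cluster_id, base_env=None):
    # Build the environment for the HiveServer2 child process with
    # HIVE_CLUSTER_ID exported, mirroring the start script's export.
    env = dict(os.environ if base_env is None else base_env)
    env['HIVE_CLUSTER_ID'] = cluster_id
    return env
```

The alternative mentioned in the issue is passing `-Dhive.cluster.id=...` on the Java command line instead of setting the environment variable.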
[jira] [Commented] (IMPALA-12940) Implement filtering conditions
[ https://issues.apache.org/jira/browse/IMPALA-12940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856334#comment-17856334 ] ASF subversion and git services commented on IMPALA-12940: -- Commit a6db27850af5c8dc01be19c2c396ec03211fa402 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a6db27850 ] IMPALA-12940: Added filtering capability for Calcite planner The Filter RelNode is now handled in the Calcite planner. The parsing and analysis is done by Calcite so there were no changes added to that portion. The ImpalaFilterRel class was created to handle the conversion of the Calcite LogicalFilter to create a filter condition within the Impala plan nodes. There is no explicit filter plan node in Impala. Instead, the filter condition attaches itself to an existing plan node. The filter condition gets passed into the children plan nodes through the ParentPlanRelContext. The ExprConjunctsConverter class is responsible for creating the filter Expr list that is used. The list contains separate AND conditions that are on the top level. Change-Id: If104bf1cd801d5ee92dd7e43d398a21a18be5d97 Reviewed-on: http://gerrit.cloudera.org:8080/21498 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer > Implement filtering conditions > -- > > Key: IMPALA-12940 > URL: https://issues.apache.org/jira/browse/IMPALA-12940 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major >
[jira] [Commented] (IMPALA-13159) Running queries been cancelled after statestore failover
[ https://issues.apache.org/jira/browse/IMPALA-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855783#comment-17855783 ] ASF subversion and git services commented on IMPALA-13159: -- Commit 9c2c27c68ce27b6a6d227379581ac39a34f8f348 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=9c2c27c68 ] IMPALA-13159: Fix query cancellation caused by statestore failover A momentary inconsistent cluster membership state after statestore failover results in query cancellation. We already have code to handle inconsistent cluster membership after statestore restarting by defining a post-recovery grace period. During the grace period, don't update the current cluster membership so that the inconsistent membership will not be used to cancel queries on coordinators and executors. This patch handles inconsistent cluster membership state after statestore failover in the same way. Testing: - Added a new test case to verify that inconsistent cluster membership after statestore failover will not result in query cancellation. - Fixed closing client issue for Catalogd HA test case test_catalogd_failover_with_sync_ddl when the test fails. - Passed core test. Change-Id: I720bec5199df46475b954558abb0637ca7e6298b Reviewed-on: http://gerrit.cloudera.org:8080/21520 Reviewed-by: Michael Smith Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Running queries been cancelled after statestore failover > > > Key: IMPALA-13159 > URL: https://issues.apache.org/jira/browse/IMPALA-13159 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > A momentary inconsistent cluster membership state after statestore failover > results in query cancellation. > We already have code to handle inconsistent cluster membership after > statestore restarting. 
We need to handle inconsistent cluster membership > after statestore failover in the same way.
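The grace-period idea described above reduces to a single predicate. Illustrative Python only (names invented; the real logic lives in Impala's cluster-membership handling):

```python
def apply_membership_update(now_s, recovery_start_s, grace_period_s):
    # During the post-recovery grace period after a statestore restart or
    # failover, hold the current membership steady so that a momentarily
    # inconsistent view is never used to cancel running queries.
    in_grace_period = (now_s - recovery_start_s) < grace_period_s
    return not in_grace_period
```

Updates arriving inside the window are simply skipped; once the window has elapsed, membership changes apply normally again.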
[jira] [Commented] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()
[ https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855719#comment-17855719 ] ASF subversion and git services commented on IMPALA-13150: -- Commit 5d7ca0712af493eca6704a3fdfcfaf16bde46ed0 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d7ca0712 ] IMPALA-13150: Possible buffer overflow in StringVal::CopyFrom() In StringVal::CopyFrom(), we take the 'len' parameter as a size_t, which is usually a 64-bit unsigned integer. We pass it to the constructor of StringVal, which takes it as an int, which is usually a 32-bit signed integer. The constructor then allocates memory for the length using the int value, but afterwards in CopyFrom(), we copy the buffer with the size_t length. If size_t is indeed 64 bits and int is 32 bits, and the value is truncated, we may copy more bytes than we have allocated for the destination. Note that in the constructor of StringVal it is checked whether the length is greater than 1GB, but if the value is truncated because of the type conversion, the check doesn't necessarily catch it as the truncated value may be small. This change fixes the problem by doing the length check with 64 bit integers in StringVal::CopyFrom(). Testing: - added unit tests for StringVal::CopyFrom() in udf-test.cc. Change-Id: I6a1d03d65ec4339a0f33e69ff29abdd8cc3e3067 Reviewed-on: http://gerrit.cloudera.org:8080/21501 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Possible buffer overflow in StringVal::CopyFrom() > - > > Key: IMPALA-13150 > URL: https://issues.apache.org/jira/browse/IMPALA-13150 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > > In {{{}StringVal::CopyFrom(){}}}, we take the 'len' parameter as a > {{{}size_t{}}}, which is usually a 64-bit unsigned integer. 
We pass it to the > constructor of {{{}StringVal{}}}, which takes it as an {{{}int{}}}, which is > usually a 32-bit signed integer. The constructor then allocates memory for > the length using the {{int}} value, but back in {{{}CopyFrom(){}}}, we copy > the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and > {{int}} is 32 bits, and the value is truncated, we may copy more bytes than > what we have allocated for the destination. See > https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
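The truncation hazard can be demonstrated without C++. A Python sketch that mimics the implicit size_t-to-int narrowing (illustrative only; the actual fix keeps the length check in 64 bits inside StringVal::CopyFrom()):

```python
def truncate_to_int32(length):
    # Mimic passing a 64-bit size_t argument into a C++ 'int' parameter:
    # keep the low 32 bits and reinterpret them as a signed value.
    low = length & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low

ONE_GB = 1 << 30
requested = (1 << 32) + 16       # caller asks to copy 4 GiB + 16 bytes
as_int = truncate_to_int32(requested)
# The truncated value is tiny, so a "len > 1GB" guard in the constructor
# cannot catch the oversized request -- only 16 bytes get allocated,
# while the copy would still use the original 64-bit length.
guard_triggers = as_int > ONE_GB
```

Here the constructor would see a 16-byte request and allocate accordingly, yet a memcpy driven by the untruncated size_t would write far past the buffer.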
[jira] [Commented] (IMPALA-11648) validate-java-pom-versions.sh should skip pom.xml in toolchain
[ https://issues.apache.org/jira/browse/IMPALA-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855453#comment-17855453 ] ASF subversion and git services commented on IMPALA-11648: -- Commit dd62dd98b90f114cc0b1fbbce966a7194f30971a in impala's branch refs/heads/branch-3.4.2 from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=dd62dd98b ] IMPALA-11648: validate-java-pom-versions.sh should skip pom.xml in toolchain bin/validate-java-pom-versions.sh validates the pom.xml files have consistent version strings. However, it checks all files in IMPALA_HOME when building from the tarball. There are some pom.xml files in the toolchain directory that should be skipped. This patch modifies the find command used in the script from find ${IMPALA_HOME} -name pom.xml to find ${IMPALA_HOME} -path ${IMPALA_TOOLCHAIN} -prune -o -name pom.xml -print to list pom.xml files excluding the toolchain directory. More examples about how to use `find -prune` can be found in this blog: https://www.theunixschool.com/2012/07/find-command-15-examples-to-exclude.html Tests: - Built from the tarball locally - Modified version strings in some pom.xml files and verified validate-java-pom-versions.sh is still able to find them. 
Change-Id: I55bbd9c85ab0e4a7c054ee2abd70eae0f55c8a01 Reviewed-on: http://gerrit.cloudera.org:8080/19122 Reviewed-by: Daniel Becker Tested-by: Impala Public Jenkins > validate-java-pom-versions.sh should skip pom.xml in toolchain > -- > > Key: IMPALA-11648 > URL: https://issues.apache.org/jira/browse/IMPALA-11648 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.2.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Fix For: Impala 4.2.0, Impala 4.1.1 > > > Building the RC1 tarball of 4.1.1 release failed by > bin/validate-java-pom-versions.sh: > {noformat} > Check for Java pom.xml versions FAILED > Expected 4.1.1-RELEASE > Not found in: > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/accumulo-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/beeline/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/classification/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/cli/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/common/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/contrib/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/druid-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hbase-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/core/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/hcatalog-pig-adapter/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/pom.xml > > 
/root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/server-extensions/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/streaming/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/java-client/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/svr/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hplsql/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/impala/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/catalogd-unit/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-serde/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf1/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf2/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-util/pom.xml > >
[jira] [Commented] (IMPALA-10436) Investigate the need for granting ALL privilege on server when creating an external Kudu table
[ https://issues.apache.org/jira/browse/IMPALA-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855450#comment-17855450 ] ASF subversion and git services commented on IMPALA-10436: -- Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch refs/heads/master from Fang-Yu Rao [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ] IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger The goals and non-goals of this patch could be summarized as follows. Goals: - Add changes to the minicluster configuration that allow a non-default version of Ranger (possibly built locally) to run in the context of the minicluster, and to be used as the authorization server by Impala. - Switch to the new constructor when instantiating RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes Impala compatible with Apache Ranger if RangerAccessRequestImpl from Apache Ranger is consumed. - Prepare Ranger and Impala patches as supplemental material to verify what authorization-related tests could be passed if Apache Ranger is the authorization provider. Merging IMPALA-12921_addendum.diff to the Impala repository is not in the scope of this patch in that the diff file changes the behavior of Impala and thus more discussion is required if we'd like to merge it in the future. Non-goals: - Set up any automation for building Ranger from source. - Pass all Impala authorization-related tests with a non-default version of Ranger. Instructions on running Impala with locally built Ranger: Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We could execute the following to build Apache Ranger for easy reference. By default, the compressed tarball is produced under $RANGER_SRC_DIR/target. 
mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \ package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true After building Ranger, we need to build Impala's Java code so that Impala's Java code could consume the locally produced Ranger classes. We will need to export the following environment variables before building Impala. This prevents bootstrap_toolchain.py from trying to download the compressed Ranger tarball. 1. export RANGER_VERSION_OVERRIDE=\ $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \ -Dexpression=project.version -DforceStdout) 2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\ ranger-${RANGER_VERSION_OVERRIDE}-admin It then suffices to execute the following to point Impala to the locally built Ranger server before starting Impala. 1. source $IMPALA_HOME/bin/impala-config.sh 2. tar zxv -f $RANGER_SRC_DIR/target/\ ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \ -C $RANGER_SRC_DIR/target/ 3. $IMPALA_HOME/bin/create-test-configuration.sh 4. $IMPALA_HOME/bin/create-test-configuration.sh \ -create_ranger_policy_db 5. $IMPALA_HOME/testdata/bin/run-ranger.sh (run-all.sh has to be executed instead if other underlying services have not been started) 6. $IMPALA_HOME/testdata/bin/setup-ranger.sh Testing: - Manually verified that we could point Impala to a locally built Apache Ranger on the master branch (with tip being https://github.com/apache/ranger/commit/4abb993). - Manually verified that with RANGER-4771.diff and IMPALA-12921_addendum.diff, only 3 authorization-related tests failed. They failed because the resource type of 'storage-type' is not supported in Apache Ranger yet and thus the test cases added in IMPALA-10436 could fail. - Manually verified that the log files of Apache and CDP Ranger's Admin server could be created under ${RANGER_LOG_DIR} after we start the Ranger service. - Verified that this patch passed the core tests when CDP Ranger is used. 
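A hypothetical sketch of the override mechanism the two exported variables drive: when RANGER_HOME_OVERRIDE is set, the locally built tree is used and the toolchain tarball download is skipped. The function name and return shape are illustrative, not bootstrap_toolchain.py's actual API:

```python
import os

def resolve_ranger_home(toolchain_version):
    """Pick the Ranger install dir: a locally built tree when the override
    variables are exported, else the versioned toolchain download.
    Returns (ranger_home, needs_download)."""
    version = os.environ.get("RANGER_VERSION_OVERRIDE", toolchain_version)
    home = os.environ.get("RANGER_HOME_OVERRIDE")
    if home:
        return home, False  # local build: nothing to download
    return "ranger-%s-admin" % version, True
```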
Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 Reviewed-on: http://gerrit.cloudera.org:8080/21160 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Investigate the need for granting ALL privilege on server when creating an > external Kudu table > -- > > Key: IMPALA-10436 > URL: https://issues.apache.org/jira/browse/IMPALA-10436 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Fix For: Impala 4.2.0 > > > We found that to allow a user {{usr}} to create an external Kudu table in > Impala, we need to grant the user the {{ALL}} privilege on the server in > advance like the following, which seems too strict. It would be good to > figure out whether such a requirement is indeed necessary. > {code:sql} > GRANT ALL ON SERVER TO USER usr; > {code} -- This message was sent by Atlassian
[jira] [Commented] (IMPALA-12921) Consider adding support for locally built Ranger
[ https://issues.apache.org/jira/browse/IMPALA-12921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855448#comment-17855448 ] ASF subversion and git services commented on IMPALA-12921: -- Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch refs/heads/master from Fang-Yu Rao [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ] IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger The goals and non-goals of this patch could be summarized as follows. Goals: - Add changes to the minicluster configuration that allow a non-default version of Ranger (possibly built locally) to run in the context of the minicluster, and to be used as the authorization server by Impala. - Switch to the new constructor when instantiating RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes Impala compatible with Apache Ranger if RangerAccessRequestImpl from Apache Ranger is consumed. - Prepare Ranger and Impala patches as supplemental material to verify what authorization-related tests could be passed if Apache Ranger is the authorization provider. Merging IMPALA-12921_addendum.diff to the Impala repository is not in the scope of this patch in that the diff file changes the behavior of Impala and thus more discussion is required if we'd like to merge it in the future. Non-goals: - Set up any automation for building Ranger from source. - Pass all Impala authorization-related tests with a non-default version of Ranger. Instructions on running Impala with locally built Ranger: Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We could execute the following to build Apache Ranger for easy reference. By default, the compressed tarball is produced under $RANGER_SRC_DIR/target. 
mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \ package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true After building Ranger, we need to build Impala's Java code so that Impala's Java code could consume the locally produced Ranger classes. We will need to export the following environment variables before building Impala. This prevents bootstrap_toolchain.py from trying to download the compressed Ranger tarball. 1. export RANGER_VERSION_OVERRIDE=\ $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \ -Dexpression=project.version -DforceStdout) 2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\ ranger-${RANGER_VERSION_OVERRIDE}-admin It then suffices to execute the following to point Impala to the locally built Ranger server before starting Impala. 1. source $IMPALA_HOME/bin/impala-config.sh 2. tar zxv -f $RANGER_SRC_DIR/target/\ ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \ -C $RANGER_SRC_DIR/target/ 3. $IMPALA_HOME/bin/create-test-configuration.sh 4. $IMPALA_HOME/bin/create-test-configuration.sh \ -create_ranger_policy_db 5. $IMPALA_HOME/testdata/bin/run-ranger.sh (run-all.sh has to be executed instead if other underlying services have not been started) 6. $IMPALA_HOME/testdata/bin/setup-ranger.sh Testing: - Manually verified that we could point Impala to a locally built Apache Ranger on the master branch (with tip being https://github.com/apache/ranger/commit/4abb993). - Manually verified that with RANGER-4771.diff and IMPALA-12921_addendum.diff, only 3 authorization-related tests failed. They failed because the resource type of 'storage-type' is not supported in Apache Ranger yet and thus the test cases added in IMPALA-10436 could fail. - Manually verified that the log files of Apache and CDP Ranger's Admin server could be created under ${RANGER_LOG_DIR} after we start the Ranger service. - Verified that this patch passed the core tests when CDP Ranger is used. 
Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 Reviewed-on: http://gerrit.cloudera.org:8080/21160 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Consider adding support for locally built Ranger > > > Key: IMPALA-12921 > URL: https://issues.apache.org/jira/browse/IMPALA-12921 > Project: IMPALA > Issue Type: Task >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > It would be nice to be able to support locally built Ranger in Impala's > minicluster in that it would facilitate the testing of features that require > changes to both components. > *+Edit:+* > Making the current Apache Impala on *master* (tip is > {*}IMPALA-12925{*}: Fix decimal data type for external JDBC table) to support > Ranger on *master* (tip is > {*}RANGER-4745{*}: Enhance handling of subAccess authorization in Ranger HDFS > plugin) may be too ambitious. > The signatures of some classes are already incompatible. For instance, on the >
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855451#comment-17855451 ] ASF subversion and git services commented on IMPALA-13152: -- Commit 5d1bd80623324f829aca604b25d97ace21f51417 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=5d1bd8062 ] IMPALA-13152: Avoid NaN, infinite, and negative ProcessingCost TOP-N cost will turn into NaN if inputCardinality is equal to 0 due to Math.log(inputCardinality). This patch fixes the issue by avoiding Math.log(0), replacing it with 0 instead. After this patch, instantiating BaseProcessingCost with NaN, infinite, or negative totalCost will throw IllegalArgumentException. In BaseProcessingCost.getDetails(), "total-cost" is renamed to "raw-cost" to avoid confusion with "cost-total" in ProcessingCost.getDetails(). Testing: - Add a testcase that runs a TOP-N query over an empty table. - Compute ProcessingCost in most FE and EE tests even when COMPUTE_PROCESSING_COST option is not enabled by checking if RuntimeEnv.INSTANCE.isTestEnv() is True or TEST_REPLAN option is enabled. - Pass core tests. 
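The failure mode and guard described in the commit above can be sketched as follows. This is a Python stand-in for the frontend's Java code with illustrative names; in Java, Math.log(0) returns -Infinity and 0 * -Infinity is NaN:

```python
import math

def topn_comparison_cost(input_cardinality, per_row_cost=1.0):
    # Guard the N * log(N) term: for N <= 1 return 0 outright instead of
    # evaluating log(0), which would yield -inf and then 0 * -inf == NaN.
    if input_cardinality <= 1:
        return 0.0
    return per_row_cost * input_cardinality * math.log(input_cardinality)

def check_total_cost(total_cost):
    # Mirrors the new BaseProcessingCost precondition: NaN, infinite,
    # or negative totals are rejected outright.
    if math.isnan(total_cost) or math.isinf(total_cost) or total_cost < 0:
        raise ValueError("Processing cost is invalid: %r" % total_cost)
    return total_cost
```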
Change-Id: Ib49c7ae397dadcb2cb69fde1850d442d33cdf177 Reviewed-on: http://gerrit.cloudera.org:8080/21504 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > IllegalStateException in computing processing cost when there are predicates > on analytic output columns > --- > > Key: IMPALA-13152 > URL: https://issues.apache.org/jira/browse/IMPALA-13152 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Riza Suminto >Priority: Major > > Saw an error in the following query when is on: > {code:sql} > create table tbl (a int, b int, c int); > set COMPUTE_PROCESSING_COST=1; > explain select a, b from ( > select a, b, c, > row_number() over(partition by a order by b desc) as latest > from tbl > )b > WHERE latest=1 > ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! > {code} > Exception in the logs: > {noformat} > I0611 13:04:37.192874 28004 jni-util.cc:321] > 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: > Processing cost of PlanNode 01:TOP-N is invalid! > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) > at > org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) > at > org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} > Don't see the error if removing the predicate "latest=1". 
[jira] [Commented] (IMPALA-12985) Use the new constructor when instantiating RangerAccessRequestImpl
[ https://issues.apache.org/jira/browse/IMPALA-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855449#comment-17855449 ] ASF subversion and git services commented on IMPALA-12985: -- Commit 3a2f5f28c9709664ef31ea9b2b3675eba31f2d15 in impala's branch refs/heads/master from Fang-Yu Rao [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a2f5f28c ] IMPALA-12921, IMPALA-12985: Support running Impala with locally built Ranger The goals and non-goals of this patch could be summarized as follows. Goals: - Add changes to the minicluster configuration that allow a non-default version of Ranger (possibly built locally) to run in the context of the minicluster, and to be used as the authorization server by Impala. - Switch to the new constructor when instantiating RangerAccessRequestImpl. This resolves IMPALA-12985 and also makes Impala compatible with Apache Ranger if RangerAccessRequestImpl from Apache Ranger is consumed. - Prepare Ranger and Impala patches as supplemental material to verify what authorization-related tests could be passed if Apache Ranger is the authorization provider. Merging IMPALA-12921_addendum.diff to the Impala repository is not in the scope of this patch in that the diff file changes the behavior of Impala and thus more discussion is required if we'd like to merge it in the future. Non-goals: - Set up any automation for building Ranger from source. - Pass all Impala authorization-related tests with a non-default version of Ranger. Instructions on running Impala with locally built Ranger: Suppose the Ranger project is under the folder $RANGER_SRC_DIR. We could execute the following to build Apache Ranger for easy reference. By default, the compressed tarball is produced under $RANGER_SRC_DIR/target. 
mvn clean compile -B -nsu -DskipCheck=true -Dcheckstyle.skip=true \ package install -DskipITs -DskipTests -Dmaven.javadoc.skip=true After building Ranger, we need to build Impala's Java code so that Impala's Java code could consume the locally produced Ranger classes. We will need to export the following environment variables before building Impala. This prevents bootstrap_toolchain.py from trying to download the compressed Ranger tarball. 1. export RANGER_VERSION_OVERRIDE=\ $(mvn -f $RANGER_SRC_DIR/pom.xml -q help:evaluate \ -Dexpression=project.version -DforceStdout) 2. export RANGER_HOME_OVERRIDE=$RANGER_SRC_DIR/target/\ ranger-${RANGER_VERSION_OVERRIDE}-admin It then suffices to execute the following to point Impala to the locally built Ranger server before starting Impala. 1. source $IMPALA_HOME/bin/impala-config.sh 2. tar zxv -f $RANGER_SRC_DIR/target/\ ranger-${IMPALA_RANGER_VERSION}-admin.tar.gz \ -C $RANGER_SRC_DIR/target/ 3. $IMPALA_HOME/bin/create-test-configuration.sh 4. $IMPALA_HOME/bin/create-test-configuration.sh \ -create_ranger_policy_db 5. $IMPALA_HOME/testdata/bin/run-ranger.sh (run-all.sh has to be executed instead if other underlying services have not been started) 6. $IMPALA_HOME/testdata/bin/setup-ranger.sh Testing: - Manually verified that we could point Impala to a locally built Apache Ranger on the master branch (with tip being https://github.com/apache/ranger/commit/4abb993). - Manually verified that with RANGER-4771.diff and IMPALA-12921_addendum.diff, only 3 authorization-related tests failed. They failed because the resource type of 'storage-type' is not supported in Apache Ranger yet and thus the test cases added in IMPALA-10436 could fail. - Manually verified that the log files of Apache and CDP Ranger's Admin server could be created under ${RANGER_LOG_DIR} after we start the Ranger service. - Verified that this patch passed the core tests when CDP Ranger is used. 
Change-Id: I268d6d4d6e371da7497aac8d12f78178d57c6f27 Reviewed-on: http://gerrit.cloudera.org:8080/21160 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Use the new constructor when instantiating RangerAccessRequestImpl > -- > > Key: IMPALA-12985 > URL: https://issues.apache.org/jira/browse/IMPALA-12985 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > After RANGER-2763, we changed the signature of the class > RangerAccessRequestImpl in by adding an additional input argument 'userRoles' > as shown in the following. > {code:java} > public RangerAccessRequestImpl(RangerAccessResource resource, String > accessType, String user, Set userGroups, Set userRoles) { > ... > {code} > The new signature is also provided in CDP Ranger. Thus to unblock > IMPALA-12921 or to be able to build Apache Impala with locally built Apache > Ranger, it
[jira] [Commented] (IMPALA-13075) Setting very high BATCH_SIZE can blow up memory usage of fragments
[ https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855178#comment-17855178 ] ASF subversion and git services commented on IMPALA-13075: -- Commit b1320bd1d646eba3f044ef647b7d4497487d4674 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1320bd1d ] IMPALA-13075: Cap memory usage for ExprValuesCache at 256KB ExprValuesCache uses BATCH_SIZE as a deciding factor to set its capacity. It bounds the capacity such that expr_values_array_ memory usage stays below 256KB. This patch tightens that limit to include all memory usage from ExprValuesCache::MemUsage() instead of expr_values_array_ only. Therefore, setting a very high BATCH_SIZE will not push the total memory usage of ExprValuesCache beyond 256KB. Simplify table dimension creation methods and fix few flake8 warnings in test_dimensions.py. Testing: - Add test_join_queries.py::TestExprValueCache. - Pass core tests. Change-Id: Iee27cbbe8d3100301d05a6516b62c45975a8d0e0 Reviewed-on: http://gerrit.cloudera.org:8080/21455 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Setting very high BATCH_SIZE can blow up memory usage of fragments > -- > > Key: IMPALA-13075 > URL: https://issues.apache.org/jira/browse/IMPALA-13075 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.0.0 >Reporter: Ezra Zerihun >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.5.0 > > > In Impala 4.0, setting a very high BATCH_SIZE or near max limit of 65536 can > cause some fragment's memory usage to spike way past the query's defined > MEM_LIMIT or pool's Maximum Query Memory Limit with Clamp on. So even though > MEM_LIMIT is set reasonable, the query can still fail with out of memory and > a huge amount of memory used on fragment. 
Reducing BATCH_SIZE to a reasonable > amount or back to default will allow the query to run without issue and use > reasonable amount of memory within query's MEM_LIMIT or pool's Maximum Query > Memory Limit. > > 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g; > > {code:java} > Query State: EXCEPTION > Impala Query State: ERROR > Query Status: Memory limit exceeded: Error occurred on backend ...:27000 > by fragment ... Memory left in process limit: 145.53 GB Memory left in query > limit: -6.80 GB Query(...): memory limit exceeded. Limit=1.00 GB > Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB > Total=7.80 GB Peak=7.84 GB Unclaimed reservations: Reservation=8.50 MB > OtherMemory=0 Total=8.50 MB Peak=56.44 MB Runtime Filter Bank: > Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB > Peak=4.00 MB Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB > Total=7.59 GB Peak=7.63 GB HASH_JOIN_NODE (id=8): Reservation=1.94 MB > OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB Exprs: Total=7.57 GB > Peak=7.57 GB Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB > ... > Query Options (set by configuration): > BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell > v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC > 2023,DEFAULT_FILE_FORMAT=PARQUET,... > ... > ExecSummary: > ... > 09:AGGREGATE 32 32 0.000ns 0.000ns 0 > 4.83M 36.31 MB 212.78 MB STREAMING > 08:HASH JOIN 32 32 5s149ms 2m44s 0 > 194.95M 7.57 GB 1.94 MB RIGHT OUTER JOIN, PARTITIONED > |--18:EXCHANGE 32 32 93.750us 1.000ms 10.46K > 1.55K 1.65 MB 2.56 MB HASH(... > {code} > > > 2) set BATCH_SIZE=0; set MEM_LIMIT=1g; > > {code:java} > Query State: FINISHED > Impala Query State: FINISHED > ... > Query Options (set by configuration and planner): > MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 > (5ae3917) built on Mon Jan 9 21:23:59 UTC > 2023,DEFAULT_FILE_FORMAT=PARQUET,... > ... > ExecSummary: > ... 
> 09:AGGREGATE 32 32 593.748us 18.999ms 45 > 4.83M 34.06 MB 212.78 MB STREAMING > 08:HASH JOIN 32 32 10s873ms 5m47s 10.47K > 194.95M 123.48 MB 1.94 MB RIGHT OUTER JOIN, PARTITIONED > |--18:EXCHANGE 32 32 0.000ns 0.000ns 10.46K > 1.55K 344.00 KB 1.69 MB HASH(... > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For
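The capacity cap described in the IMPALA-13075 commit above can be sketched as follows. The function name and the bytes_per_row parameter are illustrative stand-ins for whatever ExprValuesCache::MemUsage() actually accounts; only the 256KB ceiling comes from the commit message:

```python
EXPR_VALUES_CACHE_MEM_CAP = 256 * 1024  # 256KB ceiling from the commit

def expr_values_cache_capacity(batch_size, bytes_per_row):
    # Bound capacity so total cache memory stays under the cap even when
    # BATCH_SIZE is set near its 65536 maximum; keep at least one entry.
    if bytes_per_row <= 0:
        return batch_size
    return max(1, min(batch_size, EXPR_VALUES_CACHE_MEM_CAP // bytes_per_row))
```

With this bound in place, raising BATCH_SIZE past the point where the cache would exceed 256KB no longer grows the cache, which is why the repro above succeeds after the fix.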
[jira] [Commented] (IMPALA-12712) INVALIDATE METADATA should set a better createEventId
[ https://issues.apache.org/jira/browse/IMPALA-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855089#comment-17855089 ] ASF subversion and git services commented on IMPALA-12712: -- Commit f98da3315e1e4744ad0e49405a4d1c7f98be85ae in impala's branch refs/heads/master from Sai Hemanth Gantasala [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f98da3315 ] IMPALA-12712: Invalidate metadata on table should set better createEventId "INVALIDATE METADATA " can be used to bring up a table in Impala's catalog cache if the table exists in HMS. Currently, createEventId for such tables is always set to -1, which will lead to always removing the table. A sequence of drop table + create table + invalidate table can lead to flaky test failures like IMPALA-12266. Solution: When Invalidate metadata is fired, fetch the latest eventId from HMS and set it as createEventId for the table, so that a drop table event that happened before the invalidate query will be ignored without removing the table from cache. Note: Also removed an unnecessary RPC call to HMS to get the table object since we already have the required info in the table metadata rpc call. Testing: - Added an end-to-end test to verify that a drop table event that happened beforehand doesn't remove the metadata object from cache. Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Reviewed-on: http://gerrit.cloudera.org:8080/21402 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > INVALIDATE METADATA should set a better createEventId > - > > Key: IMPALA-12712 > URL: https://issues.apache.org/jira/browse/IMPALA-12712 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: catalog-2024 > > "INVALIDATE METADATA " can be used to bring up a table in Impala's > catalog cache if the table exists in HMS. 
For instance, when HMS event > processing is disabled, we can use it in Impala to bring up tables that are > created outside Impala. > The createEventId for such tables are always set as -1: > [https://github.com/apache/impala/blob/6ddd69c605d4c594e33fdd39a2ca888538b4b8d7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2243-L2246] > This is problematic when event-processing is enabled. DropTable events and > RenameTable events use the createEventId to decide whether to remove the > table in catalog cache. -1 will lead to always removing the table. Though it > might be added back shortly in follow-up CreateTable events, in the period > between them the table is missing in Impala, causing test failures like > IMPALA-12266. > A simpler reproducing of the issue is creating a table in Hive and launching > Impala with a long event polling interval to mimic the delay on events. Note > that we start Impala cluster after creating the table so Impala don't need to > process the CREATE_TABLE event. > {noformat} > hive> create table debug_tbl (i int); > bin/start-impala-cluster.py --catalogd_args=--hms_event_polling_interval_s=60 > {noformat} > Drop the table in Impala and recreate it in Hive, so it doesn't exist in the > catalog cache but exist in HMS. Run "INVALIDATE METADATA " in Impala > to bring it up before the DROP_TABLE event come. > {noformat} > impala> drop table debug_tbl; > hive> create table debug_tbl (i int, j int); > impala> invalidate metadata debug_tbl; > {noformat} > The table will be dropped by the DROP_TABLE event and then added back by the > CREATE_TABLE event. Shown in catalogd logs: > {noformat} > I0115 16:30:15.376713 3208 JniUtil.java:177] > 02457b6d5f174d1f:3bdeee14] Finished execDdl request: DROP_TABLE > default.debug_tbl issued by quanlong. 
Time spent: 417ms > I0115 16:30:23.390962 3208 CatalogServiceCatalog.java:2777] > 1840bd101f78d611:22079a5a] Invalidating table metadata: > default.debug_tbl > I0115 16:30:23.404150 3208 Table.java:234] > 1840bd101f78d611:22079a5a] createEventId_ for table: > default.debug_tbl set to: -1 > I0115 16:30:23.405138 3208 JniUtil.java:177] > 1840bd101f78d611:22079a5a] Finished resetMetadata request: INVALIDATE > TABLE default.debug_tbl issued by quanlong. Time spent: 17ms > I0115 16:30:55.108006 32760 MetastoreEvents.java:637] EventId: 8668853 > EventType: DROP_TABLE Successfully removed table default.debug_tbl > I0115 16:30:55.108459 32760 MetastoreEvents.java:637] EventId: 8668855 > EventType: CREATE_TABLE Successfully added table default.debug_tbl > {noformat} > CC [~VenuReddy], [~hemanth619] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
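The eviction decision the IMPALA-12712 fix changes reduces to a single comparison against createEventId. The sketch below is a simplified model using the event ids from the log above, not Impala's actual event-processing classes:

```python
def drop_event_removes_table(drop_event_id, create_event_id):
    # A DROP_TABLE event should only evict the cached table if it happened
    # after the table was (re)established in the cache. With the old
    # createEventId of -1, every pending drop event evicted the table;
    # setting createEventId to the latest HMS event id at INVALIDATE time
    # makes stale drops no-ops.
    return drop_event_id > create_event_id
```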
[jira] [Commented] (IMPALA-12920) Support ai_generate_text built-in function for OpenAI's LLMs
[ https://issues.apache.org/jira/browse/IMPALA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855091#comment-17855091 ] ASF subversion and git services commented on IMPALA-12920: -- Commit b341e389573cc87fcfad5a8137620c6c96bb05e1 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b341e3895 ] [tools] Add .gitignore for new files Adds .gitignore for test.jceks - added with IMPALA-12920 - and hive-site-housekeeping-on (presumably added via a Hive update). Change-Id: I3d289d465fff7c81091b28cd62b9436957f8bade Reviewed-on: http://gerrit.cloudera.org:8080/21503 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Support ai_generate_text built-in function for OpenAI's LLMs > > > Key: IMPALA-12920 > URL: https://issues.apache.org/jira/browse/IMPALA-12920 > Project: IMPALA > Issue Type: Task >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Major > Fix For: Impala 4.4.0 > > > Built in function which can help communicate with [OpenAi's chat completion > API|https://platform.openai.com/docs/api-reference/chat] endpoint through SQL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855090#comment-17855090 ] ASF subversion and git services commented on IMPALA-12266: -- Commit f98da3315e1e4744ad0e49405a4d1c7f98be85ae in impala's branch refs/heads/master from Sai Hemanth Gantasala [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f98da3315 ] IMPALA-12712: Invalidate metadata on table should set better createEventId "INVALIDATE METADATA " can be used to bring up a table in Impala's catalog cache if the table exists in HMS. Currently, createEventId for such tables is always set to -1, which will lead to always removing the table. A sequence of drop table + create table + invalidate table can lead to flaky test failures like IMPALA-12266. Solution: When Invalidate metadata is fired, fetch the latest eventId from HMS and set it as createEventId for the table, so that a drop table event that happened before the invalidate query will be ignored without removing the table from cache. Note: Also removed an unnecessary RPC call to HMS to get the table object since we already have the required info in the table metadata rpc call. Testing: - Added an end-to-end test to verify that a drop table event that happened beforehand doesn't remove the metadata object from cache. 
Change-Id: Iff6ac18fe8d9e7b25cc41c7e41eecde251fbccdd Reviewed-on: http://gerrit.cloudera.org:8080/21402 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Assignee: Gabor Kaszab >Priority: Critical > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > TABLE:test_convert_table_cdba7383.parquet_nopartitioned. 
Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
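The event-id comparison at the heart of the IMPALA-12712 fix can be sketched in a few lines. This is a hedged illustration with invented names (`CachedTable`, `drop_event_removes_table`), not Impala's actual Java code:

```python
# Illustrative sketch of the IMPALA-12712 fix (hypothetical names, not
# Impala's real API). INVALIDATE METADATA records the latest HMS event id
# as the table's createEventId, so drop-table events that happened before
# the invalidate are ignored instead of evicting the table from the cache.

class CachedTable:
    def __init__(self):
        self.create_event_id = -1  # old behavior: always -1

def invalidate_metadata(table, latest_hms_event_id):
    # After the fix: remember where the HMS event stream stood.
    table.create_event_id = latest_hms_event_id

def drop_event_removes_table(table, drop_event_id):
    # Only drop events newer than the invalidate should evict the table.
    return drop_event_id > table.create_event_id
```

With the old `-1` default, every drop event passed this check, which is exactly the flakiness described above.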
[jira] [Commented] (IMPALA-13131) Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request header
[ https://issues.apache.org/jira/browse/IMPALA-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854592#comment-17854592 ] ASF subversion and git services commented on IMPALA-13131: -- Commit 3668a9517c4d8097591ed3b6fa672bf87faa77f6 in impala's branch refs/heads/master from Abhishek Rawat [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3668a9517 ] IMPALA-13131: Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request header Updated the POST request when communicating with Azure Open AI endpoint. The header now includes 'api-key: ' instead of 'Authorization: Bearer '. Also, removed 'model' as a required param for the Azure Open AI api call. This is mainly because the endpoint contains deployment which is basically already mapped to a model. Testing: - Updated existing unit test as per the Azure API reference - Manually tested builtin 'ai_generate_text' using an Azure Open AI deployment. Change-Id: If9cc07940ce355d511bcf0ee615ff31042d13eb5 Reviewed-on: http://gerrit.cloudera.org:8080/21493 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Azure OpenAI API expects 'api-key' instead of 'Authorization' in the request > header > --- > > Key: IMPALA-13131 > URL: https://issues.apache.org/jira/browse/IMPALA-13131 > Project: IMPALA > Issue Type: Bug >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Major > > As per the [API > reference|https://learn.microsoft.com/en-us/azure/ai-services/openai/reference], > the header expects API key as follows: > > {code:java} > curl > https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\ > -H "Content-Type: application/json" \ > -H "api-key: YOUR_API_KEY" \ <<< API Key > -d "{ > \"prompt\": \"Once upon a time\", > \"max_tokens\": 5 > }" {code} > Impala supports API Key as follows: > > > {code:java} > curl > 
https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2024-02-01\ > -H "Content-Type: application/json" \ > -H "Authorization: Bearer YOUR_API_KEY" \ API Key > -d "{ > \"prompt\": \"Once upon a time\", > \"max_tokens\": 5 > }"{code} > This causes ai functions calling Azure OpenAI endpoint to fail with 401 error: > {code:java} > { "statusCode": 401, "message": "Unauthorized. Access token is missing, > invalid, audience is incorrect (https://cognitiveservices.azure.com), or have > expired." } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
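The whole fix comes down to which request header carries the key. A minimal sketch, with placeholder resource/deployment/key values, following the header layout in the API reference quoted above:

```python
import json

def azure_openai_request(resource, deployment, api_key, prompt):
    # Azure OpenAI expects the key in an 'api-key' header, not in
    # 'Authorization: Bearer ...' as the pre-fix code sent.
    url = ("https://%s.openai.azure.com/openai/deployments/"
           "%s/completions?api-version=2024-02-01" % (resource, deployment))
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = json.dumps({"prompt": prompt, "max_tokens": 5})
    return url, headers, body
```

Sending the Bearer form instead is what produced the 401 "Access token is missing" error shown in the report.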
[jira] [Commented] (IMPALA-12562) CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result
[ https://issues.apache.org/jira/browse/IMPALA-12562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854541#comment-17854541 ] ASF subversion and git services commented on IMPALA-12562: -- Commit 0d429462f7f61565119ee2e593867a22886d7209 in impala's branch refs/heads/master from zhangyifan27 [ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d429462f ] IMPALA-12562: Cast double and float to string with exact precision The builtin functions casttostring(DOUBLE) and casttostring(FLOAT) printed more digits than necessary when converting double and float values to string values. This patch fixes this by switching to the existing methods DoubleToBuffer and FloatToBuffer, which are simple and fast implementations that print only the necessary digits. Testing: - Add end-to-end tests to verify the fixes - Add benchmarks for modified functions - Update tests in expr-test Change-Id: Icd79c55dd57dc0fa13e4ec11c2284ef2800e8b1a Reviewed-on: http://gerrit.cloudera.org:8080/21441 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > CAST(ROUND(INT a/ INT b, INT d)) as STRING) may return wrong result > --- > > Key: IMPALA-12562 > URL: https://issues.apache.org/jira/browse/IMPALA-12562 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: YifanZhang >Priority: Major > > The following query returns a wrong result:
> {code:java}
> select cast(round(1/3*100, 2) as string)
> +---------------------------------+
> | cast(round(1 / 3, 2) as string) |
> +---------------------------------+
> | 0.33002 |
> +---------------------------------+
> Fetched 1 row(s) in 0.11s {code}
> Remove the cast function and the result is expected:
> {code:java}
> select round(1/3,2);
> +-----------------+
> | round(1 / 3, 2) |
> +-----------------+
> | 0.33 |
> +-----------------+ {code}
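The underlying behavior is a generic floating-point printing issue. A short Python analogue, where `repr()` plays the role of the shortest-round-trip DoubleToBuffer-style printer:

```python
# Python analogue of the IMPALA-12562 symptom: 0.33 has no exact binary
# representation, so printing 17 significant digits exposes representation
# noise, while the shortest round-trip form stays '0.33'.
v = round(1 / 3, 2)
over_printed = "%.17g" % v   # excess digits, like the buggy cast
shortest = repr(v)           # shortest string that round-trips to v
```

The fix in the commit swaps the fixed-digit printer for the shortest-round-trip one.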
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854474#comment-17854474 ] ASF subversion and git services commented on IMPALA-12800: -- Commit 4681666e9386d87c647d19d6333750c16b6fa0c1 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4681666e9 ] IMPALA-12800: Add cache for isTrueWithNullSlots() evaluation isTrueWithNullSlots() can be expensive when it has to query the backend. Many of the expressions will look similar, especially in large auto-generated expressions. Adds a cache based on the nullified expression to avoid querying the backend for expressions with identical structure. With DEBUG logging enabled for the Analyzer, computes and logs stats about the null slots cache. Adds 'use_null_slots_cache' query option to disable caching. Documents the new option. Change-Id: Ib63f5553284f21f775d2097b6c5d6bbb63699acd Reviewed-on: http://gerrit.cloudera.org:8080/21484 Reviewed-by: Quanlong Huang Tested-by: Impala Public Jenkins > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:194) > at org.apache.impala.service.FeSupport.EvalPredicate(FeSupport.java:275) >
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853919#comment-17853919 ] ASF subversion and git services commented on IMPALA-12800: -- Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ] IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining lists for correct ordering (ordering needs to match to SlotRef order). Ignores duplicate inserts, preserving the old behavior that only the first match would actually be usable; duplicates primarily show up as a result of combining duplicate distinct and aggregate expressions, or redundant nested aggregation (like the tests for IMPALA-10182). Implements localHash and hashCode for Expr and related classes. Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for lookup and not expected to be mutated. Adds the many expressions test, which now runs in a handful of seconds. Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe Reviewed-on: http://gerrit.cloudera.org:8080/21483 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at > org.apache.impala.service.FeSupport.EvalExprWithoutRowBounded(FeSupport.java:206) > at
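The data-structure change can be sketched as a dict-backed index beside the ordered lists, with duplicate left-hand sides ignored so the first mapping stays usable. A simplified Python stand-in, not the actual Java class:

```python
# Simplified stand-in for the patched ExprSubstitutionMap: a dict gives
# O(1) lookup, parallel lists keep SlotRef-compatible ordering, and
# duplicate LHS inserts are ignored (first match stays usable, matching
# the old linear-scan behavior).
class SubstitutionMap:
    def __init__(self):
        self.lhs = []
        self.rhs = []
        self._index = {}

    def put(self, lhs, rhs):
        if lhs in self._index:
            return  # preserve old first-match semantics
        self._index[lhs] = len(self.lhs)
        self.lhs.append(lhs)
        self.rhs.append(rhs)

    def get(self, lhs):
        i = self._index.get(lhs)
        return None if i is None else self.rhs[i]
```

This turns the O(n) `indexOf`/`contains` scans seen in the jstack samples above into hash lookups while leaving iteration order unchanged.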
[jira] [Commented] (IMPALA-13151) DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM
[ https://issues.apache.org/jira/browse/IMPALA-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853921#comment-17853921 ] ASF subversion and git services commented on IMPALA-13151: -- Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ] IMPALA-13151: Use MonotonicNanos to track test time Uses MonotonicNanos to track test time rather than MonotonicStopWatch. IMPALA-2407 updated MonotonicStopWatch to use a low-precision implementation for performance, which on ARM in particular sometimes results in undercounting time by a few microseconds. That's enough to cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos. Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to better match Impala code base. Reproduced on ARM and tested the new implementation for several dozen runs without failure. Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400 Reviewed-on: http://gerrit.cloudera.org:8080/21497 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > DataStreamTestSlowServiceQueue.TestPrioritizeEos fails on ARM > - > > Key: IMPALA-13151 > URL: https://issues.apache.org/jira/browse/IMPALA-13151 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Michael Smith >Priority: Critical > Labels: broken-build > > The recently introduced DataStreamTestSlowServiceQueue.TestPrioritizeEos is > failing with errors like this: > {noformat} > /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/be/src/runtime/data-stream-test.cc:912 > Expected: (timer.ElapsedTime()) > (3 * MonoTime::kNanosecondsPerSecond), > actual: 269834 vs 30{noformat} > So far, I only see failures on ARM jobs. 
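The general lesson, use a high-resolution monotonic source when asserting on short elapsed intervals, carries over to other languages. A small Python sketch using `time.monotonic_ns()` as the analogue of MonotonicNanos:

```python
import time

# Timing a short interval with a high-resolution monotonic clock. A coarse
# clock (the low-precision MonotonicStopWatch on ARM) can undercount by a
# few microseconds, which is what broke TestPrioritizeEos's elapsed-time
# assertion.
start = time.monotonic_ns()
time.sleep(0.01)                         # stand-in for the timed work
elapsed_ns = time.monotonic_ns() - start
```

An assertion like the test's `ElapsedTime() > 3s` check is only safe when the clock never undercounts the sleep.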
[jira] [Commented] (IMPALA-2407) Nested Types : Remove calls to clock_gettime for a 9x performance improvement on EC2
[ https://issues.apache.org/jira/browse/IMPALA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853922#comment-17853922 ] ASF subversion and git services commented on IMPALA-2407: - Commit cce6b349f1103c167e2e9ef49fa181ede301b94f in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=cce6b349f ] IMPALA-13151: Use MonotonicNanos to track test time Uses MonotonicNanos to track test time rather than MonotonicStopWatch. IMPALA-2407 updated MonotonicStopWatch to use a low-precision implementation for performance, which on ARM in particular sometimes results in undercounting time by a few microseconds. That's enough to cause a failure in DataStreamTestSlowServiceQueue.TestPrioritizeEos. Also uses SleepForMs and NANOS_PER_SEC rather than Kudu versions to better match Impala code base. Reproduced on ARM and tested the new implementation for several dozen runs without failure. Change-Id: I9beb63669c5bdd910e5f713ecd42551841e95400 Reviewed-on: http://gerrit.cloudera.org:8080/21497 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Nested Types : Remove calls to clock_gettime for a 9x performance improvement > on EC2 > > > Key: IMPALA-2407 > URL: https://issues.apache.org/jira/browse/IMPALA-2407 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Mostafa Mokhtar >Assignee: Jim Apple >Priority: Critical > Labels: ec2, performance, ramp-up > Fix For: Impala 2.5.0 > > Attachments: q12Nested.tar.gz > > > Queries against Nested types show that ~90% of the time is spent in > clock_gettime. 
> A cheaper accounting method can speed up Nested queries by 8-9x
> {code}
> select
> count(*)
> from
> customer.orders_string o,
> o.lineitems_string l
> where
> l_shipmode in ('MAIL', 'SHIP')
> and l_commitdate < l_receiptdate
> and l_shipdate < l_commitdate
> and l_receiptdate >= '1994-01-01'
> and l_receiptdate < '1995-01-01'
> group by
> l_shipmode
> order by
> l_shipmode
> {code}
> Schema
> +---------------+----------------------------------+---------+
> | name          | type                             | comment |
> +---------------+----------------------------------+---------+
> | c_custkey     | bigint                           |         |
> | c_name        | string                           |         |
> | c_address     | string                           |         |
> | c_nationkey   | bigint                           |         |
> | c_phone       | string                           |         |
> | c_acctbal     | double                           |         |
> | c_mktsegment  | string                           |         |
> | c_comment     | string                           |         |
> | orders_string | array<struct<                    |         |
> |               |   o_orderkey:bigint,             |         |
> |               |   o_orderstatus:string,          |         |
> |               |   o_totalprice:double,           |         |
> |               |   o_orderdate:string,            |         |
> |               |   o_orderpriority:string,        |         |
> |               |   o_clerk:string,                |         |
> |               |   o_shippriority:bigint,         |         |
> |               |   o_comment:string,              |         |
> |               |   lineitems_string:array<struct< |         |
> |               |     l_partkey:bigint,            |         |
> |               |     l_suppkey:bigint,            |         |
> |               |     l_linenumber:bigint,         |         |
> |               |     l_quantity:double,           |         |
> |               |     l_extendedprice:double,      |         |
> |               |     l_discount:double,           |         |
> |               |     l_tax:double,                |         |
> |               |     l_returnflag:string,         |         |
> |               |     l_linestatus:string,         |         |
> |               |     l_shipdate:string,           |         |
> |               |     l_commitdate:string,         |
[jira] [Commented] (IMPALA-10182) Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
[ https://issues.apache.org/jira/browse/IMPALA-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853920#comment-17853920 ] ASF subversion and git services commented on IMPALA-10182: -- Commit 800246add5fcb20c34a767870346f6ce255e41f9 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=800246add ] IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups Adds a HashMap to ExprSubstitutionMap to speed lookups while retaining lists for correct ordering (ordering needs to match to SlotRef order). Ignores duplicate inserts, preserving the old behavior that only the first match would actually be usable; duplicates primarily show up as a result of combining duplicate distinct and aggregate expressions, or redundant nested aggregation (like the tests for IMPALA-10182). Implements localHash and hashCode for Expr and related classes. Avoids deep-cloning LHS Exprs in ExprSubstitutionMap as they're used for lookup and not expected to be mutated. Adds the many expressions test, which now runs in a handful of seconds. 
Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe Reviewed-on: http://gerrit.cloudera.org:8080/21483 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Rows with NULLs filtered out with duplicate columns in subquery select inside > UNION ALL > --- > > Key: IMPALA-10182 > URL: https://issues.apache.org/jira/browse/IMPALA-10182 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Tim Armstrong >Assignee: Aman Sinha >Priority: Blocker > Labels: correctness > Fix For: Impala 4.0.0 > > > Bug report from here - > https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415 > Repro: > {noformat} > create database if not exists as_adventure; > use as_adventure; > CREATE TABLE IF NOT EXISTS > as_adventure.t1 > ( > productsubcategorykey INT, > productline STRING); > insert into t1 values (1,'l1'); > insert into t1 values (2,'l1'); > insert into t1 values (1,'l2'); > insert into t1 values (3,'l3'); > insert into t1 values (null,''); > select * from t1; > SELECT > MIN(t_53.c_41) c_41, > CAST(NULL AS DOUBLE) c_43, > CAST(NULL AS BIGINT) c_44, > t_53.c2 c2, > t_53.c3s0 c3s0, > t_53.c4 c4, > t_53.c5s0 c5s0 > FROM > ( SELECT > t.productsubcategorykey c_41, > t.productline c2, > t.productline c3s0, > t.productsubcategorykey c4, > t.productsubcategorykey c5s0 > FROM > as_adventure.t1 t > WHERE > true > GROUP BY > 2, > 3, > 4, > 5 ) t_53 > GROUP BY > 4, > 5, > 6, > 7 > > UNION ALL > SELECT > MIN(t_53.c_41) c_41, > CAST(NULL AS DOUBLE) c_43, > CAST(NULL AS BIGINT) c_44, > t_53.c2 c2, > t_53.c3s0 c3s0, > t_53.c4 c4, > t_53.c5s0 c5s0 > FROM > ( SELECT > t.productsubcategorykey c_41, > t.productline c2, > t.productline c3s0, > t.productsubcategorykey c4, > t.productsubcategorykey c5s0 > FROM > as_adventure.t1 t > WHERE > true > GROUP BY > 2, > 3, > 4, > 5 ) t_53 > GROUP BY > 4, > 5, > 6, > 7 > {noformat} > Somewhat similar to IMPALA-7957 in that the inferred predicates from the > column 
equivalences get placed in a Select node. It's a bit different in that > the NULLs that are filtered out from the predicates come from the base table. > {noformat} > ++ > | Explain String >| > ++ > | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6 >| > | Per-Host Resource Estimates: Memory=576MB >| > | WARNING: The following tables are missing relevant table and/or column > statistics. | > | as_adventure.t1 >| > | >| > | PLAN-ROOT SINK
[jira] [Commented] (IMPALA-11871) INSERT statement does not respect Ranger policies for HDFS
[ https://issues.apache.org/jira/browse/IMPALA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853923#comment-17853923 ] ASF subversion and git services commented on IMPALA-11871: -- Commit f7e629935b77f412bf74aeebd704af88f03de351 in impala's branch refs/heads/master from halim.kim [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f7e629935 ] IMPALA-11871: Skip permissions loading and check on HDFS if Ranger is enabled Before this patch, Impala checked whether the Impala service user had the WRITE access to the target HDFS table/partition(s) during the analysis of the INSERT and LOAD DATA statements in the legacy catalog mode. The access levels of the corresponding HDFS table and partitions were computed by the catalog server solely based on the HDFS permissions and ACLs when the table and partitions were instantiated. After this patch, we skip loading HDFS permissions and assume the Impala service user has the READ_WRITE permission on all the HDFS paths associated with the target table during query analysis when Ranger is enabled. The assumption could be removed after Impala's implementation of FsPermissionChecker could additionally take Ranger's policies of HDFS into consideration when performing the check. Testing: - Added end-to-end tests to verify Impala's behavior with respect to the INSERT and LOAD DATA statements when Ranger is enabled in the legacy catalog mode. 
Change-Id: Id33c400fbe0c918b6b65d713b09009512835a4c9 Reviewed-on: http://gerrit.cloudera.org:8080/20221 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > INSERT statement does not respect Ranger policies for HDFS > -- > > Key: IMPALA-11871 > URL: https://issues.apache.org/jira/browse/IMPALA-11871 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > > In a cluster with Ranger auth (and with legacy catalog mode), even if you > provide RWX to cm_hdfs -> all-path for the user impala, inserting into a > table whose HDFS POSIX permissions happen to exclude impala access will > result in an > {noformat} > "AnalysisException: Unable to INSERT into target table (default.t1) because > Impala does not have WRITE access to HDFS location: > hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"{noformat} > > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl > /warehouse/tablespace/external/hive/t1 > file: /warehouse/tablespace/external/hive/t1 > owner: hive > group: supergroup > user::rwx > user:impala:rwx #effective:r-x > group::rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:impala:rwx > default:group::rwx > default:mask::rwx > default:other::--- {noformat} > ~~ > ANALYSIS > Stack trace from a version of Cloudera's distribution of Impala (impalad > version 3.4.0-SNAPSHOT RELEASE (build > {*}db20b59a093c17ea4699117155d58fe874f7d68f{*})): > {noformat} > at > org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585) > at > org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545) > at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426) > at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570) > at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536) > at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155){noformat} > The exception occurs at analysis time, so I tested and succeeded in writing > directly into the said directory. > {noformat} > [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz > /warehouse/tablespace/external/hive/t1/test > [root@nightly-71x-vx-3 ~]# hdfs dfs -ls > /warehouse/tablespace/external/hive/t1/ > Found 8 items > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:37 > /warehouse/tablespace/external/hive/t1/00_0 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:44 > /warehouse/tablespace/external/hive/t1/00_0_copy_1 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:49 > /warehouse/tablespace/external/hive/t1/00_0_copy_2 > rw-rw---+ 3 hive supergroup 417 2023-01-27 17:53 > /warehouse/tablespace/external/hive/t1/00_0_copy_3 > rw-rw---+ 3 impala hive 355 2023-01-27 17:17 > /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d_2029811630_data.0.parq > rw-rw---+ 3 impala hive 355 2023-01-27 17:39 >
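The behavioral change in this patch reduces to a single branch at analysis time. A hedged sketch with invented names, not Impala's FeFsTable code:

```python
# Hypothetical sketch of the IMPALA-11871 fix: with Ranger enabled, skip
# the cached HDFS POSIX/ACL check during analysis and assume READ_WRITE;
# Ranger's own HDFS policies are enforced when the write actually happens.
def write_access_for_insert(ranger_enabled, hdfs_posix_allows_write):
    if ranger_enabled:
        return True  # assume READ_WRITE; enforcement deferred to Ranger/HDFS
    return hdfs_posix_allows_write
```

This is why the AnalysisException above no longer fires for the `user:impala:rwx #effective:r-x` case once Ranger is the authorization provider.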
[jira] [Commented] (IMPALA-13146) Javascript tests sometimes fail to download NodeJS
[ https://issues.apache.org/jira/browse/IMPALA-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853729#comment-17853729 ] ASF subversion and git services commented on IMPALA-13146: -- Commit e7dac008bbafb20e4c7d15d46f2bac9a757f in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e7dac008b ] IMPALA-13146: Download NodeJS from native toolchain Some test runs have had issues downloading the NodeJS tarball from the nodejs servers. This changes the test to download from our native toolchain to make this more reliable. This means that future upgrades to NodeJS will need to upload new tarballs to the native toolchain. Testing: - Ran x86_64/ARM javascript tests Change-Id: I1def801469cb68633e89b4a0f3c07a771febe599 Reviewed-on: http://gerrit.cloudera.org:8080/21494 Tested-by: Impala Public Jenkins Reviewed-by: Surya Hebbar Reviewed-by: Wenzhe Zhou > Javascript tests sometimes fail to download NodeJS > -- > > Key: IMPALA-13146 > URL: https://issues.apache.org/jira/browse/IMPALA-13146 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > > For automated tests, sometimes the Javascript tests fail to download NodeJS: > {noformat} > 01:37:16 Fetching NodeJS v16.20.2-linux-x64 binaries ... > 01:37:16 % Total% Received % Xferd Average Speed TimeTime > Time Current > 01:37:16 Dload Upload Total Spent > Left Speed > 01:37:16 > 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:01 --:--:-- 0 > 0 00 00 0 0 0 --:--:-- 0:00:02 --:--:-- 0 > 0 21.5M0 9020 0293 0 21:23:04 0:00:03 21:23:01 293 > ... > 30 21.5M 30 6776k 0 0 50307 0 0:07:28 0:02:17 0:05:11 23826 > 01:39:34 curl: (18) transfer closed with 15617860 bytes remaining to > read{noformat} > If this keeps happening, we should mirror the NodeJS binary on the > native-toolchain s3 bucket. 
[jira] [Commented] (IMPALA-13143) TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query failure
[ https://issues.apache.org/jira/browse/IMPALA-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853328#comment-17853328 ] ASF subversion and git services commented on IMPALA-13143: -- Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ] IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl The test_catalogd_failover_with_sync_ddl test, which was added to custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3. The test relies on specific timing, with a sleep injected via a debug action so that the DDL query is still running when catalogd failover is triggered. The failures were caused by catalogd restarting slowly on s3, so the query finished before catalogd failover was triggered. This patch fixed the issue by increasing the sleep time for s3 builds and other slow builds. Testing: - Ran the test 100 times in a loop on s3. Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5 Reviewed-on: http://gerrit.cloudera.org:8080/21491 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > TestCatalogdHA.test_catalogd_failover_with_sync_ddl times out expecting query > failure > - > > Key: IMPALA-13143 > URL: https://issues.apache.org/jira/browse/IMPALA-13143 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Wenzhe Zhou >Priority: Critical > Labels: broken-build, flaky > > The new TestCatalogdHA.test_catalogd_failover_with_sync_ddl test is failing > intermittently with: > {noformat} > custom_cluster/test_catalogd_ha.py:472: in > test_catalogd_failover_with_sync_ddl > self.wait_for_state(handle, QueryState.EXCEPTION, 30, client=client) > common/impala_test_suite.py:1216: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1234: in wait_for_any_state > raise 
Timeout(timeout_msg) > E Timeout: query '9d49ab6360f6cbc5:4826a796' did not reach one of > the expected states [5], last known state 4{noformat} > This means the query succeeded even though we expected it to fail. This is > currently limited to s3 jobs. In a different test, we saw issues because s3 > is slower (see IMPALA-12616). > This test was introduced by IMPALA-13134: > https://github.com/apache/impala/commit/70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50
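The tracebacks above come out of a polling helper in the test framework (common/impala_test_suite.py). The following is a minimal sketch of that pattern, with hypothetical parameter names and a configurable poll interval; the real helper's internals may differ:

```python
import time

class Timeout(Exception):
    """Raised when the query never reaches an expected state, as in the traceback above."""

def wait_for_any_state(get_state, expected_states, timeout_s, poll_interval_s=0.5):
    # Poll the query state until it matches one of the expected states,
    # remembering the last state seen so the Timeout message can report it.
    deadline = time.time() + timeout_s
    last_state = None
    while time.time() < deadline:
        last_state = get_state()
        if last_state in expected_states:
            return last_state
        time.sleep(poll_interval_s)
    raise Timeout("query did not reach one of the expected states %s, "
                  "last known state %s" % (expected_states, last_state))
```

In the failure above, the expected state list was [5] but the query stayed in state 4, i.e. it finished successfully even though the test expected it to fail.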
[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
[ https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853329#comment-17853329 ] ASF subversion and git services commented on IMPALA-13134: -- Commit bafd1903069163f38812d7fa42f9c4d2f7218fcf in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bafd19030 ] IMPALA-13143: Fix flaky test_catalogd_failover_with_sync_ddl The test_catalogd_failover_with_sync_ddl test, which was added to custom_cluster/test_catalogd_ha.py in IMPALA-13134, failed on s3. The test relies on specific timing with a sleep injected via a debug action so that the DDL query is still running when catalogd failover is triggered. The failures were caused by catalogd restarting slowly on s3, so that the query finished before catalogd failover was triggered. This patch fixes the issue by increasing the sleep time for s3 builds and other slow builds. Testing: - Ran the test 100 times in a loop on s3. Change-Id: I15bb6aae23a2f544067f993533e322969372ebd5 Reviewed-on: http://gerrit.cloudera.org:8080/21491 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status > - > > Key: IMPALA-13134 > URL: https://issues.apache.org/jira/browse/IMPALA-13134 > Project: IMPALA > Issue Type: Bug > Components: Backend, Catalog >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > Catalogd waits for the SYNC_DDL version when it processes a DDL with SYNC_DDL > enabled. If the status of Catalogd is changed from active to standby when > CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd > does not receive catalog topic updates from the statestore. This causes the catalogd > thread to wait indefinitely and the DDL query to hang. 
[jira] [Commented] (IMPALA-13096) Cleanup Parser.jj for Calcite planner to only use supported syntax
[ https://issues.apache.org/jira/browse/IMPALA-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853235#comment-17853235 ] ASF subversion and git services commented on IMPALA-13096: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions This commit handles the first pass on getting functions to work through the Calcite planner. Only basic functions will work with this commit. Implicit conversions for parameters are not yet supported. Custom UDFs are also not supported yet. The ImpalaOperatorTable is used at validation time to check for the existence of the function name for Impala. At first, it will check Calcite operators for the existence of the function name (a TODO, IMPALA-13096, is that we need to remove non-supported names from the parser file). It is preferable to use the Calcite Operator since Calcite does some optimizations based on the Calcite Operator class. If the name is not found within the Calcite Operators, a check is done within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function. If found, a SqlOperator class is generated on the fly to handle this function. The validation process for Calcite includes a call into the operator method "inferReturnType". This method will validate that there exists a function that will handle the operands, and if so, return the "return type" of the function. In this commit, we will assume that the Calcite operators will match Impala functionality. In later commits, there will be overrides where we will use Impala validation for operators where Calcite's validation isn't good enough. After validation is complete, the functions will be in a Calcite format. 
After the rest of compilation (relnode conversion, optimization) is complete, the function needs to be converted back into Impala form (the Expr object) to eventually get it into its thrift request. In this commit, all functions are converted into Expr starting in the ImpalaProjectRel, since this is the RelNode where functions do their thing. The RexCallConverter and RexLiteralConverter get called via the CreateExprVisitor for this conversion. Since Calcite is providing the analysis portion of the planning, there is no need to go through Impala's Analyzer object. However, the Impala planner requires Expr objects to be analyzed. To get around this, the AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist, which analyze the expression in the constructor. While this could potentially be combined with the existing FunctionCallExpr and NullLiteral objects, this fits in with the general plan to avoid changing "fe" Impala code as much as we can until much later in the commit cycle. Also, there will be other Analyzed*Expr classes created in the future, but this commit is intended for basic function call expressions only. One minor change to the parser is added with this commit. The Calcite parser does not acknowledge the "string" datatype, so this has been added here in Parser.jj and config.fmpp. Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Reviewed-on: http://gerrit.cloudera.org:8080/21357 Reviewed-by: Michael Smith Tested-by: Impala Public Jenkins > Cleanup Parser.jj for Calcite planner to only use supported syntax > -- > > Key: IMPALA-13096 > URL: https://issues.apache.org/jira/browse/IMPALA-13096 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major >
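The resolution order described in the commit message (native Calcite operator first, then a fallback generated from the BuiltinsDb entry) can be sketched as follows; the dictionaries and the wrapper shape are illustrative stand-ins, not Impala's actual ImpalaOperatorTable API:

```python
def resolve_function(name, calcite_operators, builtins_db):
    """Look up a function the way the commit describes: prefer a native
    Calcite operator (Calcite optimizes around those), otherwise wrap a
    builtin from the BuiltinsDb, otherwise fail."""
    key = name.lower()
    if key in calcite_operators:
        return calcite_operators[key]
    if key in builtins_db:
        # Stand-in for generating a SqlOperator on the fly from the builtin.
        return {"operator_for": key, "source": "BuiltinsDb"}
    raise LookupError("unknown function: " + name)
```

The ordering matters for performance, not correctness: a name found in both tables resolves to the Calcite operator so Calcite's operator-class-based optimizations still apply.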
[jira] [Commented] (IMPALA-13095) Handle UDFs in Calcite planner
[ https://issues.apache.org/jira/browse/IMPALA-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853236#comment-17853236 ] ASF subversion and git services commented on IMPALA-13095: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions This commit handles the first pass on getting functions to work through the Calcite planner. Only basic functions will work with this commit. Implicit conversions for parameters are not yet supported. Custom UDFs are also not supported yet. The ImpalaOperatorTable is used at validation time to check for the existence of the function name for Impala. At first, it will check Calcite operators for the existence of the function name (a TODO, IMPALA-13096, is that we need to remove non-supported names from the parser file). It is preferable to use the Calcite Operator since Calcite does some optimizations based on the Calcite Operator class. If the name is not found within the Calcite Operators, a check is done within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function. If found, a SqlOperator class is generated on the fly to handle this function. The validation process for Calcite includes a call into the operator method "inferReturnType". This method will validate that there exists a function that will handle the operands, and if so, return the "return type" of the function. In this commit, we will assume that the Calcite operators will match Impala functionality. In later commits, there will be overrides where we will use Impala validation for operators where Calcite's validation isn't good enough. After validation is complete, the functions will be in a Calcite format. 
After the rest of compilation (relnode conversion, optimization) is complete, the function needs to be converted back into Impala form (the Expr object) to eventually get it into its thrift request. In this commit, all functions are converted into Expr starting in the ImpalaProjectRel, since this is the RelNode where functions do their thing. The RexCallConverter and RexLiteralConverter get called via the CreateExprVisitor for this conversion. Since Calcite is providing the analysis portion of the planning, there is no need to go through Impala's Analyzer object. However, the Impala planner requires Expr objects to be analyzed. To get around this, the AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist, which analyze the expression in the constructor. While this could potentially be combined with the existing FunctionCallExpr and NullLiteral objects, this fits in with the general plan to avoid changing "fe" Impala code as much as we can until much later in the commit cycle. Also, there will be other Analyzed*Expr classes created in the future, but this commit is intended for basic function call expressions only. One minor change to the parser is added with this commit. The Calcite parser does not acknowledge the "string" datatype, so this has been added here in Parser.jj and config.fmpp. Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Reviewed-on: http://gerrit.cloudera.org:8080/21357 Reviewed-by: Michael Smith Tested-by: Impala Public Jenkins > Handle UDFs in Calcite planner > -- > > Key: IMPALA-13095 > URL: https://issues.apache.org/jira/browse/IMPALA-13095 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major >
[jira] [Commented] (IMPALA-12935) Allow function parsing for Impala Calcite planner
[ https://issues.apache.org/jira/browse/IMPALA-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853234#comment-17853234 ] ASF subversion and git services commented on IMPALA-12935: -- Commit 141f38197be2ca23757cb8b3f283cdb9dd62de47 in impala's branch refs/heads/master from Steve Carlin [ https://gitbox.apache.org/repos/asf?p=impala.git;h=141f38197 ] IMPALA-12935: First pass on Calcite planner functions This commit handles the first pass on getting functions to work through the Calcite planner. Only basic functions will work with this commit. Implicit conversions for parameters are not yet supported. Custom UDFs are also not supported yet. The ImpalaOperatorTable is used at validation time to check for the existence of the function name for Impala. At first, it will check Calcite operators for the existence of the function name (a TODO, IMPALA-13096, is that we need to remove non-supported names from the parser file). It is preferable to use the Calcite Operator since Calcite does some optimizations based on the Calcite Operator class. If the name is not found within the Calcite Operators, a check is done within the BuiltinsDb (TODO: IMPALA-13095 handle UDFs) for the function. If found, a SqlOperator class is generated on the fly to handle this function. The validation process for Calcite includes a call into the operator method "inferReturnType". This method will validate that there exists a function that will handle the operands, and if so, return the "return type" of the function. In this commit, we will assume that the Calcite operators will match Impala functionality. In later commits, there will be overrides where we will use Impala validation for operators where Calcite's validation isn't good enough. After validation is complete, the functions will be in a Calcite format. 
After the rest of compilation (relnode conversion, optimization) is complete, the function needs to be converted back into Impala form (the Expr object) to eventually get it into its thrift request. In this commit, all functions are converted into Expr starting in the ImpalaProjectRel, since this is the RelNode where functions do their thing. The RexCallConverter and RexLiteralConverter get called via the CreateExprVisitor for this conversion. Since Calcite is providing the analysis portion of the planning, there is no need to go through Impala's Analyzer object. However, the Impala planner requires Expr objects to be analyzed. To get around this, the AnalyzedFunctionCallExpr and AnalyzedNullLiteral objects exist, which analyze the expression in the constructor. While this could potentially be combined with the existing FunctionCallExpr and NullLiteral objects, this fits in with the general plan to avoid changing "fe" Impala code as much as we can until much later in the commit cycle. Also, there will be other Analyzed*Expr classes created in the future, but this commit is intended for basic function call expressions only. One minor change to the parser is added with this commit. The Calcite parser does not acknowledge the "string" datatype, so this has been added here in Parser.jj and config.fmpp. Change-Id: I2dd4e402d69ee10547abeeafe893164ffd789b88 Reviewed-on: http://gerrit.cloudera.org:8080/21357 Reviewed-by: Michael Smith Tested-by: Impala Public Jenkins > Allow function parsing for Impala Calcite planner > - > > Key: IMPALA-12935 > URL: https://issues.apache.org/jira/browse/IMPALA-12935 > Project: IMPALA > Issue Type: Sub-task >Reporter: Steve Carlin >Priority: Major > > We need the ability to parse and validate Impala functions using the Calcite > planner. > This commit is not intended to work for all functions, or even most > functions. It will work as a base to be reviewed, and at least some > functions will work. 
More complicated functions will be added in a later > commit.
[jira] [Commented] (IMPALA-12616) test_restart_catalogd_while_handling_rpc_response* tests fail not reaching expected states
[ https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853196#comment-17853196 ] ASF subversion and git services commented on IMPALA-12616: -- Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ] IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3 The test_restart_catalogd_while_handling_rpc_response* tests from custom_cluster/test_restart_services.py have been failing consistently on s3. The alter table statement is expected to succeed, but instead it fails with: "CatalogException: Detected catalog service ID changes" This manifests as a timeout waiting for the statement to reach the finished state. The test relies on specific timing with a sleep injected via a debug action. The failure stems from the catalog being slower on s3. The alter table wakes up before the catalog service ID change has fully completed, and it fails when it sees the catalog service ID change. This increases two sleep times: 1. This increases the sleep time before restarting the catalogd from 0.5 seconds to 5 seconds. This gives the catalogd longer to receive the message about the alter table and respond back to the impalad. 2. This increases the WAIT_BEFORE_PROCESSING_CATALOG_UPDATE sleep from 10 seconds to 30 seconds so the alter table statement doesn't wake up until the catalog service ID change is finalized. The test is verifying that the right messages are in the impalad logs, so we know this is still testing the same condition. This modifies the tests to use wait_for_finished_timeout() rather than wait_for_state(). This bails out immediately if the query fails rather than waiting unnecessarily for the full timeout. This also clears the query options so that later statements don't inherit the debug_action that the alter table statement used. 
Testing: - Ran the tests 100x in a loop on s3 - Ran the tests 100x in a loop on HDFS Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29 Reviewed-on: http://gerrit.cloudera.org:8080/21485 Tested-by: Impala Public Jenkins Reviewed-by: Daniel Becker > test_restart_catalogd_while_handling_rpc_response* tests fail not reaching > expected states > -- > > Key: IMPALA-12616 > URL: https://issues.apache.org/jira/browse/IMPALA-12616 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 1.4.2 >Reporter: Andrew Sherman >Assignee: Daniel Becker >Priority: Critical > > There are failures in both > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout > and > custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters, > both look the same: > {code:java} > custom_cluster/test_restart_services.py:232: in > test_restart_catalogd_while_handling_rpc_response_with_timeout > self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], > max_wait_time) > common/impala_test_suite.py:1181: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1199: in wait_for_any_state > raise Timeout(timeout_msg) > E Timeout: query '6a4e0bad9b511ccf:bf93de68' did not reach one of > the expected states [4], last known state 5 > {code}
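The switch from wait_for_state() to wait_for_finished_timeout() matters because the latter stops polling as soon as the query fails. A sketch of that bail-out behaviour, with illustrative state codes (4 and 5 mirror the FINISHED/EXCEPTION codes in the tracebacks) and a hypothetical signature:

```python
import time

FINISHED, EXCEPTION = 4, 5  # state codes as they appear in the tracebacks above

def wait_for_finished_timeout(get_state, timeout_s, poll_interval_s=0.5):
    """Return True once the query finishes; return False immediately if it
    fails, instead of burning the whole timeout the way a plain
    wait-for-one-state loop would."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        state = get_state()
        if state == FINISHED:
            return True
        if state == EXCEPTION:
            return False  # query failed -- no point waiting any longer
        time.sleep(poll_interval_s)
    return False  # timed out without finishing
```

For the failure above this means a test that was going to fail reports the query's error right away rather than after the full timeout.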
[jira] [Commented] (IMPALA-13130) Under heavy load, Impala does not prioritize data stream operations
[ https://issues.apache.org/jira/browse/IMPALA-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853074#comment-17853074 ] ASF subversion and git services commented on IMPALA-13130: -- Commit 3f827bfc2447d8c11a4f09bcb96e86c53b92d753 in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3f827bfc2 ] IMPALA-13130: Prioritize EndDataStream messages Prioritize EndDataStream messages over other types handled by DataStreamService, and avoid rejecting them when memory limit is reached. They take very little memory (~75 bytes) and will usually help reduce memory use by closing out in-progress operations. Adds the 'data_stream_sender_eos_timeout_ms' flag to control EOS timeouts. Defaults to 1 hour, and can be disabled by setting to -1. Adds unit tests ensuring EOS are processed even if mem limit is reached and ahead of TransmitData messages in the queue. Change-Id: I2829e1ab5bcde36107e10bff5fe629c5ee60f3e8 Reviewed-on: http://gerrit.cloudera.org:8080/21476 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Under heavy load, Impala does not prioritize data stream operations > --- > > Key: IMPALA-13130 > URL: https://issues.apache.org/jira/browse/IMPALA-13130 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > > Under heavy load - where Impala reaches max memory for the DataStreamService > and applies backpressure via > https://github.com/apache/impala/blob/4.4.0/be/src/rpc/impala-service-pool.cc#L191-L199 > - DataStreamService does not differentiate between types of requests and may > reject requests that could help reduce load. > The DataStreamService deals with TransmitData, PublishFilter, UpdateFilter, > UpdateFilterFromRemote, and EndDataStream. It seems like we should prioritize > completing EndDataStream, especially under heavy load, to complete work and > release resources more quickly. 
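The behaviour the commit describes (EndDataStream jumps ahead of TransmitData and is exempt from memory-limit rejection) can be sketched with a priority queue. Class and field names here are illustrative; the real queue is C++ in Impala's service pool (impala-service-pool.cc):

```python
import heapq

EOS_PRIORITY, DATA_PRIORITY = 0, 1  # lower sorts first

class DataStreamQueue:
    """Sketch: EndDataStream messages are accepted even when the memory
    limit is reached (they are tiny, ~75 bytes, and release resources)
    and are served ahead of queued TransmitData messages."""

    def __init__(self, mem_limit):
        self._heap = []
        self._seq = 0          # tie-breaker keeps FIFO order per priority
        self._mem_limit = mem_limit
        self._mem_used = 0

    def offer(self, msg_type, size):
        is_eos = msg_type == "EndDataStream"
        if not is_eos and self._mem_used + size > self._mem_limit:
            return False       # backpressure: reject bulky data messages
        priority = EOS_PRIORITY if is_eos else DATA_PRIORITY
        heapq.heappush(self._heap, (priority, self._seq, msg_type, size))
        self._seq += 1
        self._mem_used += size
        return True

    def take(self):
        _, _, msg_type, size = heapq.heappop(self._heap)
        self._mem_used -= size
        return msg_type
```

Under backpressure this ordering lets in-flight exchanges close out first, which is exactly the "complete work and release resources more quickly" goal stated in the issue.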
[jira] [Commented] (IMPALA-13119) CostingSegment.java is initialized with wrong cost
[ https://issues.apache.org/jira/browse/IMPALA-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852621#comment-17852621 ] ASF subversion and git services commented on IMPALA-13119: -- Commit 753ee9b8a80d8e4c0db966a3132446a5aceb05cd in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=753ee9b8a ] IMPALA-13119: Fix cost_ initialization at CostingSegment.java This patch fixes the cost_ initialization of CostingSegment. The public constructor should initialize cost_ with the ProcessingCost taken directly from the PlanNode or DataSink parameter. The private constructor still initializes cost_ with ProcessingCost.zero(). Testing: - Add TpcdsCpuCostPlannerTest#testQ43Verbose to verify that "#cons:#prod" is correct in the verbose profile. - Pass FE tests TpcdsCpuCostPlannerTest, PlannerTest#testProcessingCost, and PlannerTest#testProcessingCostPlanAdmissionSlots - Pass test_executor_groups.py Change-Id: I5b3c99c87a1d0a08edc8d276cf33d709bd39fe14 Reviewed-on: http://gerrit.cloudera.org:8080/21468 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > CostingSegment.java is initialized with wrong cost > -- > > Key: IMPALA-13119 > URL: https://issues.apache.org/jira/browse/IMPALA-13119 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > > CostingSegment.java has two public constructors: one accepts a PlanNode, while > the other accepts a DataSink as parameter. Both call the appendCost method, which > sums the additionalCost with the segment's current cost_. > However, if cost_ is ProcessingCost.zero(), it can mistakenly > setNumRowToConsume to 0. > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/CostingSegment.java#L114] > > The public constructors should just initialize cost_ with the ProcessingCost from the > PlanNode or DataSink parameter. 
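The difference between the two initialization paths can be illustrated with a toy model. ProcessingCost here is a two-field stand-in, and the "append keeps the zero segment's row count" behaviour is a deliberate simplification of the bug described above, not a transcription of the Java code:

```python
class ProcessingCost:
    """Toy stand-in for Impala's ProcessingCost: a total cost plus the
    number of rows the segment consumes."""
    def __init__(self, total_cost, num_rows_to_consume):
        self.total_cost = total_cost
        self.num_rows_to_consume = num_rows_to_consume

    @staticmethod
    def zero():
        return ProcessingCost(0, 0)

def init_from_zero_then_append(node_cost):
    # Pre-patch shape: seed with zero() and append. If the append path
    # derives the row count from the existing (zero) segment, the segment
    # wrongly reports that it consumes 0 rows.
    seed = ProcessingCost.zero()
    return ProcessingCost(seed.total_cost + node_cost.total_cost,
                          seed.num_rows_to_consume)

def init_directly(node_cost):
    # Post-patch shape: take the PlanNode/DataSink cost as-is.
    return node_cost
```

The fix is simply to use the second shape in the public constructors, so "#cons" in the verbose profile reflects the node's real row count.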
[jira] [Commented] (IMPALA-13134) DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status
[ https://issues.apache.org/jira/browse/IMPALA-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852620#comment-17852620 ] ASF subversion and git services commented on IMPALA-13134: -- Commit 70b7b6a78d49c30933d79e0a1c2a725f7e0a3e50 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=70b7b6a78 ] IMPALA-13134: DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status Catalogd waits for the SYNC_DDL version when it processes a DDL with SYNC_DDL enabled. If the status of Catalogd is changed from active to standby when CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd does not receive catalog topic updates from the statestore, hence the catalogd thread waits indefinitely. This patch fixes the issue by re-generating the service id when Catalogd is changed to standby status and throwing an exception if the service id has changed while waiting for the SYNC_DDL version. Testing: - Added unit-test code for CatalogD HA to run DDL with SYNC_DDL enabled and an injected delay when waiting for the SYNC_DDL version, then verify that the DDL query fails due to catalog failover. - Passed test_catalogd_ha.py. Change-Id: I2dcd628cff3c10d2e7566ba2d9de0b5886a18fc1 Reviewed-on: http://gerrit.cloudera.org:8080/21480 Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > DDL hang with SYNC_DDL enabled when Catalogd is changed to standby status > - > > Key: IMPALA-13134 > URL: https://issues.apache.org/jira/browse/IMPALA-13134 > Project: IMPALA > Issue Type: Bug > Components: Backend, Catalog >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > > Catalogd waits for the SYNC_DDL version when it processes a DDL with SYNC_DDL > enabled. If the status of Catalogd is changed from active to standby when > CatalogServiceCatalog.waitForSyncDdlVersion() is called, the standby catalogd > does not receive catalog topic updates from the statestore. 
This causes the catalogd > thread to wait indefinitely and the DDL query to hang.
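The fix's fail-fast check can be sketched as a wait loop that re-reads the service id on every iteration. Function and parameter names here are illustrative, not the actual CatalogServiceCatalog API:

```python
import time

class CatalogException(Exception):
    """Raised instead of hanging when the catalog service id changes."""

def wait_for_sync_ddl_version(version_reached, current_service_id,
                              expected_service_id, timeout_s=60.0,
                              poll_interval_s=0.1):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        # A failover regenerates the service id; detect the change and fail
        # the DDL instead of waiting forever on a standby catalogd that no
        # longer receives catalog topic updates.
        if current_service_id() != expected_service_id:
            raise CatalogException("catalog service id changed during SYNC_DDL wait")
        if version_reached():
            return True
        time.sleep(poll_interval_s)
    raise CatalogException("timed out waiting for SYNC_DDL version")
```

The key point is that the service-id check sits inside the wait loop, so a failover turns an indefinite hang into a prompt query failure the client can retry.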
[jira] [Commented] (IMPALA-12705) Add a page to show the catalog's HA information
[ https://issues.apache.org/jira/browse/IMPALA-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852597#comment-17852597 ] ASF subversion and git services commented on IMPALA-12705: -- Commit f67f5f1815c60a4723887ea6fcdaa067b7fa4ca5 in impala's branch refs/heads/master from ttz [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f67f5f181 ] IMPALA-12705: Add /catalog_ha_info page on Statestore to show catalog HA information This patch adds a /catalog_ha_info page on the Statestore to show catalog HA information. The page contains the following information: Active Node, Standby Node, and a Notified Subscribers table. The Notified Subscribers table includes the following items: -- Id, -- Address, -- Registration ID, -- Subscriber Type, -- Catalogd Version, -- Catalogd Address, -- Last Update Catalogd Time Change-Id: If85f6a827ae8180d13caac588b92af0511ac35e3 Reviewed-on: http://gerrit.cloudera.org:8080/21418 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add a page to show the catalog's HA information > --- > > Key: IMPALA-12705 > URL: https://issues.apache.org/jira/browse/IMPALA-12705 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 4.3.0 >Reporter: Zhi Tang >Assignee: Zhi Tang >Priority: Major > Attachments: image-2024-05-27-10-57-37-158.png > >
[jira] [Commented] (IMPALA-13129) Hit DCHECK when skipping MIN_MAX runtime filter
[ https://issues.apache.org/jira/browse/IMPALA-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852181#comment-17852181 ] ASF subversion and git services commented on IMPALA-13129: -- Commit e2e45401e2bead4090fd5c562709db521cbc6d38 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e2e45401e ] IMPALA-13129: Move runtime filter skipping to registerRuntimeFilter A DCHECK in hdfs-scanner.h was hit when skipping a MIN_MAX runtime filter using the RUNTIME_FILTER_IDS_TO_SKIP query option. This is because HdfsScanNode.tryToComputeOverlapPredicate() is called and registers a TOverlapPredicateDesc during runtime filter generation, but the minmax filter is then skipped later, causing the backend to hit the DCHECK. This patch moves the runtime filter skipping to registerRuntimeFilter() so that HdfsScanNode.tryToComputeOverlapPredicate() will not be called at all once a filter is skipped. Testing: - Add test in overlap_min_max_filters.test to explicitly skip a minmax runtime filter. - Pass test_runtime_filters.py Change-Id: I43c1c4abc88019aadaa85d2e3d0ecda417297bfc Reviewed-on: http://gerrit.cloudera.org:8080/21477 Reviewed-by: Wenzhe Zhou Tested-by: Impala Public Jenkins > Hit DCHECK when skipping MIN_MAX runtime filter > --- > > Key: IMPALA-13129 > URL: https://issues.apache.org/jira/browse/IMPALA-13129 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > > A [DCHECK in > hdfs-scanner.h|https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/exec/hdfs-scanner.h#L199] > is hit when skipping a MIN_MAX runtime filter using the > RUNTIME_FILTER_IDS_TO_SKIP query option. 
This is because during runtime > filter generation, HdfsScanNode.tryToComputeOverlapPredicate() is called and > registers a > TOverlapPredicateDesc, but the minmax filter is then skipped later, causing the > backend to hit the DCHECK.
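The shape of the fix — drop skipped ids at registration time so that no later stage (such as computing an overlap predicate for a MIN_MAX filter) ever sees them — can be sketched as follows; filter records here are plain dicts, purely illustrative:

```python
def register_runtime_filters(candidate_filters, ids_to_skip):
    """Filter out skipped ids up front, before any per-filter side effects
    (like registering an overlap predicate) can happen."""
    registered = []
    for f in candidate_filters:
        if f["filter_id"] in ids_to_skip:
            continue  # skipped filters never reach overlap-predicate code
        registered.append(f)
    return registered
```

Filtering at the single entry point avoids the inconsistency that triggered the DCHECK: a predicate registered for a filter that the skip list later removes.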
[jira] [Commented] (IMPALA-13111) impala-gdb.py's find-query-ids/find-fragment-instances return unusable query ids
[ https://issues.apache.org/jira/browse/IMPALA-13111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851225#comment-17851225 ] ASF subversion and git services commented on IMPALA-13111: -- Commit ce8078204e5995277f79e226e26fe8b9eaca408b in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce8078204 ] IMPALA-13111: Fix the calculation of fragment ids for impala-gdb.py The gdb helpers in impala-gdb.py provide functions to look on the stack for the information added in IMPALA-6416 and get the fragment/query ids. Right now, it is incorrectly using a signed integer, which leads to incorrect ids like this: -3cbda1606b3ade7c:f170c4bd This changes the logic to AND the integer with a 0xFF* sequence of the right length. This forces the integer to be unsigned, producing the right query id. Testing: - Ran this on a minidump and verified that the listed query ids were valid (and existed in the profile log) Change-Id: I59798407e99ee0e9100cac6b4b082cdb85ed43d1 Reviewed-on: http://gerrit.cloudera.org:8080/21472 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > impala-gdb.py's find-query-ids/find-fragment-instances return unusable query > ids > > > Key: IMPALA-13111 > URL: https://issues.apache.org/jira/browse/IMPALA-13111 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.5.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > > The gdb helpers in lib/python/impala_py_lib/gdb/impala-gdb.py provide > information about the queries / fragments running in a core file. However, > the query/fragment ids that it returns have issues with the signedness of the > integers: > {noformat} > (gdb) find-fragment-instances > Fragment Instance Id Thread IDs > -23b76c1699a831a1:279358680036 [117120] > -23b76c1699a831a1:279358680037 [117121] > -23b76c1699a831a1:279358680038 [117122] > .. 
> (gdb) find-query-ids > -3cbda1606b3ade7c:f170c4bd > -23b76c1699a831a1:27935868 > 68435df1364aa90f:1752944f > 3442ed6354c7355d:78c83d20{noformat} > The low values for find-query-ids don't have this problem, because it is > ANDed with 0x: > {noformat} > qid_low = format(int(qid_low, 16) & 0x, > 'x'){noformat} > We can fix the other locations by ANDing with 0x.
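The signedness fix boils down to masking the value into its unsigned two's-complement form. A small sketch of that conversion (the helper name is ours, not impala-gdb.py's):

```python
def to_unsigned_hex(value, bits=64):
    # ANDing with an all-ones mask of the right width (e.g.
    # 0xFFFFFFFFFFFFFFFF for 64 bits) forces a negative Python int into
    # its unsigned two's-complement representation.
    mask = (1 << bits) - 1
    return format(value & mask, 'x')
```

Applied to a negative fragment id like the ones above, the leading minus sign disappears and a valid 16-hex-digit id comes out, matching what appears in the profile log.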
[jira] [Commented] (IMPALA-6416) Extend Thread::Create to track fragment instance id automatically based on parent's fid
[ https://issues.apache.org/jira/browse/IMPALA-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851226#comment-17851226 ] ASF subversion and git services commented on IMPALA-6416: - Commit ce8078204e5995277f79e226e26fe8b9eaca408b in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce8078204 ] IMPALA-13111: Fix the calculation of fragment ids for impala-gdb.py The gdb helpers in impala-gdb.py provide functions to look on the stack for the information added in IMPALA-6416 and get the fragment/query ids. Right now, it is incorrectly using a signed integer, which leads to incorrect ids like this: -3cbda1606b3ade7c:f170c4bd This changes the logic to AND the integer with a 0xFF* sequence of the right length. This forces the integer to be unsigned, producing the right query id. Testing: - Ran this on a minidump and verified that the listed query ids were valid (and existed in the profile log) Change-Id: I59798407e99ee0e9100cac6b4b082cdb85ed43d1 Reviewed-on: http://gerrit.cloudera.org:8080/21472 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Extend Thread::Create to track fragment instance id automatically based on > parent's fid > --- > > Key: IMPALA-6416 > URL: https://issues.apache.org/jira/browse/IMPALA-6416 > Project: IMPALA > Issue Type: Sub-task >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Major > Fix For: Impala 2.12.0 > >
[jira] [Commented] (IMPALA-13057) Incorporate tuple/slot information into the tuple cache key
[ https://issues.apache.org/jira/browse/IMPALA-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850883#comment-17850883 ] ASF subversion and git services commented on IMPALA-13057: -- Commit 825900fa6c3a51941b7b90edb8af6f7dba5e5fe8 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=825900fa6 ] IMPALA-13057: Incorporate tuple/slot information into tuple cache key The tuple cache keys currently do not include information about the tuples or slots, as that information is stored outside the PlanNode thrift structures. The tuple/slot information is critical to determining which columns are referenced and what data layout the result tuple has. This adds code to incorporate the TupleDescriptors and SlotDescriptors into the cache key. Since the tuple and slot ids are indexes into a global structure (the descriptor table), they hinder cache key matches across different queries. If a query has an extra filter, it can shift all the slot ids. If the query has an extra join, it can shift all the tuple ids. To eliminate this effect, this adds the ability to translate tuple and slot ids from global indices to local indices. The translation only contains information from the subtree below that point, so it is not influenced by unrelated parts of the query. When the code registers a tuple with the TupleCacheInfo, it also registers a translation from the global index to a local index. Any code that puts SlotIds or TupleIds into a Thrift data structure can use the translateTupleId() and translateSlotId() functions to get the local index. These are exposed on ThriftSerializationCtx by functions of the same name, but those functions apply the translation only when working for the tuple cache. This passes the ThriftSerializationCtx into Exprs that have TupleIds or SlotIds and applies the translation. 
It also passes the ThriftSerializationCtx into PlanNode::toThrift(), which is used to translate TupleIds in HdfsScanNode. This also adds a way to register a table with the tuple cache and incorporate information about it. This allows us to mask out additional fields in PlanNode and enable a test case that relies on matching with different table aliases. Testing: - This fixes some commented out test cases in TupleCacheTest (specifically telling columns apart) - This adds new test cases that match due to id translation (extra filters, extra joins) - This adds a unit test for the id translation to TupleCacheInfoTest Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c Reviewed-on: http://gerrit.cloudera.org:8080/21398 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins > Incorporate tuple/slot information into the tuple cache key > --- > > Key: IMPALA-13057 > URL: https://issues.apache.org/jira/browse/IMPALA-13057 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > > Since the tuple and slot information is kept separately in the descriptor > table, it does not get incorporated into the PlanNode thrift used for the > tuple cache key. This means that the tuple cache can't distinguish between > these two queries: > {noformat} > select int_col1 from table; > select int_col2 from table;{noformat} > To solve this, the tuple/slot information needs to be incorporated into the > cache key. PlanNode::initThrift() walks through each tuple, so this is a good > place to serialize the TupleDescriptor/SlotDescriptors and incorporate it > into the hash. > The tuple ids and slot ids are global ids, so the value is influenced by the > entirety of the query. This is a problem for matching cache results across > different queries. As part of incorporating the tuple/slot information, we > should also add an ability to translate tuple/slot ids into ids local to a > subtree. 
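The global-to-local id translation described in the commit above can be modeled as first-seen numbering: ids are assigned in the order the subtree encounters them, so unrelated parts of the query cannot shift them. This is an illustrative sketch only, not Impala's actual TupleCacheInfo/ThriftSerializationCtx API:

```python
# Hypothetical model of translating global tuple/slot ids into local,
# subtree-relative ids so that cache keys match across queries.
class IdTranslator:
    def __init__(self):
        self._mapping = {}

    def translate(self, global_id):
        # First registration wins: local ids reflect only the order in
        # which this subtree registered them, not the global numbering.
        if global_id not in self._mapping:
            self._mapping[global_id] = len(self._mapping)
        return self._mapping[global_id]

# Two queries referencing the same columns, but with globally shifted
# slot ids (e.g. one query has an extra filter), get identical local ids:
q1, q2 = IdTranslator(), IdTranslator()
assert [q1.translate(g) for g in (5, 7)] == [q2.translate(g) for g in (12, 14)] == [0, 1]
```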
[jira] [Commented] (IMPALA-13108) Update Impala version to 4.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IMPALA-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850526#comment-17850526 ] ASF subversion and git services commented on IMPALA-13108: -- Commit 1324a6e6c9589300424c84ad2a2aa7fd256068b2 in impala's branch refs/heads/master from Zoltan Borok-Nagy [ https://gitbox.apache.org/repos/asf?p=impala.git;h=1324a6e6c ] IMPALA-13108: Update version to 4.5.0-SNAPSHOT Updated IMPALA_VERSION in impala-config.sh Executed the following for Java: cd java mvn versions:set -DnewVersion=4.5.0-SNAPSHOT Change-Id: Ie7803fe523406dbdd1ac066a35bb31d21765a244 Reviewed-on: http://gerrit.cloudera.org:8080/21460 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Update Impala version to 4.5.0-SNAPSHOT > --- > > Key: IMPALA-13108 > URL: https://issues.apache.org/jira/browse/IMPALA-13108 > Project: IMPALA > Issue Type: Task >Reporter: Zoltán Borók-Nagy >Priority: Major > > With the release of 4.4.0, we should update the master to version 4.5.0.
[jira] [Commented] (IMPALA-13107) Invalid TExecPlanFragmentInfo received by executor with instance number as 0
[ https://issues.apache.org/jira/browse/IMPALA-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850525#comment-17850525 ] ASF subversion and git services commented on IMPALA-13107: -- Commit 3e1b10556bc83b0e697b7a2aac411ccad6094563 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3e1b10556 ] IMPALA-13107: Don't start query on executor if instance number equals 0 In bad networking conditions, TExecPlanFragmentInfo in KRPC messages received by executors could be truncated due to KRPC failures, but truncation may not cause a thrift deserialization error. The invalid TExecPlanFragmentInfo causes the Impala daemon to crash. To avoid the crash, this patch checks the number of instances in the received TExecPlanFragment on the executor. The query will not be started if the number of instances equals 0. Also adds a DCHECK on the coordinator side to make sure it does not send a TExecPlanFragment without any instance. Testing: - Passed core tests. - Passed exhaustive tests in debug build. The new DCHECKs were not hit. Change-Id: Ie92ee120f1e9369f8dc2512792a05b7f8be5f007 Reviewed-on: http://gerrit.cloudera.org:8080/21458 Reviewed-by: Wenzhe Zhou Tested-by: Impala Public Jenkins > Invalid TExecPlanFragmentInfo received by executor with instance number as 0 > > > Key: IMPALA-13107 > URL: https://issues.apache.org/jira/browse/IMPALA-13107 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > In a customer-reported case, TExecPlanFragmentInfo was received by executors with > instance number equal to 0, which caused the Impala daemon to crash.
Here are log > messages collected on the Impala executors: > {code:java} > impalad.executor.net.impala.log.INFO.20240522-160138.197583:I0523 > 00:59:16.892853 199528 control-service.cc:148] > 624c47e9264ebb62:5aa89af3] ExecQueryFInstances(): > query_id=624c47e9264ebb62:5aa89af3 coord=coordinator.net:27000 > #instances=0 > .. > I0523 00:59:19.306522 199185 kMinidump in thread > [1890723]query-state-624c47e9264ebb62:5aa89af3 running query > 624c47e9264ebb62:5aa89af3, fragment instance > : > Wrote minidump to > /var/log/impala-minidumps/impalad/021b06ea-1627-4c69-9f27858a-f3cd9026.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x012ff9d9, pid=197583, tid=0x7eefc98a0700 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_381) (build 1.8.0_381-b09) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.381-b09 mixed mode > linux-amd64 ) > # Problematic frame: > # C [impalad+0xeff9d9] > impala::FragmentState::FragmentState(impala::QueryState*, > impala::TPlanFragment const&, impala::PlanFragmentCtxPB const&)+0xf9 > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > {code} > From the collected profiles, there was no fragment with instance number as 0 > in the corresponding query plan so coordinator should not send fragments to > executor with number of instances as 0. Executor log files showed that there > were lots of KRPC errors around the time when receiving invalid > TExecPlanFragmentInfo. It seems KRPC messages were truncated due to KRPC > failures, but truncation might not cause thrift deserialization error. The > invalid TExecPlanFragmentInfo caused Impala daemon to crash with following > stack trace when the query was started on executor. 
> {code:java} > #0 SubstituteArg (value=..., this=0x7f86cec79d30) at > ../gutil/strings/substitute.h:79 > #1 impala::FragmentState::FragmentState (this=0x35c78f40, > query_state=0x7972db00, fragment=..., > fragment_ctx= 0x35c78f88>) at fragment-state.cc:143 > #2 0x013019aa in impala::FragmentState::CreateFragmentStateMap > (fragment_info=..., exec_request=..., > state=state@entry=0x7972db00, fragment_map=...) at fragment-state.cc:47 > #3 0x01292d71 in impala::QueryState::StartFInstances > (this=this@entry=0x7972db00) at query-state.cc:820 > #4 0x01284810 in impala::QueryExecMgr::ExecuteQueryHelper > (this=0x11943b00, qs=0x7972db00) > at query-exec-mgr.cc:162 > #5 0x01752915 in operator() (this=0x7f86cec7ab40) > at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #6 impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo
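The defensive check the commit describes can be sketched as a simple guard clause: reject a deserialized fragment-info message that carries zero instances before any fragment state is built, since a truncated message may deserialize "successfully" with fields missing. Impala's real check is in the C++ backend; this is a language-agnostic model with hypothetical names:

```python
# Guard against a structurally "valid" but truncated message: refuse to
# start a query whose fragment info has no instances, instead of letting
# later code dereference missing state and crash.
def start_fragments(fragment_info):
    instances = fragment_info.get("instances", [])
    if len(instances) == 0:
        raise ValueError("invalid TExecPlanFragmentInfo: zero fragment instances")
    # Only reached for well-formed messages.
    return ["started " + name for name in instances]
```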
[jira] [Commented] (IMPALA-13085) Add warning and NULL out DECIMAL values in Iceberg metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850054#comment-17850054 ] ASF subversion and git services commented on IMPALA-13085: -- Commit 2e093bbc8ae06f89f17bbe57f41d5e91749572c4 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e093bbc8 ] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables DECIMAL values are not supported in Iceberg metadata tables and Impala runs on a DCHECK and crashes if it encounters one. Until this issue is properly fixed (see IMPALA-13080), this commit introduces a temporary solution: DECIMAL values coming from Iceberg metadata tables are NULLed out and a warning is issued. Testing: - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table, so querying the `files` metadata table will include a DECIMAL in the 'readable_metrics' struct. Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Reviewed-on: http://gerrit.cloudera.org:8080/21429 Reviewed-by: Daniel Becker Tested-by: Impala Public Jenkins > Add warning and NULL out DECIMAL values in Iceberg metadata tables > -- > > Key: IMPALA-13085 > URL: https://issues.apache.org/jira/browse/IMPALA-13085 > Project: IMPALA > Issue Type: Bug >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg > > IMPALA-13080 is about adding support for DECIMAL values in Iceberg metadata > tables. Until it is done, we should NULL out the values and issue a warning > instead of running on a DCHECK and crashing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13080) Add support for DECIMAL in Iceberg metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-13080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850055#comment-17850055 ] ASF subversion and git services commented on IMPALA-13080: -- Commit 2e093bbc8ae06f89f17bbe57f41d5e91749572c4 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=2e093bbc8 ] IMPALA-13085: Add warning and NULL out DECIMAL values in Iceberg metadata tables DECIMAL values are not supported in Iceberg metadata tables and Impala runs on a DCHECK and crashes if it encounters one. Until this issue is properly fixed (see IMPALA-13080), this commit introduces a temporary solution: DECIMAL values coming from Iceberg metadata tables are NULLed out and a warning is issued. Testing: - added a DECIMAL column to the 'iceberg_metadata_alltypes' test table, so querying the `files` metadata table will include a DECIMAL in the 'readable_metrics' struct. Change-Id: I0c8791805bc4fa2112e092e65366ca2815f3fa22 Reviewed-on: http://gerrit.cloudera.org:8080/21429 Reviewed-by: Daniel Becker Tested-by: Impala Public Jenkins > Add support for DECIMAL in Iceberg metadata tables > -- > > Key: IMPALA-13080 > URL: https://issues.apache.org/jira/browse/IMPALA-13080 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Priority: Major > Labels: impala-iceberg, ramp-up > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8042) Better selectivity estimate for BETWEEN
[ https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849437#comment-17849437 ] ASF subversion and git services commented on IMPALA-8042: - Commit d0237fbe47eb5089ee19a0a201045b862d65ecaa in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=d0237fbe4 ] IMPALA-8042: Assign BETWEEN selectivity for discrete-unique column The Impala frontend cannot evaluate BETWEEN/NOT BETWEEN predicates directly. It needs to transform a BetweenPredicate into a CompoundPredicate consisting of upper bound and lower bound BinaryPredicate through BetweenToCompoundRule.java. The BinaryPredicate can then be pushed down or rewritten into another form by another expression rewrite rule. However, the selectivity of BetweenPredicate or its derivatives remains unassigned and often collapses with other unknown-selectivity predicates to have collective selectivity equal to Expr.DEFAULT_SELECTIVITY (0.1). This patch adds a narrow optimization of BetweenPredicate selectivity when the following criteria are met: 1. The BetweenPredicate is bound to a slot reference of a single column of a table. 2. The column type is discrete, such as INTEGER or DATE. 3. The column stats are available. 4. The column is sufficiently unique based on available stats. 5. The BETWEEN/NOT BETWEEN predicate is in good form (lower bound value <= upper bound value). 6. The final calculated selectivity is less than or equal to Expr.DEFAULT_SELECTIVITY. If these criteria are unmet, the Planner will revert to the old behavior, which is leaving the selectivity unassigned.
Since this patch only targets BetweenPredicate over unique columns, the following query will still have the default scan selectivity (0.1): select count(*) from tpch.customer c where c.c_custkey >= 1234 and c.c_custkey <= 2345; While this equivalent query written with a BETWEEN predicate will have lower scan selectivity: select count(*) from tpch.customer c where c.c_custkey between 1234 and 2345; This patch calculates the BetweenPredicate selectivity during transformation at BetweenToCompoundRule.java. The selectivity is piggy-backed into the resulting CompoundPredicate and BinaryPredicate as the betweenSelectivity_ field, separate from the selectivity_ field. Analyzer.getBoundPredicates() is modified to prioritize the derived BinaryPredicate over ordinary BinaryPredicate in its return value to prevent the derived BinaryPredicate from being eliminated by a matching ordinary BinaryPredicate. Testing: - Add table functional_parquet.unique_with_nulls. - Add FE tests in ExprCardinalityTest#testBetweenSelectivity, ExprCardinalityTest#testNotBetweenSelectivity, and PlannerTest#testScanCardinality. - Pass core tests. Change-Id: Ib349d97349d1ee99788645a66be1b81749684d10 Reviewed-on: http://gerrit.cloudera.org:8080/21377 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Better selectivity estimate for BETWEEN > --- > > Key: IMPALA-8042 > URL: https://issues.apache.org/jira/browse/IMPALA-8042 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 3.1.0 >Reporter: Paul Rogers >Assignee: Riza Suminto >Priority: Minor > > The analyzer rewrites a BETWEEN expression into a pair of inequalities. > IMPALA-8037 explains that the planner then groups all such non-equality > conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains > that the analyzer should handle inequalities better. > BETWEEN is a special case and informs the final result.
If we assume a > selectivity of s for inequality, then BETWEEN should be something like s/2. > The intuition is that if c >= x includes, say, ⅓ of values, and c <= y > includes a third of values, then c BETWEEN x AND y should be a narrower set > of values, say ⅙. > [Ramakrishnan and > Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] > recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the > general expression x <= c AND c <= y. Note the discrepancy between the > compound inequality case and the BETWEEN case, likely reflecting the > additional information we obtain when the user chooses to use BETWEEN. > To implement a special BETWEEN selectivity in Impala, we must remember the > selectivity of BETWEEN during the rewrite to a compound inequality.
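For a discrete, sufficiently unique column, the criteria listed in the commit above reduce to a range-over-NDV estimate. A hedged sketch of that idea (not Impala's exact formula or API; names and the fallback behavior here are illustrative):

```python
# Estimate the fraction of rows selected by "c BETWEEN lo AND hi" for a
# discrete column with available stats; fall back to the default
# selectivity when the criteria are not met.
def between_selectivity(lo, hi, num_distinct, default=0.1):
    # Criteria 3-5: stats available and well-formed bounds (lo <= hi).
    if num_distinct <= 0 or lo > hi:
        return default
    est = (hi - lo + 1) / num_distinct
    # Criterion 6: only use the estimate when it is at most the default.
    return est if est <= default else default
```

For the tpch.customer example above (c_custkey between 1234 and 2345, with roughly 150,000 distinct keys), the estimate is 1112/150000 ≈ 0.0074, well below the 0.1 default scan selectivity.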
[jira] [Commented] (IMPALA-13105) Multiple imported query profiles fail to import/clear at once
[ https://issues.apache.org/jira/browse/IMPALA-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849436#comment-17849436 ] ASF subversion and git services commented on IMPALA-13105: -- Commit 8a6f2824b8abf53ea022ca571da33619d564a14a in impala's branch refs/heads/master from Surya Hebbar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8a6f2824b ] IMPALA-13105: Fix multiple imported query profiles fail to import/clear at once On importing multiple query profiles, insertion of the last query in the queue fails as no delay is provided for the insertion. This has been fixed by providing a delay after inserting the final query. On clearing all the imported queries, in some instances page reloads before clearing the IndexedDB object store. This has been fixed by triggering the page reload after clearing the object store succeeds. Change-Id: I42470fecd0cff6e193f080102575e51d86a2d562 Reviewed-on: http://gerrit.cloudera.org:8080/21450 Reviewed-by: Wenzhe Zhou Reviewed-by: Riza Suminto Tested-by: Impala Public Jenkins > Multiple imported query profiles fail to import/clear at once > - > > Key: IMPALA-13105 > URL: https://issues.apache.org/jira/browse/IMPALA-13105 > Project: IMPALA > Issue Type: Bug >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > > When multiple query profiles are chosen at once, the last query profile in > the insertion queue fails as the page reloads without providing a delay for > inserting it. > > The same behavior is seen when clearing all the query profiles. > > This is mostly seen in Chromium based browsers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
[ https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849073#comment-17849073 ] ASF subversion and git services commented on IMPALA-13034: -- Commit b975165a0acfe37af302dd7c007360633df54917 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b975165a0 ] IMPALA-13034: Add logs and counters for HTTP profile requests blocking client fetches There are several endpoints in WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json. The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput() which acquires lock of the ClientRequestState. This could block client requests in fetching query results. To help identify this issue, this patch adds warning logs when such profile dumping requests run slow and the query is still in-flight. Also adds a profile counter, GetInFlightProfileTimeStats, for the summary stats of this time. Dumping the profiles after the query is archived (e.g. closed) won't be tracked. Logs for slow http responses are also added. The thresholds are defined by two new flags, slow_profile_dump_warning_threshold_ms, and slow_http_response_warning_threshold_ms. Note that dumping the profile in-flight won't always block the query, e.g. if there are no client fetch requests or if the coordinator fragment is idle waiting for executor fragment instances. So a long time shown in GetInFlightProfileTimeStats doesn't mean it's hitting the issue. To better identify this issue, this patch adds another profile counter, ClientFetchLockWaitTimer, as the cumulative time client fetch requests waiting for locks. Also fixes false positive logs for complaining invalid query handles. Such logs are added in GetQueryHandle() when the query is not found in the active query map, but it could still exist in the query log. 
This removes the logs in GetQueryHandle() and lets the callers decide whether to log the error. Tests: - Added e2e test - Ran CORE tests Change-Id: I538ebe914f70f460bc8412770a8f7a1cc8b505dc Reviewed-on: http://gerrit.cloudera.org:8080/21412 Reviewed-by: Impala Public Jenkins Tested-by: Michael Smith > Add logs for slow HTTP requests dumping the profile > --- > > Key: IMPALA-13034 > URL: https://issues.apache.org/jira/browse/IMPALA-13034 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.5.0 > > > There are several endpoints in the WebUI that can dump a query profile: > /query_profile, /query_profile_encoded, /query_profile_plain_text, > /query_profile_json > The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput() > which acquires the lock of the ClientRequestState. This could block client > requests in fetching query results. We should add warning logs when such HTTP > requests run slow (e.g. when the profile is too large to download in a short > time). IP address and other info of such requests should also be logged. > Related code: > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207
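The threshold-based warning the patch adds can be modeled as timing the handler and logging when it exceeds a configurable limit. This is a loose sketch only; the real implementation is C++ and the flag name below merely mirrors the slow_profile_dump_warning_threshold_ms flag mentioned in the commit:

```python
import logging
import time

# Illustrative threshold, standing in for the backend flag
# slow_profile_dump_warning_threshold_ms described above.
SLOW_PROFILE_DUMP_WARNING_THRESHOLD_MS = 1000

def timed_profile_dump(handler, query_id):
    # Time the profile-dump handler and warn if it ran suspiciously long,
    # since a slow dump may have been blocking client fetches on the
    # ClientRequestState lock.
    start = time.monotonic()
    result = handler(query_id)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > SLOW_PROFILE_DUMP_WARNING_THRESHOLD_MS:
        logging.warning("Slow profile dump for query %s: %.0f ms", query_id, elapsed_ms)
    return result, elapsed_ms
```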
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849072#comment-17849072 ] ASF subversion and git services commented on IMPALA-13102: -- Commit e35f8183cb1ba069ae00ee93e71451eccd505d0a in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e35f8183c ] IMPALA-13102: Normalize invalid column stats from HMS Column stats like numDVs, numNulls in HMS could have arbitrary values. Impala expects them to be non-negative or -1 for unknown. So loading tables with invalid stats values (<-1) will fail. This patch adds logic to normalize the stats values. If the value < -1, use -1 for it and add corresponding warning logs. Also refactors some redundant code in ColumnStats. Tests: - Add e2e test Change-Id: If6216e3d6e73a529a9b3a8c0ea9d22727ab43f1a Reviewed-on: http://gerrit.cloudera.org:8080/21445 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Loading tables with illegal stats failed > > > Key: IMPALA-13102 > URL: https://issues.apache.org/jira/browse/IMPALA-13102 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > When the table has illegal stats, e.g. numDVs=-100, Impala can't load the > table. So DROP STATS or DROP TABLE can't be performed on the table. > {code:sql} > [localhost:21050] default> drop stats alltypes_bak; > Query: drop stats alltypes_bak > ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak' > CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code} > We should allow at least dropping the stats or dropping the table. So the user > can use Impala to recover the stats.
> Stacktrace in the logs: > {noformat} > I0520 08:00:56.661746 17543 jni-util.cc:321] > 5343142d1173494f:44dcde8c] > org.apache.impala.common.AnalysisException: Failed to load metadata for > table: 'alltypes_bak' > at > org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974) > at > org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load > metadata for table: default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162) > at org.apache.impala.catalog.Table.fromThrift(Table.java:586) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) > at > org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585) > at > org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) > at .: > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: default.alltypes_bak > at 
org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at >
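The normalization rule described in the IMPALA-13102 commit above is simple enough to state in one line: stats values such as numDVs and numNulls must be non-negative or -1 (unknown), so anything below -1 is coerced to -1 instead of failing the table load. A minimal sketch (the function name is illustrative, not Impala's ColumnStats API):

```python
# Normalize a column-stats value from HMS: valid values are >= 0,
# -1 means "unknown", and anything below -1 is invalid and coerced
# to -1 (unknown) rather than rejected.
def normalize_stat(value):
    return value if value >= -1 else -1

assert normalize_stat(-100) == -1  # invalid numDVs=-100 becomes "unknown"
assert normalize_stat(-1) == -1    # already "unknown", unchanged
assert normalize_stat(42) == 42    # valid value preserved
```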
[jira] [Commented] (IMPALA-11735) Handle CREATE_TABLE event when the db is invisible to the impala server user
[ https://issues.apache.org/jira/browse/IMPALA-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848916#comment-17848916 ] ASF subversion and git services commented on IMPALA-11735: -- Commit 9672312015be959360795a8af0843fdf386b557c in impala's branch refs/heads/master from Sai Hemanth Gantasala [ https://gitbox.apache.org/repos/asf?p=impala.git;h=967231201 ] IMPALA-11735: Handle CREATE_TABLE event when the db is invisible to the impala server user It's possible that some dbs are invisible to the Impala cluster due to authorization restrictions. However, the CREATE_TABLE events in such dbs will lead the event-processor into ERROR state. The event processor should ignore such CREATE_TABLE events when the database is not found. note: This is an incorrect setup, where the 'impala' super user is denied access on the metadata object database but given access to fetch events from the notification log table of the metastore. Testing: - Manually verified this on a local cluster. - Added automated unit test to verify the same. Change-Id: I90275bb8c065fc5af61186901ac7e9839a68c43b Reviewed-on: http://gerrit.cloudera.org:8080/21188 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Handle CREATE_TABLE event when the db is invisible to the impala server user > > > Key: IMPALA-11735 > URL: https://issues.apache.org/jira/browse/IMPALA-11735 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Sai Hemanth Gantasala >Priority: Critical > > It's possible that some dbs are invisible to Impala cluster due to > authorization restrictions.
However, the CREATE_TABLE events in such dbs will > lead the event-processor into ERROR state: > {noformat} > E1026 03:02:30.650302 116774 MetastoreEventsProcessor.java:684] Unexpected > exception received while processing event > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationException: EventId: > 184240416 EventType: CREATE_TABLE Unable to process event > at > org.apache.impala.catalog.events.MetastoreEvents$CreateTableEvent.process(MetastoreEvents.java:735) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:772) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:670) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > E1026 03:02:30.650447 116774 MetastoreEventsProcessor.java:795] Notification > event is null > {noformat} > It should be handled (e.g. ignored) and reported to the admin (e.g. in logs). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13083) Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message
[ https://issues.apache.org/jira/browse/IMPALA-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848917#comment-17848917 ] ASF subversion and git services commented on IMPALA-13083: -- Commit 98739a84557a209e05694abd79f62f7f7daf8777 in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=98739a845 ] IMPALA-13083: Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION This patch improves the REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message by naming the specific configuration that must be adjusted for the query to pass Admission Control. New fields 'per_backend_mem_to_admit_source' and 'coord_backend_mem_to_admit_source' of type MemLimitSourcePB are added to QuerySchedulePB. These fields explain which limiting factor drives the final values of 'per_backend_mem_to_admit' and 'coord_backend_mem_to_admit' respectively. In turn, Admission Control uses this information to compose a more informative error message that the user can act upon. The new error message pattern also explicitly mentions "Per Host Min Memory Reservation" as the place to look when investigating the memory reservations scheduled for each backend node. Updated the documentation with examples of query rejection by Admission Control and how to read the error message. 
Testing: - Add BE tests at admission-controller-test.cc - Adjust and pass affected EE tests Change-Id: I1ef7fb7e7a194b2036c2948639a06c392590bf66 Reviewed-on: http://gerrit.cloudera.org:8080/21436 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Clarify REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message > -- > > Key: IMPALA-13083 > URL: https://issues.apache.org/jira/browse/IMPALA-13083 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > > The REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION error message is too vague for a > user/administrator to make the adjustments needed to run a query that is rejected > by the admission controller. > {code:java} > const string REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION = > "minimum memory reservation is greater than memory available to the query > for buffer " > "reservations. Memory reservation needed given the current plan: $0. > Adjust either " > "the mem_limit or the pool config (max-query-mem-limit, > min-query-mem-limit) for the " > "query to allow the query memory limit to be at least $1. Note that > changing the " > "mem_limit may also change the plan. See the query profile for more > information " > "about the per-node memory requirements."; > {code} > There are many configs and options that directly and indirectly clamp > schedule.per_backend_mem_limit() and schedule.per_backend_mem_to_admit(). > [https://github.com/apache/impala/blob/3b35ddc8ca7b0e540fc16c413a170a25e164462b/be/src/scheduling/schedule-state.cc#L262-L361] > Ideally, this error message should clearly mention which query option / llama > config / backend flag influences the per_backend_mem_limit decision so that the > user can directly adjust that config. It should also clearly > mention the 'Per Host Min Memory Reservation' info string in the query profile > instead of just 'per-node memory requirements'. 
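The idea behind the new '*_mem_to_admit_source' fields can be sketched in miniature: record which knob produced the admitted memory value, then name that knob in the rejection message. The enum values and message text below are illustrative, not the actual MemLimitSourcePB definition.

```java
// Illustrative sketch of composing an actionable admission-control error
// message from a recorded memory-limit source. Names are hypothetical.
class MemLimitMessageSketch {
    enum MemLimitSource { MEM_LIMIT_QUERY_OPTION, POOL_MAX_QUERY_MEM_LIMIT, POOL_MIN_QUERY_MEM_LIMIT }

    static String rejectionMessage(long neededBytes, long admittedBytes, MemLimitSource src) {
        final String knob;
        switch (src) {
            case MEM_LIMIT_QUERY_OPTION:   knob = "the MEM_LIMIT query option"; break;
            case POOL_MAX_QUERY_MEM_LIMIT: knob = "the pool's max-query-mem-limit"; break;
            default:                       knob = "the pool's min-query-mem-limit"; break;
        }
        // Point the user at the exact knob and at the profile info string.
        return "minimum memory reservation (" + neededBytes + " bytes) is greater than the "
            + admittedBytes + " bytes admitted; adjust " + knob
            + ". See 'Per Host Min Memory Reservation' in the query profile.";
    }

    public static void main(String[] args) {
        System.out.println(rejectionMessage(100_000_000L, 50_000_000L,
            MemLimitSource.POOL_MAX_QUERY_MEM_LIMIT));
    }
}
```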
[jira] [Commented] (IMPALA-13040) SIGSEGV in QueryState::UpdateFilterFromRemote
[ https://issues.apache.org/jira/browse/IMPALA-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848461#comment-17848461 ] ASF subversion and git services commented on IMPALA-13040: -- Commit aa01079478773aed28c9a4d8b07c062202de698d in impala's branch refs/heads/master from Riza Suminto [ https://gitbox.apache.org/repos/asf?p=impala.git;h=aa0107947 ] IMPALA-13040: (addendum) Inject larger delay for sanitized build TestLateQueryStateInit has been flaky in sanitized builds because the largest delay injection time is fixed at 3 seconds. This patch fixes the issue by setting the largest delay injection time equal to RUNTIME_FILTER_WAIT_TIME_MS, which is 3 seconds for regular builds and 10 seconds for sanitized builds. Testing: - Loop and pass test_runtime_filter_aggregation.py 10 times in an ASAN build and 50 times in a UBSAN build. Change-Id: I09e5ae4646f53632e9a9f519d370a33a5534df19 Reviewed-on: http://gerrit.cloudera.org:8080/21439 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > SIGSEGV in QueryState::UpdateFilterFromRemote > -- > > Key: IMPALA-13040 > URL: https://issues.apache.org/jira/browse/IMPALA-13040 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Riza Suminto >Priority: Critical > Fix For: Impala 4.5.0 > > > {code} > Crash reason: SIGSEGV /SEGV_MAPERR > Crash address: 0x48 > Process uptime: not available > Thread 114 (crashed) > 0 libpthread.so.0 + 0x9d00 > rax = 0x00019e57ad00 rdx = 0x2a656720 > rcx = 0x059a9860 rbx = 0x > rsi = 0x00019e57ad00 rdi = 0x0038 > rbp = 0x7f6233d544e0 rsp = 0x7f6233d544a8 > r8 = 0x06a53540r9 = 0x0039 > r10 = 0x r11 = 0x000a > r12 = 0x00019e57ad00 r13 = 0x7f62a2f997d0 > r14 = 0x7f6233d544f8 r15 = 0x1632c0f0 > rip = 0x7f62a2f96d00 > Found by: given as instruction pointer in context > 1 > impalad!impala::QueryState::UpdateFilterFromRemote(impala::UpdateFilterParamsPB > const&, kudu::rpc::RpcContext*) [query-state.cc : 1033 + 0x5] > rbp = 
0x7f6233d54520 rsp = 0x7f6233d544f0 > rip = 0x015c0837 > Found by: previous frame's frame pointer > 2 > impalad!impala::DataStreamService::UpdateFilterFromRemote(impala::UpdateFilterParamsPB > const*, impala::UpdateFilterResultPB*, kudu::rpc::RpcContext*) > [data-stream-service.cc : 134 + 0xb] > rbp = 0x7f6233d54640 rsp = 0x7f6233d54530 > rip = 0x017c05de > Found by: previous frame's frame pointer > {code} > The line that crashes is > https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/runtime/query-state.cc#L1033 > My guess is that the actual segfault is within WaitForPrepare() but it > was inlined. Not sure if a remote filter can arrive even before > QueryState::Init is finished - that would explain the issue, as > instances_prepared_barrier_ is not yet created at that point.
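The suspected race (a remote filter update touching instances_prepared_barrier_ before QueryState::Init has created it) can be sketched in miniature. The real code is C++; this Java sketch uses hypothetical names and only illustrates the guard-before-dereference idea.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: an update arriving before initialization must be
// dropped (or retried) instead of dereferencing a not-yet-created barrier.
class FilterRaceSketch {
    // Published by init(); stays null until initialization completes.
    static final AtomicReference<CountDownLatch> prepareBarrier = new AtomicReference<>();

    static void init() {                     // analogous to QueryState::Init
        prepareBarrier.set(new CountDownLatch(0));
    }

    /** Returns false (caller drops the update) rather than crashing. */
    static boolean updateFilterFromRemote() {
        CountDownLatch barrier = prepareBarrier.get();
        if (barrier == null) return false;   // init not finished yet
        try {
            barrier.await();                 // analogous to WaitForPrepare()
        } catch (InterruptedException e) {
            return false;
        }
        return true;                         // safe to apply the filter now
    }
}
```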
[jira] [Commented] (IMPALA-12800) Queries with many nested inline views see performance issues with ExprSubstitutionMap
[ https://issues.apache.org/jira/browse/IMPALA-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848462#comment-17848462 ] ASF subversion and git services commented on IMPALA-12800: -- Commit ae6846b1cd039b2cd6f8753ce3ff810c5b2d3ce3 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=ae6846b1c ] IMPALA-12800: Skip O(n^2) ExprSubstitutionMap::verify() for release builds ExprSubstitutionMap::compose() and combine() call verify() to check the new ExprSubstitutionMap for duplicates. This algorithm is O(n^2) and can add significant overhead to SQLs with a large number of expressions or inline views. This changes verify() to skip the check for release builds (keeping it for debug builds). In a query with 20+ layers of inline views and thousands of expressions, turning off the verify() call cuts the execution time from 51 minutes to 18 minutes. This doesn't fully solve slowness in ExprSubstitutionMap. Further improvement would require Expr to support hash-based algorithms, which is a much larger change. Testing: - Manual performance comparison with/without the verify() call Change-Id: Ieeacfec6a5b487076ce5b19747319630616411f0 Reviewed-on: http://gerrit.cloudera.org:8080/21444 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins > Queries with many nested inline views see performance issues with > ExprSubstitutionMap > - > > Key: IMPALA-12800 > URL: https://issues.apache.org/jira/browse/IMPALA-12800 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Joe McDonnell >Priority: Critical > Attachments: impala12800repro.sql, impala12800schema.sql, > long_query_jstacks.tar.gz > > > A user running a query with many layers of inline views saw a large amount of > time spent in analysis. 
> > {noformat} > - Authorization finished (ranger): 7s518ms (13.134ms) > - Value transfer graph computed: 7s760ms (241.953ms) > - Single node plan created: 2m47s (2m39s) > - Distributed plan created: 2m47s (7.430ms) > - Lineage info computed: 2m47s (39.017ms) > - Planning finished: 2m47s (672.518ms){noformat} > In reproducing it locally, we found that most of the stacks end up in > ExprSubstitutionMap. > > Here are the main stacks seen while running jstack every 3 seconds during a > 75 second execution: > Location 1: (ExprSubstitutionMap::compose -> contains -> indexOf -> Expr > equals) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at java.util.ArrayList.indexOf(ArrayList.java:323) > at java.util.ArrayList.contains(ArrayList.java:306) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:120){noformat} > Location 2: (ExprSubstitutionMap::compose -> verify -> Expr equals) (9 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.compose(ExprSubstitutionMap.java:126){noformat} > Location 3: (ExprSubstitutionMap::combine -> verify -> Expr equals) (5 > samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at org.apache.impala.analysis.Expr.equals(Expr.java:1008) > at > org.apache.impala.analysis.ExprSubstitutionMap.verify(ExprSubstitutionMap.java:173) > at > org.apache.impala.analysis.ExprSubstitutionMap.combine(ExprSubstitutionMap.java:143){noformat} > Location 4: (TupleIsNullPredicate.wrapExprs -> Analyzer.isTrueWithNullSlots > -> FeSupport.EvalPredicate -> Thrift serialization) (4 samples) > {noformat} > java.lang.Thread.State: RUNNABLE > at java.lang.StringCoding.encode(StringCoding.java:364) > at java.lang.String.getBytes(String.java:941) > at > 
org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:532) > at > org.apache.impala.thrift.TClientRequest$TClientRequestStandardScheme.write(TClientRequest.java:467) > at org.apache.impala.thrift.TClientRequest.write(TClientRequest.java:394) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:3034) > at > org.apache.impala.thrift.TQueryCtx$TQueryCtxStandardScheme.write(TQueryCtx.java:2709) > at org.apache.impala.thrift.TQueryCtx.write(TQueryCtx.java:2400) > at org.apache.thrift.TSerializer.serialize(TSerializer.java:84) > at >
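The shape of the IMPALA-12800 fix (keep the O(n^2) duplicate check only in debug builds) can be sketched as below, using strings in place of Impala's Expr objects; the class and flag names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: List.contains() inside the loop makes this check
// O(n^2) overall, so release builds skip it entirely, mirroring what the
// commit does for ExprSubstitutionMap::verify().
class SubstitutionMapVerifySketch {
    static final boolean DEBUG_BUILD = false;  // release builds: skip the check

    static void verifyNoDuplicates(List<String> lhsExprs) {
        if (!DEBUG_BUILD) return;              // free in release builds
        List<String> seen = new ArrayList<>();
        for (String e : lhsExprs) {            // each contains() scans 'seen'
            if (seen.contains(e)) {
                throw new IllegalStateException("duplicate lhs expr: " + e);
            }
            seen.add(e);
        }
    }
}
```

A hash-based O(n) check (a HashSet) would need Expr.hashCode() to be consistent with its custom equals(), which is the larger change the commit message alludes to.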
[jira] [Commented] (IMPALA-13079) Add support for FLOAT/DOUBLE in Iceberg metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848164#comment-17848164 ] ASF subversion and git services commented on IMPALA-13079: -- Commit e5fdcb4f4b7e2f37e5f7bb357eede8092de8f429 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5fdcb4f4 ] IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant IMPALA-13079 added a test in iceberg-metadata-tables.test that included assertions about values that can change across builds, e.g. file sizes, which caused test failures. This commit fixes it by doing two things: 1. narrowing down the result set of the query to the column that the test is really about - this removes some of the problematic values 2. using regexes for the remaining problematic values. Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Reviewed-on: http://gerrit.cloudera.org:8080/21440 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto > Add support for FLOAT/DOUBLE in Iceberg metadata tables > --- > > Key: IMPALA-13079 > URL: https://issues.apache.org/jira/browse/IMPALA-13079 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg >
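The second technique in the fix above (regexes for values that vary across builds) can be illustrated like this; the row format is a simplified stand-in for the real test file's output, not Impala's test-verifier syntax.

```java
import java.util.regex.Pattern;

// Illustrative sketch: assert on the stable parts of a result row (suffix,
// format) with a regex, instead of on build-dependent literals such as the
// generated Parquet file name.
class RegexRowMatchSketch {
    static final Pattern ROW = Pattern.compile(".*\\.parquet,'PARQUET',\\d+");

    static boolean rowMatches(String actualRow) {
        return ROW.matcher(actualRow).matches();
    }
}
```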
[jira] [Commented] (IMPALA-13091) query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant
[ https://issues.apache.org/jira/browse/IMPALA-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848163#comment-17848163 ] ASF subversion and git services commented on IMPALA-13091: -- Commit e5fdcb4f4b7e2f37e5f7bb357eede8092de8f429 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e5fdcb4f4 ] IMPALA-13091: query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an expected constant IMPALA-13079 added a test in iceberg-metadata-tables.test that included assertions about values that can change across builds, e.g. file sizes, which caused test failures. This commit fixes it by doing two things: 1. narrowing down the result set of the query to the column that the test is really about - this removes some of the problematic values 2. using regexes for the remaining problematic values. Change-Id: Ic056079eed87a68afa95cd111ce2037314cd9620 Reviewed-on: http://gerrit.cloudera.org:8080/21440 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto > query_test.test_iceberg.TestIcebergV2Table.test_metadata_tables fails on an > expected constant > - > > Key: IMPALA-13091 > URL: https://issues.apache.org/jira/browse/IMPALA-13091 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.5.0 >Reporter: Laszlo Gaal >Assignee: Daniel Becker >Priority: Critical > Labels: impala-iceberg > > This fails in various sanitizer builds (ASAN, UBSAN): > Failure report:{code} > query_test/test_iceberg.py:1527: in test_metadata_tables > '$OVERWRITE_SNAPSHOT_TS': str(overwrite_snapshot_ts.data[0])}) > common/impala_test_suite.py:820: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:627: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:520: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:313: in 
verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E > 0,regex:'.*\.parquet','PARQUET',0,3,3648,'{1:32,2:63,3:71,4:43,5:55,6:47,7:39,8:58,9:47,13:63,14:96,15:75,16:78}','{1:3,2:3,3:3,4:3,5:3,6:3,7:3,8:3,9:3,13:3,14:6,15:6,16:6}','{1:1,2:0,3:0,4:0,5:0,6:1,7:1,8:1,9:1,13:0,14:0,15:0,16:0}','{16:0,4:1,5:1,14:0}','{1:"AA==",2:"AQ==",3:"9v8=",4:"/+ZbLw==",5:"MAWO5C7/O6s=",6:"AFgLImsYBgA=",7:"kU0AAA==",8:"QSBzdHJpbmc=",9:"YmluMQ==",13:"av///w==",14:"fcOUJa1JwtQ=",16:"Pw=="}','{1:"AQ==",2:"BQ==",3:"lgA=",4:"qV/jWA==",5:"fcOUJa1JwlQ=",6:"AMhZw6A3BgA=",7:"Hk8AAA==",8:"U29tZSBzdHJpbmc=",9:"YmluMg==",13:"Cg==",14:"NEA=",16:"AAB6RA=="}','NULL','[4]','NULL',0,'{"arr.element":{"column_size":96,"value_count":6,"null_value_count":0,"nan_value_count":0,"lower_bound":-2e+100,"upper_bound":20},"b":{"column_size":32,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":false,"upper_bound":true},"bn":{"column_size":47,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"YmluMQ==","upper_bound":"YmluMg=="},"d":{"column_size":55,"value_count":3,"null_value_count":0,"nan_value_count":1,"lower_bound":-2e-100,"upper_bound":2e+100},"dt":{"column_size":39,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"2024-05-14","upper_bound":"2025-06-15"},"f":{"column_size":43,"value_count":3,"null_value_count":0,"nan_value_count":1,"lower_bound":2.00026702864e-10,"upper_bound":199973982208},"i":{"column_size":63,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":1,"upper_bound":5},"l":{"column_size":71,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":-10,"upper_bound":150},"mp.key":{"column_size":75,"value_count":6,"null_value_count":0,"nan_value_count":null,"lower_bound":null,"upper_bound":null},"mp.value":{"column_size":78,"value_count":6,"null_value_count":0,"nan_value_count":0,"lower_bound":0.5,"upper_bou
nd":1000},"s":{"column_size":58,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"A > string","upper_bound":"Some > string"},"strct.i":{"column_size":63,"value_count":3,"null_value_count":0,"nan_value_count":null,"lower_bound":-150,"upper_bound":10},"ts":{"column_size":47,"value_count":3,"null_value_count":1,"nan_value_count":null,"lower_bound":"2024-05-14 > 14:51:12","upper_bound":"2025-06-15 18:51:12"}}' != >
[jira] [Commented] (IMPALA-12362) Improve Linux packaging support.
[ https://issues.apache.org/jira/browse/IMPALA-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848162#comment-17848162 ] ASF subversion and git services commented on IMPALA-12362: -- Commit a5e5aa16d887faedee4eea1bc809fba41d758f5b in impala's branch refs/heads/branch-3.4.2 from Xiang Yang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a5e5aa16d ] IMPALA-12362: (part-4/4) Refactor linux packaging related cmake files. Moved the Linux packaging related content into package/CMakeLists.txt to make it clearer. This patch also adds the LICENSE and NOTICE files to the final package. Testing: - Manually deployed the package on Ubuntu 22.04 and verified it. Backport note for 3.4.x: - Resolved conflicts in CMakeLists.txt and modified package/CMakeLists.txt accordingly. Change-Id: If3914dcda69f81a735cdf70d76c59fa09454777b Reviewed-on: http://gerrit.cloudera.org:8080/20263 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/21410 Reviewed-by: Xiang Yang Reviewed-by: Zihao Ye Tested-by: Quanlong Huang > Improve Linux packaging support. > > > Key: IMPALA-12362 > URL: https://issues.apache.org/jira/browse/IMPALA-12362 > Project: IMPALA > Issue Type: Improvement >Reporter: XiangYang >Assignee: XiangYang >Priority: Major > Fix For: Impala 4.4.0 > > > including: > (part-1/4) Refactor service management scripts. > (part-2/4) Optimize default configurations for packaging module. > (part-3/4) Add admissiond service and impala-profile-tool to packaging module. > (part-4/4) Refactor linux packaging related cmake files, add LICENSE and > NOTICE files.
[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size
[ https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847651#comment-17847651 ] ASF subversion and git services commented on IMPALA-13020: -- Commit c8415513158842e2ddb1d64891298d76fb0b367f in impala's branch refs/heads/branch-4.4.0 from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c84155131 ] IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t Thrift 0.16.0 introduced a max message size to protect receivers against a malicious message allocating large amounts of memory. That limit is a 32-bit signed integer, so the max value is 2GB. Impala introduced the thrift_rpc_max_message_size startup option to set that for Impala's thrift servers. There are times when Impala wants to send a message that is larger than 2GB. In particular, the catalog-update topic for the statestore can exceed 2GBs when there is a lot of metadata loaded using the old v1 catalog. When there is a 2GB max message size, the statestore can create and send a >2GB message, but the impalads will reject it. This can lead to impalads having stale metadata. This switches to a patched Thrift that uses an int64_t for the max message size for C++ code. It does not modify the limit. The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp, so this fixes that location to always print MaxMessageSize exceptions. This is only patching the Thrift C++ library. It does not patch the Thrift Java library. There are a few reasons for that: - This specific issue involves C++ to C++ communication and will be solved by patching the C++ library. - C++ is easy to patch as it is built via the native-toolchain. There is no corresponding mechanism for patching our Java dependencies (though one could be developed). - Java modifications have implications for other dependencies like Hive which use Thrift to communicate with HMS. 
For the Java code that uses max message size, this converts the 64-bit value to a 32-bit value by capping the value at Integer.MAX_VALUE. Testing: - Added enough tables to produce a >2GB catalog-topic and restarted an impalad with a higher limit specified. Without the patch, the catalog-topic update would be rejected by the impalad. With the patch, it succeeds. Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87 Reviewed-on: http://gerrit.cloudera.org:8080/21367 Reviewed-by: Michael Smith Reviewed-by: Joe McDonnell Tested-by: Joe McDonnell (cherry picked from commit 13df8239d82a61afc3196295a7878ca2ffe91873) > catalog-topic updates >2GB do not work due to Thrift's max message size > --- > > Key: IMPALA-13020 > URL: https://issues.apache.org/jira/browse/IMPALA-13020 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.2.0, Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Fix For: Impala 4.5.0 > > > Thrift 0.16.0 added a max message size to protect against malicious packets > that can consume a large amount of memory on the receiver side. This max > message size is a signed 32-bit integer, so it maxes out at 2GB (which we set > via thrift_rpc_max_message_size). > In catalog v1, the catalog-update statestore topic can become larger than 2GB > when there are a large number of tables / partitions / files. If this happens > and an Impala coordinator needs to start up (or needs a full topic update for > any other reason), it is expecting the statestore to send it the full topic > update, but the coordinator actually can't process the message. The > deserialization of the message hits the 2GB max message size limit and fails. > On the statestore side, it shows this message: > {noformat} > I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial > catalog-update topic update for > impa...@mcdonnellthrift.vpc.cloudera.com:27000. 
Size = 2.27 GB > I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating > client for mcdonnellthrift.vpc.cloudera.com:23000 > I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection > to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken > pipe) > I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for > mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() > send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, >
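The Java-side conversion the commit describes (a 64-bit configured value capped into a 32-bit int) can be sketched as below; the method name is illustrative, not Impala's actual API.

```java
// Illustrative sketch: the patched C++ side takes an int64_t limit, while
// the Thrift Java library still uses a 32-bit int, so a 64-bit configured
// value is capped at Integer.MAX_VALUE before being handed to Java.
class MaxMessageSizeSketch {
    static int toJavaMaxMessageSize(long configuredBytes) {
        return (int) Math.min(configuredBytes, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // A 64 GB internal limit collapses to ~2 GB on the Java side.
        System.out.println(toJavaMaxMessageSize(64L << 30));
    }
}
```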
[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size
[ https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847652#comment-17847652 ] ASF subversion and git services commented on IMPALA-13020: -- Commit c9745fd5b941f52b3cd3496c425722fcbbffe894 in impala's branch refs/heads/branch-4.4.0 from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c9745fd5b ] IMPALA-13020 (part 2): Split out external vs internal Thrift max message size The Thrift max message size is designed to protect against malicious messages that consume a lot of memory on the receiver. This is an important security measure for externally facing services, but it can interfere with internal communication within the cluster. Currently, the max message size is controlled by a single startup flag for both. This creates tension between having a low value to protect against malicious messages and having a high value to avoid issues with internal communication (e.g. large statestore updates). This introduces a new flag thrift_external_rpc_max_message_size to specify the limit for externally-facing services. The current thrift_rpc_max_message_size now applies only to internal services. Splitting them apart allows setting a much higher value for internal services (64GB) while leaving the externally facing services at the current 2GB limit. This modifies various code locations that wrap a Thrift transport to pass in the original transport's TConfiguration. This also adds DCHECKs to make sure that the new transport inherits the max message size. This limits the locations where we actually need to set the max message size. ThriftServer/ThriftServerBuilder have a setting "is_external_facing" which can be specified on each ThriftServer. This modifies the statestore and catalog to set is_external_facing to false. All other servers stay with the default of true. Testing: - This adds a test case to verify that is_external_facing uses the higher limit. 
- Ran through the steps in testdata/scale_test_metadata/README.md and updated the value in that doc. - Created many tables to push the catalog-update topic to be >2GB and verified that statestore successfully sends it when an impalad restarts. Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311 Reviewed-on: http://gerrit.cloudera.org:8080/21420 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto Reviewed-by: Michael Smith (cherry picked from commit bcff4df6194b2f192d937bb9c031721feccb69df) > catalog-topic updates >2GB do not work due to Thrift's max message size > --- > > Key: IMPALA-13020 > URL: https://issues.apache.org/jira/browse/IMPALA-13020 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.2.0, Impala 4.3.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Critical > Fix For: Impala 4.5.0 > > > Thrift 0.16.0 added a max message size to protect against malicious packets > that can consume a large amount of memory on the receiver side. This max > message size is a signed 32-bit integer, so it maxes out at 2GB (which we set > via thrift_rpc_max_message_size). > In catalog v1, the catalog-update statestore topic can become larger than 2GB > when there are a large number of tables / partitions / files. If this happens > and an Impala coordinator needs to start up (or needs a full topic update for > any other reason), it is expecting the statestore to send it the full topic > update, but the coordinator actually can't process the message. The > deserialization of the message hits the 2GB max message size limit and fails. > On the statestore side, it shows this message: > {noformat} > I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial > catalog-update topic update for > impa...@mcdonnellthrift.vpc.cloudera.com:27000. 
Size = 2.27 GB > I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating > client for mcdonnellthrift.vpc.cloudera.com:23000 > I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection > to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken > pipe) > I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for > mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() > send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, > rpc: N6impala20TUpdateStateResponseE, send: not done > I0418 16:54:56.052937
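The external/internal split can be sketched as below; the 2 GB and 64 GB values come from the commit message, while the class and method names are illustrative, not Impala's ThriftServerBuilder API.

```java
// Illustrative sketch: externally facing servers keep the conservative cap
// against malicious messages, while internal services (statestore, catalog)
// get a much higher limit for large topic updates.
class ThriftLimitSketch {
    static final long EXTERNAL_MAX_BYTES = 2L << 30;   // 2 GB, as before
    static final long INTERNAL_MAX_BYTES = 64L << 30;  // 64 GB per the commit

    static long maxMessageSize(boolean isExternalFacing) {
        return isExternalFacing ? EXTERNAL_MAX_BYTES : INTERNAL_MAX_BYTES;
    }
}
```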
[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size
[ https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847419#comment-17847419 ] ASF subversion and git services commented on IMPALA-13020: -- Commit 13df8239d82a61afc3196295a7878ca2ffe91873 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=13df8239d ] IMPALA-13020 (part 1): Change thrift_rpc_max_message_size to int64_t Thrift 0.16.0 introduced a max message size to protect receivers against a malicious message allocating large amounts of memory. That limit is a 32-bit signed integer, so the max value is 2GB. Impala introduced the thrift_rpc_max_message_size startup option to set that for Impala's thrift servers. There are times when Impala wants to send a message that is larger than 2GB. In particular, the catalog-update topic for the statestore can exceed 2GBs when there is a lot of metadata loaded using the old v1 catalog. When there is a 2GB max message size, the statestore can create and send a >2GB message, but the impalads will reject it. This can lead to impalads having stale metadata. This switches to a patched Thrift that uses an int64_t for the max message size for C++ code. It does not modify the limit. The MaxMessageSize error was being swallowed in TAcceptQueueServer.cpp, so this fixes that location to always print MaxMessageSize exceptions. This is only patching the Thrift C++ library. It does not patch the Thrift Java library. There are a few reasons for that: - This specific issue involves C++ to C++ communication and will be solved by patching the C++ library. - C++ is easy to patch as it is built via the native-toolchain. There is no corresponding mechanism for patching our Java dependencies (though one could be developed). - Java modifications have implications for other dependencies like Hive which use Thrift to communicate with HMS. 
For the Java code that uses max message size, this converts the 64-bit value to a 32-bit value by capping the value at Integer.MAX_VALUE. Testing: - Added enough tables to produce a >2GB catalog-topic and restarted an impalad with a higher limit specified. Without the patch, the catalog-topic update would be rejected by the impalad. With the patch, it succeeds. Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87 Reviewed-on: http://gerrit.cloudera.org:8080/21367 Reviewed-by: Michael Smith Reviewed-by: Joe McDonnell Tested-by: Joe McDonnell > catalog-topic updates >2GB do not work due to Thrift's max message size > --- > > Key: IMPALA-13020 > URL: https://issues.apache.org/jira/browse/IMPALA-13020 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.2.0, Impala 4.3.0 >Reporter: Joe McDonnell >Priority: Critical > > Thrift 0.16.0 added a max message size to protect against malicious packets > that can consume a large amount of memory on the receiver side. This max > message size is a signed 32-bit integer, so it maxes out at 2GB (which we set > via thrift_rpc_max_message_size). > In catalog v1, the catalog-update statestore topic can become larger than 2GB > when there are a large number of tables / partitions / files. If this happens > and an Impala coordinator needs to start up (or needs a full topic update for > any other reason), it is expecting the statestore to send it the full topic > update, but the coordinator actually can't process the message. The > deserialization of the message hits the 2GB max message size limit and fails. > On the statestore side, it shows this message: > {noformat} > I0418 16:54:51.727290 3844140 statestore.cc:507] Preparing initial > catalog-update topic update for > impa...@mcdonnellthrift.vpc.cloudera.com:27000. 
Size = 2.27 GB > I0418 16:54:53.889446 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889488 3844140 client-cache.cc:82] ReopenClient(): re-creating > client for mcdonnellthrift.vpc.cloudera.com:23000 > I0418 16:54:53.889493 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:53.889503 3844140 thrift-client.cc:116] Error closing connection > to: mcdonnellthrift.vpc.cloudera.com:23000, ignoring (write() send(): Broken > pipe) > I0418 16:54:56.052882 3844140 thrift-util.cc:198] TSocket::write_partial() > send() : Broken pipe > I0418 16:54:56.052932 3844140 client-cache.h:363] RPC Error: Client for > mcdonnellthrift.vpc.cloudera.com:23000 hit an unexpected exception: write() > send(): Broken pipe, type: N6apache6thrift9transport19TTransportExceptionE, > rpc: N6impala20TUpdateStateResponseE, send: not done > I0418 16:54:56.052937 3844140 client-cache.cc:174] Broken Connection, destroy > client for
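The part-1 fix above keeps the limit as an int64_t in the patched C++ library while the Java side caps the configured 64-bit value at Integer.MAX_VALUE. A minimal Python sketch of that capping (hypothetical helper, not Impala code):

```python
INT32_MAX = 2**31 - 1  # Thrift's original max message size is a signed 32-bit int (~2 GB)

def cap_to_int32(configured_limit: int) -> int:
    """Clamp a 64-bit configured limit to what a 32-bit Thrift setting can hold."""
    return min(configured_limit, INT32_MAX)

# A 64 GB limit fits in the patched int64_t C++ setting,
# but gets capped for the 32-bit Java setting:
print(cap_to_int32(64 * 2**30))  # 2147483647
```

This is why the issue could only be fully solved for C++-to-C++ traffic: any limit above 2 GB is silently unreachable wherever the setting remains 32-bit.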
[jira] [Commented] (IMPALA-13020) catalog-topic updates >2GB do not work due to Thrift's max message size
[ https://issues.apache.org/jira/browse/IMPALA-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847420#comment-17847420 ] ASF subversion and git services commented on IMPALA-13020: -- Commit bcff4df6194b2f192d937bb9c031721feccb69df in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bcff4df61 ] IMPALA-13020 (part 2): Split out external vs internal Thrift max message size The Thrift max message size is designed to protect against malicious messages that consume a lot of memory on the receiver. This is an important security measure for externally facing services, but it can interfere with internal communication within the cluster. Currently, the max message size is controlled by a single startup flag for both. This creates tension between having a low value to protect against malicious messages versus having a high value to avoid issues with internal communication (e.g. large statestore updates). This introduces a new flag thrift_external_rpc_max_message_size to specify the limit for externally-facing services. The current thrift_rpc_max_message_size now applies only to internal services. Splitting them apart allows setting a much higher value for internal services (64GB) while leaving the externally facing services using the current 2GB limit. This modifies various code locations that wrap a Thrift transport to pass in the original transport's TConfiguration. This also adds DCHECKs to make sure that the new transport inherits the max message size. This limits the locations where we actually need to set max message size. ThriftServer/ThriftServerBuilder have a setting "is_external_facing" which can be specified on each ThriftServer. This modifies statestore and catalog to set is_external_facing to false. All other servers stay with the default of true. Testing: - This adds a test case to verify that is_external_facing uses the higher limit. 
- Ran through the steps in testdata/scale_test_metadata/README.md and updated the value in that doc. - Created many tables to push the catalog-update topic to be >2GB and verified that statestore successfully sends it when an impalad restarts. Change-Id: Ib9a649ef49a8a99c7bd9a1b73c37c4c621661311 Reviewed-on: http://gerrit.cloudera.org:8080/21420 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto Reviewed-by: Michael Smith
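The is_external_facing split described in part 2 can be modeled with a toy builder (the names mirror the commit message, but this is an illustrative Python sketch, not Impala's actual C++ API):

```python
INTERNAL_MAX_MESSAGE_SIZE = 64 * 2**30     # much higher limit for intra-cluster RPCs
EXTERNAL_MAX_MESSAGE_SIZE = 2 * 2**30 - 1  # ~2 GB kept for externally facing services

class ThriftServerBuilder:
    """Toy model: servers default to external-facing; statestore/catalog opt out."""

    def __init__(self, name: str):
        self.name = name
        self.is_external_facing = True  # default of true, per the commit message

    def external_facing(self, value: bool) -> "ThriftServerBuilder":
        self.is_external_facing = value
        return self

    def max_message_size(self) -> int:
        return (EXTERNAL_MAX_MESSAGE_SIZE if self.is_external_facing
                else INTERNAL_MAX_MESSAGE_SIZE)

# Statestore and catalog are the internal-only services in the patch:
statestore = ThriftServerBuilder("statestore").external_facing(False)
assert statestore.max_message_size() == INTERNAL_MAX_MESSAGE_SIZE
```

The design choice is that only the builder decides which limit applies, so individual transport wrappers never need to re-derive it; they just inherit the TConfiguration.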
[jira] [Commented] (IMPALA-13055) Some Iceberg metadata table tests doesn't assert
[ https://issues.apache.org/jira/browse/IMPALA-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847380#comment-17847380 ] ASF subversion and git services commented on IMPALA-13055: -- Commit 3a8eb999cbc746c055708425e071c30e3c00422e in impala's branch refs/heads/master from Gabor Kaszab [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a8eb999c ] IMPALA-13055: Some Iceberg metadata table tests don't assert Some tests in the Iceberg metadata table suite use the following regex to verify numbers in the output: [1-9]\d*|0 However, if this format is given, the test unconditionally passes. This patch changes this format to \d+ and fixes the test results that incorrectly passed before due to the test not asserting. Opened IMPALA-13067 to investigate why the test framework works like this for |0 in the regexes. Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad Reviewed-on: http://gerrit.cloudera.org:8080/21394 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Some Iceberg metadata table tests doesn't assert > > > Key: IMPALA-13055 > URL: https://issues.apache.org/jira/browse/IMPALA-13055 > Project: IMPALA > Issue Type: Test >Reporter: Gabor Kaszab >Priority: Major > Labels: impala-iceberg > > Some tests in the Iceberg metadata table suite use the following regex to > verify numbers in the output: [1-9]\d*|0 > However, if this format is given, the test unconditionally passes. One could > put the expression in parentheses, or simply verify for \d+ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
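The two patterns from the fix above can be compared in plain Python. Note the actual bug was in how Impala's test framework handled the `|` alternation in expected-result lines, not in the regex engine itself; this just shows the old and new patterns accept the same integers:

```python
import re

OLD = r"[1-9]\d*|0"  # pattern whose "|0" part made the tests pass unconditionally
NEW = r"\d+"         # replacement applied by the fix

# Both patterns accept the same canonical integers:
for s in ["0", "7", "42", "1000"]:
    assert re.fullmatch(OLD, s) and re.fullmatch(NEW, s)

# They differ only on leading zeros, which \d+ also accepts:
assert re.fullmatch(NEW, "007") and not re.fullmatch(OLD, "007")
```

Since the verified values are counts and sizes, accepting leading zeros as well is harmless, which is why the simpler `\d+` was a safe substitute.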
[jira] [Commented] (IMPALA-13067) Some regex make the tests unconditionally pass
[ https://issues.apache.org/jira/browse/IMPALA-13067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847381#comment-17847381 ] ASF subversion and git services commented on IMPALA-13067: -- Commit 3a8eb999cbc746c055708425e071c30e3c00422e in impala's branch refs/heads/master from Gabor Kaszab [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3a8eb999c ] IMPALA-13055: Some Iceberg metadata table tests don't assert Some tests in the Iceberg metadata table suite use the following regex to verify numbers in the output: [1-9]\d*|0 However, if this format is given, the test unconditionally passes. This patch changes this format to \d+ and fixes the test results that incorrectly passed before due to the test not asserting. Opened IMPALA-13067 to investigate why the test framework works like this for |0 in the regexes. Change-Id: Ie47093f25a70253b3e6faca27d466d7cf6999fad Reviewed-on: http://gerrit.cloudera.org:8080/21394 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Some regex make the tests unconditionally pass > -- > > Key: IMPALA-13067 > URL: https://issues.apache.org/jira/browse/IMPALA-13067 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Gabor Kaszab >Priority: Major > Labels: test-framework > > This issue came out in the Iceberg metadata table tests where this regex was > used: > [1-9]\d*|0 > > The "|0" part for some reason confused the test framework, and then > regardless of what you provided as an expected result the tests passed. One workaround was to put the regex expression in parentheses. Or simply use > "\d+". https://issues.apache.org/jira/browse/IMPALA-13055 applied this second > workaround on the tests. > Some analysis would be great as to why this is the behavior of the test framework, > and if it's indeed an issue in the framework, we should fix it. 
[jira] [Commented] (IMPALA-12559) Support x5c Parameter in JSON Web Keys (JWK)
[ https://issues.apache.org/jira/browse/IMPALA-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846970#comment-17846970 ] ASF subversion and git services commented on IMPALA-12559: -- Commit 7550eb607c2b92b1367dc5cf5667b681d59a8915 in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=7550eb607 ] IMPALA-12559 (part 2): Fix build issue for different versions of openssl The previous patch called the OpenSSL API X509_get0_tbs_sigalg(), which is not available in the version of OpenSSL in the toolchain. This caused build failures. This patch fixes the issue by calling X509_get_signature_nid(). Testing: - Passed jwt-test unit-test and end-end unit-test. Change-Id: I62b9f0c00f91c2b13be30c415e3f1ebd0e1bd2bc Reviewed-on: http://gerrit.cloudera.org:8080/21432 Reviewed-by: gaurav singh Tested-by: Impala Public Jenkins Reviewed-by: Abhishek Rawat > Support x5c Parameter in JSON Web Keys (JWK) > > > Key: IMPALA-12559 > URL: https://issues.apache.org/jira/browse/IMPALA-12559 > Project: IMPALA > Issue Type: Bug > Components: be, Security >Reporter: Jason Fehr >Assignee: gaurav singh >Priority: Critical > Labels: JWT, jwt, security > > The ["x5u"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.6], > ["x5c"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.7], > ["x5t"|https://datatracker.ietf.org/doc/html/rfc7517#section-4.8], and > ["x5t#S256|https://datatracker.ietf.org/doc/html/rfc7517#section-4.9] > parameters in JWKs are not supported by Impala. Implement support for these > parameters using the available methods in the [Thalhammer/jwt-cpp > library|https://github.com/Thalhammer/jwt-cpp/blob/ce1f9df3a9f861d136d6f0c93a6f811c364d1d3d/example/jwks-verify.cpp]. > Note: If the "alg" property is specified and so is "x5u" or "x5c", then the > value of the "alg" property must match the algorithm on the certificate from > the "x5u" or "x5c" property. 
[jira] [Commented] (IMPALA-12559) Support x5c Parameter in JSON Web Keys (JWK)
[ https://issues.apache.org/jira/browse/IMPALA-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846760#comment-17846760 ] ASF subversion and git services commented on IMPALA-12559: -- Commit 34c084cebb2f52a6ee11d3d93609b3e4e238816f in impala's branch refs/heads/master from gaurav1086 [ https://gitbox.apache.org/repos/asf?p=impala.git;h=34c084ceb ] IMPALA-12559: Support x5c Parameter for RSA JSON Web Keys This enables the jwt verification using the x5c certificate(s) in the RSA jwks keys. The x5c claim can be part of the jwks either as a string or an array. This patch only supports a single x5c certificate per jwk. If the "x5c" is present and "alg" is not present, then "alg" is extracted from the "x5c" certificate using the signature algorithm. However, if "x5c" is not present, then "alg" is a mandatory field on jwk. Current mapping of signature algorithm string => algorithm: sha256WithRSAEncryption => rs256 sha384WithRSAEncryption => rs384 sha512WithRSAEncryption => rs512 If "x5c" is present, then it is given priority over other mandatory fields like "n", "e" to construct the public key. Testing: * added unit test VerifyJwtTokenWithx5cCertificate to verify jwt with x5c certificate. * added unit test VerifyJwtTokenWithx5cCertificateWithoutAlg to verify jwt with x5c certificate without "alg". * added e2e test testJwtAuthWithJwksX5cHttpUrl to verify jwt with x5c certificate. 
Change-Id: I70be6f9f54190544aa005b2644e2ed8db6f6bb74 Reviewed-on: http://gerrit.cloudera.org:8080/21382 Reviewed-by: Jason Fehr Reviewed-by: Wenzhe Zhou Tested-by: Impala Public Jenkins
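The precedence rules in the commit above (prefer an explicit "alg"; otherwise derive it from the x5c certificate's signature algorithm; otherwise fail) can be sketched as follows. The mapping is the one listed in the commit message (shown here with the standard uppercase JWS names); resolve_jwk_alg is a hypothetical helper, not Impala's implementation:

```python
from typing import Optional

# sha*WithRSAEncryption -> JWS algorithm, per the commit message
X509_SIG_TO_JWS_ALG = {
    "sha256WithRSAEncryption": "RS256",
    "sha384WithRSAEncryption": "RS384",
    "sha512WithRSAEncryption": "RS512",
}

def resolve_jwk_alg(jwk: dict, cert_sig_alg: Optional[str] = None) -> str:
    """Resolve the JWS algorithm for a JWK that may carry an x5c chain."""
    if "alg" in jwk:
        return jwk["alg"]
    if "x5c" in jwk:  # derive from the (single supported) certificate
        try:
            return X509_SIG_TO_JWS_ALG[cert_sig_alg]
        except KeyError:
            raise ValueError(f"unsupported x5c signature algorithm: {cert_sig_alg}")
    raise ValueError('"alg" is mandatory when "x5c" is absent')

assert resolve_jwk_alg({"x5c": ["<base64 cert>"]}, "sha384WithRSAEncryption") == "RS384"
```

In the real patch the signature algorithm comes from parsing the DER certificate in the x5c entry; here it is passed in directly to keep the sketch self-contained.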
[jira] [Commented] (IMPALA-13079) Add support for FLOAT/DOUBLE in Iceberg metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846761#comment-17846761 ] ASF subversion and git services commented on IMPALA-13079: -- Commit bbfba13ed4d084681b542d7c5e1b5156576a603b in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=bbfba13ed ] IMPALA-13079: Add support for FLOAT/DOUBLE in Iceberg metadata tables Until now, the float and double data types were not supported in Iceberg metadata tables. This commit adds support for them. Testing: - added a test table that contains all primitive types (except for decimal, which is still not supported), a struct, an array and a map - added a test query that queries the `files` metadata table of the above table - the 'readable_metrics' struct contains lower and upper bounds for all columns in the original table, with the original type Change-Id: I2171c9aa9b6d2b634b8c511263b1610cb1d7cb29 Reviewed-on: http://gerrit.cloudera.org:8080/21425 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add support for FLOAT/DOUBLE in Iceberg metadata tables > --- > > Key: IMPALA-13079 > URL: https://issues.apache.org/jira/browse/IMPALA-13079 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg >
[jira] [Commented] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters
[ https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846448#comment-17846448 ] ASF subversion and git services commented on IMPALA-9577: - Commit f507a02b60e905c51e80e6139eef00946cf6d453 in impala's branch refs/heads/branch-3.4.2 from Grant Henke [ https://gitbox.apache.org/repos/asf?p=impala.git;h=f507a02b6 ] IMPALA-9577: [test] Use `system_unsync` time for Kudu test clusters Recently Kudu made enhancements to time source configuration and adjusted the time source for local clusters/tests to `system_unsync`. This patch mirrors that behavior in Impala test clusters given there is no need to require an NTP-synchronized clock for a test where all the participating Kudu masters and tablet servers run on the same node using the same local wallclock. See the Kudu commit here for details: https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be While making this change, I removed all ntp-related packages and special handling as they should not be needed in a development environment any more. I also added curl and gawk, which were missing in my Docker Ubuntu environment and whose absence broke my testing. 
Testing: I tested with the steps below using Docker for Mac:

docker rm impala-dev
docker volume rm impala
docker run --privileged --interactive --tty --name impala-dev -v impala:/home -p 25000:25000 -p 25010:25010 -p 25020:25020 ubuntu:16.04 /bin/bash
apt-get update
apt-get install sudo
adduser --disabled-password --gecos '' impdev
echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su - impdev
cd ~
sudo apt-get --yes install git
git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala
cd ~/Impala
export IMPALA_HOME=`pwd`
git remote add fork https://github.com/granthenke/impala.git
git fetch fork
git checkout kudu-system-time
$IMPALA_HOME/bin/bootstrap_development.sh
source $IMPALA_HOME/bin/impala-config.sh
(pushd fe && mvn -fae test -Dtest=AnalyzeDDLTest)
(pushd fe && mvn -fae test -Dtest=AnalyzeKuduDDLTest)
$IMPALA_HOME/bin/start-impala-cluster.py
./tests/run-tests.py query_test/test_kudu.py

Change-Id: Id99e5cb58ab988c3ad4f98484be8db193d5eaf99 Reviewed-on: http://gerrit.cloudera.org:8080/15568 Reviewed-by: Impala Public Jenkins Reviewed-by: Alexey Serbin Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/21422 Reviewed-by: Alexey Serbin Reviewed-by: Zihao Ye Tested-by: Quanlong Huang > Use `system_unsync` time for Kudu test clusters > --- > > Key: IMPALA-9577 > URL: https://issues.apache.org/jira/browse/IMPALA-9577 > Project: IMPALA > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Fix For: Impala 4.0.0 > > > Recently Kudu made enhancements to time source configuration and adjusted the > time source for local clusters/tests to `system_unsync`. Impala should mirror > that behavior in Impala test clusters given there is no need to require > NTP-synchronized clock for a test where all the participating Kudu masters > and tablet servers are run at the same node using the same local wallclock. 
> > See the Kudu commit here for details: > [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be]
[jira] [Commented] (IMPALA-13051) Speed up test_query_log test runs
[ https://issues.apache.org/jira/browse/IMPALA-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846267#comment-17846267 ] ASF subversion and git services commented on IMPALA-13051: -- Commit 3b35ddc8ca7b0e540fc16c413a170a25e164462b in impala's branch refs/heads/master from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=3b35ddc8c ] IMPALA-13051: Speed up, refactor query log tests Sets faster default shutdown_grace_period_s and shutdown_deadline_s when impalad_graceful_shutdown=True in tests. Impala waits until grace period has passed and all queries are stopped (or deadline is exceeded) before flushing the query log, so grace period of 0 is sufficient. Adds them in setup_method to reduce duplication in test declarations. Re-uses TQueryTableColumn Thrift definitions for testing. Moves waiting for query log table to exist to setup_method rather than as a side-effect of get_client. Refactors workload management code to reduce if-clause nesting. Adds functional query workload tests for both the sys.impala_query_log and the sys.impala_query_live tables to assert the names and order of the individual columns within each table. Renames the python tests for the sys.impala_query_log table removing the unnecessary "_query_log_table_" string from the name of each test. Change-Id: I1127ef041a3e024bf2b262767d56ec5f29bf3855 Reviewed-on: http://gerrit.cloudera.org:8080/21358 Tested-by: Impala Public Jenkins Reviewed-by: Riza Suminto > Speed up test_query_log test runs > - > > Key: IMPALA-13051 > URL: https://issues.apache.org/jira/browse/IMPALA-13051 > Project: IMPALA > Issue Type: Task >Affects Versions: Impala 4.4.0 >Reporter: Michael Smith >Assignee: Jason Fehr >Priority: Minor > > test_query_log.py takes 11 minutes to run. Most of its tests use graceful > shutdown, and provide an unnecessary grace period. Optimize test_query_log > test runs, and do some other code cleanup around workload management. 
[jira] [Commented] (IMPALA-13061) Query Live table fails to load if default_transactional_type=insert_only set globally
[ https://issues.apache.org/jira/browse/IMPALA-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845906#comment-17845906 ] ASF subversion and git services commented on IMPALA-13061: -- Commit 338fedb44703646664e2e22c6e2f35336924db22 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=338fedb44 ] IMPALA-13061: Create query live as external table Impala determines whether a managed table is transactional based on the 'transactional' table property. It assumes any managed table with transactional=true returns non-null getValidWriteIds. When 'default_transactional_type=insert_only' is set at startup (via default_query_options), impala_query_live is created as a managed table with transactional=true, but SystemTables don't implement getValidWriteIds and are not meant to be transactional. DataSourceTable has a similar problem, and when a JDBC table is created setJdbcDataSourceProperties sets transactional=false. This patch uses CREATE EXTERNAL TABLE sys.impala_query_live so that it is not created as a managed table and 'transactional' is not set. That avoids creating a SystemTable that Impala can't read (it encounters an IllegalStateException). 
Change-Id: Ie60a2bd03fabc63c85bcd9fa2489e9d47cd2aa65 Reviewed-on: http://gerrit.cloudera.org:8080/21401 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins (cherry picked from commit 1233ac3c579b5929866dba23debae63e5d2aae90) > Query Live table fails to load if default_transactional_type=insert_only set > globally > - > > Key: IMPALA-13061 > URL: https://issues.apache.org/jira/browse/IMPALA-13061 > Project: IMPALA > Issue Type: Bug >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Critical > Fix For: Impala 4.5.0 > > > If transactional type defaults to insert_only for all queries via > {code} > --default_query_options=default_transactional_type=insert_only > {code} > the table definition for {{sys.impala_query_live}} is set to transactional, > which causes an exception in catalogd > {code} > I0506 22:07:42.808758 3972 jni-util.cc:302] > 4547b965aeebc5f0:8ba96c58] java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:496) > at org.apache.impala.catalog.Table.getPartialInfo(Table.java:851) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3818) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3714) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3681) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$10(JniCatalog.java:431) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:253) > at > 
org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:430) > {code} > We need to override that setting while creating {{sys.impala_query_live}}.
[jira] [Commented] (IMPALA-13045) Fix intermittent failure in TestQueryLive.test_local_catalog
[ https://issues.apache.org/jira/browse/IMPALA-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845905#comment-17845905 ] ASF subversion and git services commented on IMPALA-13045: -- Commit 39233ba3d134b8c18f6f208a7d85c3fadf8ee371 in impala's branch refs/heads/branch-4.4.0 from Michael Smith [ https://gitbox.apache.org/repos/asf?p=impala.git;h=39233ba3d ] IMPALA-13045: Wait for impala_query_live to exist Waits for creation of 'sys.impala_query_live' in tests to ensure it has been registered with HMS. Change-Id: I5cc3fa3c43be7af9a5f097359a0d4f20d057a207 Reviewed-on: http://gerrit.cloudera.org:8080/21372 Reviewed-by: Impala Public Jenkins Tested-by: Michael Smith (cherry picked from commit b35aa819653dce062109e61d8f30171234dce5f9) > Fix intermittent failure in TestQueryLive.test_local_catalog > > > Key: IMPALA-13045 > URL: https://issues.apache.org/jira/browse/IMPALA-13045 > Project: IMPALA > Issue Type: Task >Reporter: Michael Smith >Assignee: Michael Smith >Priority: Major > Fix For: Impala 4.5.0 > > > IMPALA-13005 introduced {{drop table sys.impala_query_live}}. In some test > environments (notably testing with Ozone), recreating that table in the > following test - test_local_catalog - does not occur before running the test > case portion that attempts to query that table. > Update the test to wait for the table to be available.
[jira] [Commented] (IMPALA-12910) Run TPCH/TPCDS queries for external JDBC tables
[ https://issues.apache.org/jira/browse/IMPALA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845902#comment-17845902 ] ASF subversion and git services commented on IMPALA-12910: -- Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch refs/heads/branch-4.4.0 from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ] IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables This patch adds a script to create external JDBC tables for the dataset of TPCH and TPCDS, and adds unit-tests to run TPCH and TPCDS queries for external JDBC tables with Impala-Impala federation. Note that JDBC tables are mapping tables; they don't take additional disk space. It fixes a race condition in the caching of SQL DataSource objects by using a new DataSourceObjectCache class, which checks the reference count before closing a SQL DataSource. Adds a new query option 'clean_dbcp_ds_cache' with a default value of true. When it's set to false, a SQL DataSource object will not be closed when its reference count equals 0, and will be kept in cache until the SQL DataSource is idle for more than 5 minutes. The flag variable 'dbcp_data_source_idle_timeout_s' is added to make the duration configurable. java.sql.Connection.close() sometimes fails to remove a closed connection from the connection pool, which causes JDBC working threads to wait a long time for available connections from the connection pool. The workaround is to call the BasicDataSource.invalidateConnection() API to close a connection. Two flag variables are added for the DBCP configuration properties 'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait' properties are renamed to 'maxTotal' and 'maxWaitMillis' respectively in apache.commons.dbcp v2. Fixes a bug in database type comparison, since the type strings specified by the user could be lower case or a mix of upper/lower case, but the code compares the types with an upper-case string. 
Fixes an issue to close the SQL DataSource object in JdbcDataSource.open() and JdbcDataSource.getNext() when errors are returned from DBCP APIs or JDBC drivers. testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables for Impala-Impala, Postgres and MySQL. The following sample commands create TPCDS JDBC tables for Impala-Impala federation with a remote coordinator running at 10.19.10.86, and a Postgres server running at 10.19.10.86:

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=IMPALA --database_host=10.19.10.86 --clean

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=POSTGRES --database_host=10.19.10.86 \
  --database_name=tpcds --clean

TPCDS tests for JDBC tables run only for release/exhaustive builds. TPCH tests for JDBC tables run for core and exhaustive builds, except Dockerized builds. Remaining Issues: - tpcds-decimal_v2-q80a failed with returned rows not matching expected results for some decimal values. This will be fixed in IMPALA-13018. Testing: - Passed core tests. - Passed query_test/test_tpcds_queries.py in release/exhaustive build. - Manually verified that only one SQL DataSource object was created for test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since query option 'clean_dbcp_ds_cache' was set as false, and the SQL DataSource object was closed by the cleanup thread. 
Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Reviewed-on: http://gerrit.cloudera.org:8080/21304 Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins (cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03) > Run TPCH/TPCDS queries for external JDBC tables > --- > > Key: IMPALA-12910 > URL: https://issues.apache.org/jira/browse/IMPALA-12910 > Project: IMPALA > Issue Type: Sub-task > Components: Perf Investigation >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > Need performance data for queries on external JDBC tables to be documented in > the design doc.
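The reference-counted caching described in the commit above (keep a SQL DataSource open while its reference count is nonzero, then close it only after an idle timeout) can be sketched like this. The class name echoes the commit's DataSourceObjectCache, but this is an illustrative Python model, not the Java implementation:

```python
import time

class DataSourceObjectCache:
    """Close cached data sources only when unreferenced AND idle too long."""

    def __init__(self, idle_timeout_s: float = 300.0):  # mirrors the 5-minute default
        self.idle_timeout_s = idle_timeout_s
        self._entries = {}  # key -> [data_source, ref_count, last_release_time]

    def acquire(self, key, factory):
        """Return the cached data source for key, creating it on first use."""
        entry = self._entries.setdefault(key, [factory(), 0, time.monotonic()])
        entry[1] += 1
        return entry[0]

    def release(self, key):
        """Drop one reference; remember when the entry last became idle."""
        entry = self._entries[key]
        entry[1] -= 1
        entry[2] = time.monotonic()

    def evict_idle(self, now=None):
        """Run periodically by a cleanup thread in the real design."""
        now = time.monotonic() if now is None else now
        for key, (ds, refs, last_used) in list(self._entries.items()):
            if refs == 0 and now - last_used >= self.idle_timeout_s:
                del self._entries[key]  # the real code would also close the DataSource
```

Checking the reference count before closing is what removes the race: a DataSource can no longer be closed out from under a query that is still using it.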
[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters
[ https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845907#comment-17845907 ] ASF subversion and git services commented on IMPALA-11499: -- Commit b8a66b0e104f8e25e70fce0326d36c9b48672dbb in impala's branch refs/heads/branch-4.4.0 from pranavyl [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b8a66b0e1 ]

IMPALA-11499: Refactor UrlEncode function to handle special characters

The error came from an issue with URL encoding, where certain Unicode characters were incorrectly encoded because bytes of their UTF-8 representation matched characters in the set of characters to escape. For example, the string '运', which consists of the three bytes 0xe8 0xbf 0x90, was wrongly encoded into '\E8%FFBF\90' because the middle byte matched one of the two bytes that represent the "\u00FF" literal. Including "\u00FF" was likely a mistake from the beginning; it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped was stored as a string. The patch uses an unordered_set instead.
2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was erroneous from the beginning, is replaced with '\x7F', the DELETE control character, ensuring consistency and correctness in URL encoding.
3. The list of characters to be escaped is extended to match the current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables are included in unicode-column-name.test, insert.test, coding-util-test.cc and test_insert.py.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb Reviewed-on: http://gerrit.cloudera.org:8080/21131 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins (cherry picked from commit 85cd07a11e876f3d8773f2638f699c61a6b0dd4c) > Refactor UrlEncode function to handle special characters > > > Key: IMPALA-11499 > URL: https://issues.apache.org/jira/browse/IMPALA-11499 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Pranav Yogi Lodha >Priority: Critical > Fix For: Impala 4.5.0 > > > Partition values are incorrectly URL-encoded in backend for unicode > characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong. > To reproduce the issue, first create a partition table: > {code:sql} > create table my_part_tbl (id int) partitioned by (p string) stored as parquet; > {code} > Then insert data into it using partition values containing '运'. They will > fail: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='运营业务数据') values (0) > Query submitted at: 2022-08-16 10:03:56 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > Error(5): Input/output error > [localhost:21050] default> insert into my_part_tbl partition(p='运') values > (0); > Query: insert into my_part_tbl partition(p='运') values (0) > Query submitted at: 2022-08-16 10:04:22 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335 > ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq >
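The escaping bug above comes from testing each UTF-8 byte against an escape list that itself contained a multi-byte literal. A minimal Python sketch of the corrected byte-wise approach: the escape set holds only single ASCII bytes (the exact character list here is illustrative, not Hive's actual list), so bytes inside a multi-byte character such as 0xBF can never spuriously match.

```python
# Hypothetical escape set: single ASCII bytes only (includes 0x7F, the
# DELETE control character, in place of the erroneous '\xFF').
CHARS_TO_ESCAPE = set(b' "#%\\*/:=?\x7f')

def url_encode(s: str) -> str:
    """Percent-encode only bytes in the escape set; pass all other
    UTF-8 bytes through untouched, so multi-byte characters survive."""
    out = bytearray()
    for byte in s.encode('utf-8'):
        if byte in CHARS_TO_ESCAPE:
            out += b'%%%02X' % byte  # e.g. '=' -> '%3D'
        else:
            out.append(byte)
    return out.decode('utf-8')

print(url_encode('p=运'))  # -> 'p%3D运' ('运' = 0xE8 0xBF 0x90, untouched)
```

Because membership is tested per byte against a set of single bytes, the 0xBF inside '运' no longer collides with anything, which is the essence of the unordered_set change described in the commit.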
[jira] [Commented] (IMPALA-13018) Fix test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a failure
[ https://issues.apache.org/jira/browse/IMPALA-13018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845903#comment-17845903 ] ASF subversion and git services commented on IMPALA-13018: -- Commit 01401a0368cb8f19c86dc3fab764ee4b5732f2f6 in impala's branch refs/heads/branch-4.4.0 from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=01401a036 ]

IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables

This patch adds a script to create external JDBC tables for the TPCH and TPCDS datasets, and adds unit tests that run TPCH and TPCDS queries against external JDBC tables with Impala-Impala federation. Note that JDBC tables are mapping tables; they don't take additional disk space.

It fixes a race condition in the caching of SQL DataSource objects by using a new DataSourceObjectCache class, which checks the reference count before closing a SQL DataSource. Adds a new query option 'clean_dbcp_ds_cache' with a default value of true. When it is set to false, a SQL DataSource object is not closed when its reference count reaches 0; instead it is kept in the cache until it has been idle for more than 5 minutes. The flag variable 'dbcp_data_source_idle_timeout_s' makes this duration configurable.

java.sql.Connection.close() sometimes fails to remove a closed connection from the connection pool, which causes JDBC worker threads to wait a long time for available connections from the pool. The workaround is to call the BasicDataSource.invalidateConnection() API to close a connection. Two flag variables are added for the DBCP configuration properties 'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait' properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively in apache.commons.dbcp v2.

Fixes a bug in database type comparison: the type strings specified by the user could be lower case or a mix of upper and lower case, but the code compared the types against upper-case strings.
Fixes an issue where the SQL DataSource object was not closed in JdbcDataSource.open() and JdbcDataSource.getNext() when errors were returned from DBCP APIs or JDBC drivers.

testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables for Impala-Impala, Postgres, and MySQL. The following sample commands create TPCDS JDBC tables for Impala-Impala federation with a remote coordinator running at 10.19.10.86, and for a Postgres server running at 10.19.10.86:

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=IMPALA --database_host=10.19.10.86 --clean

${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
  --jdbc_db_name=tpcds_jdbc --workload=tpcds \
  --database_type=POSTGRES --database_host=10.19.10.86 \
  --database_name=tpcds --clean

TPCDS tests for JDBC tables run only for release/exhaustive builds. TPCH tests for JDBC tables run for core and exhaustive builds, except Dockerized builds.

Remaining Issues:
- tpcds-decimal_v2-q80a failed with returned rows not matching expected results for some decimal values. This will be fixed in IMPALA-13018.

Testing:
- Passed core tests.
- Passed query_test/test_tpcds_queries.py in release/exhaustive build.
- Manually verified that only one SQL DataSource object was created for test_tpcds_queries.py::TestTpcdsQueryForJdbcTables since query option 'clean_dbcp_ds_cache' was set to false, and that the SQL DataSource object was closed by the cleanup thread.
Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a Reviewed-on: http://gerrit.cloudera.org:8080/21304 Reviewed-by: Abhishek Rawat Tested-by: Impala Public Jenkins (cherry picked from commit 08f8a300250df7b4f9a517cdb6bab48c379b7e03) > Fix > test_tpcds_queries.py/TestTpcdsQueryForJdbcTables.test_tpcds-decimal_v2-q80a > failure > > > Key: IMPALA-13018 > URL: https://issues.apache.org/jira/browse/IMPALA-13018 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Fix For: Impala 4.5.0 > > > The returned rows are not matching expected results for some decimal type of > columns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
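The DataSourceObjectCache fix above hinges on checking the reference count under the same lock as lookup, so a DataSource cannot be closed while another thread is acquiring it. A minimal Python sketch of that scheme (the real class is Java in the Impala frontend; the method names and FakeDS-style usage here are illustrative, while 'clean_dbcp_ds_cache' and the 5-minute idle timeout follow the commit message):

```python
import threading
import time

IDLE_TIMEOUT_S = 300  # mirrors dbcp_data_source_idle_timeout_s


class DataSourceObjectCache:
    def __init__(self, clean_on_release=True):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [datasource, refcount, last_used]
        self._clean_on_release = clean_on_release  # 'clean_dbcp_ds_cache'

    def get(self, key, factory):
        # Lookup and refcount bump happen under one lock -- this is what
        # closes the race between a releasing thread and an acquiring one.
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0, time.time()]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            entry[2] = time.time()
            # Close eagerly only when the option asks for it and no one
            # else still holds a reference.
            if self._clean_on_release and entry[1] == 0:
                del self._entries[key]
                entry[0].close()

    def evict_idle(self):
        """Cleanup-thread path: close data sources idle past the timeout."""
        with self._lock:
            now = time.time()
            idle = [k for k, e in self._entries.items()
                    if e[1] == 0 and now - e[2] > IDLE_TIMEOUT_S]
            for key in idle:
                self._entries.pop(key)[0].close()
```

With clean_on_release=False the release path never closes, matching the described behavior where only the cleanup thread retires idle DataSources.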
[jira] [Commented] (IMPALA-13038) Support profile tab for imported query profiles
[ https://issues.apache.org/jira/browse/IMPALA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845519#comment-17845519 ] ASF subversion and git services commented on IMPALA-13038: -- Commit 0d215da8d4e3f93ad3c1cd72aa801fbcb9464fb0 in impala's branch refs/heads/master from Surya Hebbar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d215da8d ]

IMPALA-13038: Support profile tab for imported query profiles

Imported query profiles currently support the following tabs:
- Query Statement
- Query Timeline
- Query Text Plan

With the current patch the "Query Profile" tab is also supported.

In "QueryProfileHandler", "query_id" is now added before verifying its existence in the query log, as in "QuerySummaryHandler" and others. A "getQueryID" function has been added to "util.js", as it is useful across multiple query pages for retrieving the query ID in JS scripts before the page loads.

On loading the imported "Query Profile" page, the query profile download section and the server's non-existing-query-ID alerts are removed. All unsupported navbar tabs are removed and the current tab is set to active.

The query profile is retrieved from the indexedDB "imported_queries" database. The profile is then passed to the "profileToString" function, which converts it into indented text for display on the profile page. Each profile and its child profiles are printed in the following order, with the right indentation (fields are skipped if they do not exist).
Profile name:
- Info strings:
- Event sequences:
  - Offset:
  - Events:
- Child profile (recursive):
- Counters:

Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc Reviewed-on: http://gerrit.cloudera.org:8080/21400 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Support profile tab for imported query profiles > --- > > Key: IMPALA-13038 > URL: https://issues.apache.org/jira/browse/IMPALA-13038 > Project: IMPALA > Issue Type: New Feature >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Attachments: json_profile_a34485359bfdfe1f_3ca8177b.json, > json_profile_a34485359bfdfe1f_3ca8177b.txt > > > Query profile imports currently support the following tabs. > - Query Statement > - Query Timeline > - Query Text Plan > It would be helpful to support "Query Profile" tab for these imports. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
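The profileToString traversal described above can be sketched as a short recursive function. This is a hedged illustration, not the real implementation (which is JavaScript in Impala's webui); the dict field names ('profile_name', 'info_strings', and so on) are assumptions chosen to mirror the printing order in the commit message.

```python
def profile_to_string(profile, indent=0):
    """Render a profile dict (and its children, recursively) as
    indented text, skipping fields that do not exist."""
    pad = '  ' * indent
    lines = [pad + profile.get('profile_name', '') + ':']
    for name, value in profile.get('info_strings', {}).items():
        lines.append(pad + '  - ' + name + ': ' + str(value))
    for seq in profile.get('event_sequences', []):
        lines.append(pad + '  - Offset: ' + str(seq.get('offset', 0)))
        for ts, label in seq.get('events', []):
            lines.append(pad + '    - ' + label + ': ' + str(ts))
    for child in profile.get('child_profiles', []):
        # Child profiles recurse with one extra level of indentation.
        lines.append(profile_to_string(child, indent + 1))
    for counter in profile.get('counters', []):
        lines.append(pad + '  - ' + counter['name'] + ': '
                     + str(counter['value']))
    return '\n'.join(lines)
```

Calling it on a root profile dict yields the whole tree as one indented string, which matches the "convert the profile into indented text for the profile page" step above.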
[jira] [Commented] (IMPALA-13036) Document Iceberg metadata tables
[ https://issues.apache.org/jira/browse/IMPALA-13036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845335#comment-17845335 ] ASF subversion and git services commented on IMPALA-13036: -- Commit aba27edc3338765a6b5133be095989f83cce4747 in impala's branch refs/heads/master from Daniel Becker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=aba27edc3 ] IMPALA-13036: Document Iceberg metadata tables This change adds documentation on how Iceberg metadata tables can be used. Testing: - built docs locally Change-Id: Ic453f567b814cb4363a155e2008029e94efb6ed1 Reviewed-on: http://gerrit.cloudera.org:8080/21387 Tested-by: Impala Public Jenkins Reviewed-by: Peter Rozsa > Document Iceberg metadata tables > > > Key: IMPALA-13036 > URL: https://issues.apache.org/jira/browse/IMPALA-13036 > Project: IMPALA > Issue Type: Documentation >Reporter: Daniel Becker >Assignee: Daniel Becker >Priority: Major > Labels: impala-iceberg > > Impala now supports displaying Iceberg metadata tables, we should document > this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org