[jira] [Created] (IMPALA-10200) WebUI static directory cleanup
Tamas Mate created IMPALA-10200: --- Summary: WebUI static directory cleanup Key: IMPALA-10200 URL: https://issues.apache.org/jira/browse/IMPALA-10200 Project: IMPALA Issue Type: Task Components: Backend Affects Versions: Impala 3.0, Impala 4.0 Reporter: Tamas Mate Assignee: Tamas Mate There is an unused index.html file under the default /www/ WebUI static file directory which only contains the {{Impala Webserver}} text: {code:html} Impala Webserver {code} This file could be removed as there is no need for an index.html in the static directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10143) TestAcid.test_full_acid_original_files
[ https://issues.apache.org/jira/browse/IMPALA-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-10143. Fix Version/s: Impala 4.0 Resolution: Fixed > TestAcid.test_full_acid_original_files > -- > > Key: IMPALA-10143 > URL: https://issues.apache.org/jira/browse/IMPALA-10143 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Tamas Mate >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: flaky > Fix For: Impala 4.0 > > Attachments: > https_^^jenkins.impala.io^job^ubuntu-16.04-dockerised-tests^3077^.log > > > This test seems to be flaky. > > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3077/testReport/junit/query_test.test_acid/TestAcid/test_full_acid_original_files_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___5000___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__text_none_/] > {code:java} > query_test/test_acid.py:153: in test_full_acid_original_files > self.run_test_case('QueryTest/full-acid-original-file', vector, > unique_database) > common/impala_test_suite.py:693: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:529: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:456: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E 0,0,0 != 0,19,0 > E 0,1,1 != 0,20,1 > E 0,2,2 != 0,21,2 > E 0,3,3 != 0,22,3 > E 0,4,4 != 0,23,4 > {code} > The test was added in IMPALA-9515. 
[jira] [Created] (IMPALA-10201) WebUI CSP best practice
Tamas Mate created IMPALA-10201: --- Summary: WebUI CSP best practice Key: IMPALA-10201 URL: https://issues.apache.org/jira/browse/IMPALA-10201 Project: IMPALA Issue Type: Improvement Affects Versions: Impala 4.0 Reporter: Tamas Mate The Debug WebUI currently supports only the {{X-Frame-Options}} header, which is kept for backward compatibility; in the future it will be replaced by the Content Security Policy’s {{frame-ancestors}} directive: {quote}Content Security Policy’s frame-ancestors directive obsoletes the X-Frame-Options header. If a resource has both policies, the frame-ancestors policy SHOULD be enforced and the X-Frame-Options policy SHOULD be ignored [[w3.org]|https://www.w3.org/TR/CSP2/#frame-ancestors-and-frame-options]. {quote} {quote}As described in Section 2.3.2.2, not all browsers implement X-Frame-Options in exactly the same way, which can lead to unintended results. And, given that the "X-" construction is deprecated [RFC6648], the X-Frame-Options header field will be replaced in the future by the Frame-Options directive in the Content Security Policy (CSP) version 1.1 [CSP-1-1]. [[RFC 7034]|https://www.ietf.org/rfc/rfc7034.txt] {quote} CSP's {{frame-ancestors}} directive should be implemented to adhere to current security best practices and to avoid depending on a deprecated feature in the future.
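The migration described in the ticket can be sketched outside of Impala's actual C++ webserver. The hypothetical Python helper below (the function name and dict shape are illustrative, not Impala code) shows the two headers sent together: CSP2-capable browsers enforce {{frame-ancestors}} and ignore {{X-Frame-Options}}, while older browsers fall back to the legacy header.

```python
def debug_ui_headers():
    """Headers for an embeddable debug page (illustrative sketch only).

    Per CSP2, when both policies are present, frame-ancestors SHOULD be
    enforced and X-Frame-Options SHOULD be ignored; the legacy header is
    kept only as a fallback for browsers without CSP2 support.
    """
    return {
        # CSP2: only same-origin pages may embed this page in a frame.
        "Content-Security-Policy": "frame-ancestors 'self'",
        # Legacy fallback, kept for backward compatibility.
        "X-Frame-Options": "SAMEORIGIN",
    }
```

A real implementation would attach these headers to every debug WebUI response; the values shown ('self' / SAMEORIGIN) are the common same-origin policy pair.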
[jira] [Work started] (IMPALA-10200) WebUI static directory cleanup
[ https://issues.apache.org/jira/browse/IMPALA-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10200 started by Tamas Mate. --- > WebUI static directory cleanup > -- > > Key: IMPALA-10200 > URL: https://issues.apache.org/jira/browse/IMPALA-10200 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 3.0, Impala 4.0 >Reporter: Tamas Mate >Assignee: Tamas Mate >Priority: Trivial > > There is an unused index.html file under the default /www/ WebUI static file > directory which only contains the {{Impala Webserver}} text: > {code:html} > > > Impala Webserver > > > {code} > This file could be removed as there is no need for an index.html in the > static directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10181) Do a best-effort cleanup on failed INSERTs into Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10181: --- Labels: impala-iceberg (was: impala-acid) > Do a best-effort cleanup on failed INSERTs into Iceberg tables > -- > > Key: IMPALA-10181 > URL: https://issues.apache.org/jira/browse/IMPALA-10181 > Project: IMPALA > Issue Type: Sub-task >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > If the Impala Coordinator doesn't fail, we can clean up most in-progress > files. > Although files written by crashed executors can still remain. > If the coordinator crashes, then unfortunately there's not much we can do > currently. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-6686) Change the DESCRIBE DATABASE output to look more like Hive output
[ https://issues.apache.org/jira/browse/IMPALA-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205636#comment-17205636 ] Csaba Ringhofer commented on IMPALA-6686: - I have kept the current multi-row output when adding managedlocation in https://gerrit.cloudera.org/#/c/16529/2 My plan is to add a query option in another commit that can switch between the current solution and one that uses exactly the same output as Hive. The current solution can potentially be deprecated in the future. Creating this switch seems like less work than handling the potential issues that come from breaking tools that rely on the current output.
> Change the DESCRIBE DATABASE output to look more like Hive output
> -----------------------------------------------------------------
>
> Key: IMPALA-6686
> URL: https://issues.apache.org/jira/browse/IMPALA-6686
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Fredy Wijaya
> Priority: Minor
> Labels: compatibility, incompatibility
>
> In Hive:
> {noformat}
> describe database functional;
> +---------+---------+-----------------------------------------------+------------+------------+------------+
> | db_name | comment | location                                      | owner_name | owner_type | parameters |
> +---------+---------+-----------------------------------------------+------------+------------+------------+
> | tpch    |         | hdfs://localhost:20500/test-warehouse/tpch.db | foo        | USER       |            |
> +---------+---------+-----------------------------------------------+------------+------------+------------+
> {noformat}
> In Impala:
> {noformat}
> describe database extended functional;
> +---------+-----------------------------------------------+---------+
> | name    | location                                      | comment |
> +---------+-----------------------------------------------+---------+
> | tpch    | hdfs://localhost:20500/test-warehouse/tpch.db |         |
> | Owner:  |                                               |         |
> |         | foo                                           | USER    |
> +---------+-----------------------------------------------+---------+
> {noformat}
[jira] [Commented] (IMPALA-9952) Invalid offset index in Parquet file
[ https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205668#comment-17205668 ] Zoltán Borók-Nagy commented on IMPALA-9952: --- Hi [~guojingfeng], did you have a chance to try the patch? > Invalid offset index in Parquet file > - > > Key: IMPALA-9952 > URL: https://issues.apache.org/jira/browse/IMPALA-9952 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: guojingfeng >Assignee: Zoltán Borók-Nagy >Priority: Major > Labels: Parquet > > When reading parquet file in impala 3.4, encountered the following error: > {code:java} > I0714 16:11:48.307806 1075820 runtime-state.cc:207] > 8c43203adb2d4fc8:0478df9b018b] Error from query > 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > I0714 16:11:48.834901 1075838 status.cc:126] > 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > @ 0xbf4ef9 > @ 0x1748c41 > @ 0x174e170 > @ 0x1750e58 > @ 0x17519f0 > @ 0x1748559 > @ 0x1510b41 > @ 0x1512c8f > @ 0x137488a > @ 0x1375759 > @ 0x1b48a19 > @ 0x7f34509f5e24 > @ 0x7f344d5ed35c > I0714 16:11:48.835763 1075838 runtime-state.cc:207] > 8c43203adb2d4fc8:0478df9b02c0] Error from query > 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > I0714 16:11:48.893784 1075820 status.cc:126] > 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page > filtering in file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. 
> @ 0xbf4ef9
> @ 0x1749104
> @ 0x17494cc
> @ 0x1751aee
> @ 0x1748559
> @ 0x1510b41
> @ 0x1512c8f
> @ 0x137488a
> @ 0x1375759
> @ 0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
> Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
>     if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>       DCHECK(false);
>       return Status(Substitute(
>           "Top level rows aren't in sync during page filtering in file $0.",
>           filename()));
>     }
>   }
>   return Status::OK();
> }
> {code}
[jira] [Commented] (IMPALA-9180) Remove legacy ImpalaInternalService
[ https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205705#comment-17205705 ] ASF subversion and git services commented on IMPALA-9180: - Commit 6bb3b88d05f89fb7a1a54f302b4d329cbf4f69ec in impala's branch refs/heads/master from wzhou-code [ https://gitbox.apache.org/repos/asf?p=impala.git;h=6bb3b88 ] IMPALA-9180 (part 1): Remove legacy ImpalaInternalService The legacy Thrift-based Impala internal service has been deprecated and can be removed now. This patch removes ImpalaInternalService. All infrastructure around it is cleaned up, except one place: the be_port flag. StatestoreSubscriber::subscriber_id contains be_port, but we cannot change the format of subscriber_id now. This remaining be_port issue will be fixed in a succeeding patch (part 4). TQueryCtx.coord_address is changed to TQueryCtx.coord_hostname since the port in TQueryCtx.coord_address was set to be_port and is unused now. Also rename TQueryCtx.coord_krpc_address to TQueryCtx.coord_ip_address. Testing: - Passed the exhaustive test. - Passed Quasar-L0 test. Change-Id: I5fa83c8009590124dded4783f77ef70fa30119e6 Reviewed-on: http://gerrit.cloudera.org:8080/16291 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins
> Remove legacy ImpalaInternalService
> -----------------------------------
>
> Key: IMPALA-9180
> URL: https://issues.apache.org/jira/browse/IMPALA-9180
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Michael Ho
> Assignee: Wenzhe Zhou
> Priority: Minor
>
> Now that IMPALA-7984 is done, the legacy Thrift-based Impala internal service can now be removed. The port 22000 can also be freed up. In addition to the code change, the doc probably needs to be updated to reflect the fact that 22000 is no longer in use.
[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table
[ https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205707#comment-17205707 ] ASF subversion and git services commented on IMPALA-10164: -- Commit 5b720a4d18cc2f2ade54ab223663521a3822343f in impala's branch refs/heads/master from skyyws [ https://gitbox.apache.org/repos/asf?p=impala.git;h=5b720a4 ] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
This patch adds support for creating Iceberg tables through HadoopCatalog. Only the HadoopTables API was supported before this patch; now HadoopCatalog can be used to create Iceberg tables as well. A managed table can be created with SQL like this:
CREATE TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
) STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
  'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
Two values ('hadoop.catalog', 'hadoop.tables') are now supported for 'iceberg.catalog'. If this property is not specified in the SQL, the default catalog type is 'hadoop.catalog'. An external Iceberg table can be created with SQL like this:
CREATE EXTERNAL TABLE default.iceberg_test_external
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
  'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
  'iceberg.table_identifier'='default.iceberg_test');
A table location cannot be set for either managed or external Iceberg tables with 'hadoop.catalog', and 'SHOW CREATE TABLE' does not display the table location yet; 'DESCRIBE FORMATTED/EXTENDED' must be used to get the location info. 'iceberg.catalog_location' is required for 'hadoop.catalog' tables: it is the location that holds the Iceberg table metadata and data, and Impala uses it to load table metadata from Iceberg.
'iceberg.table_identifier' is used for the Iceberg TableIdentifier. If this property is not specified in the SQL, Impala uses the database and table name to load the Iceberg table, which is 'default.iceberg_test_external' in the SQL above. The property value is split on '.', so it can also be set like 'org.my_db.my_tbl'. This property is valid for both managed and external tables.
Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py
Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Reviewed-on: http://gerrit.cloudera.org:8080/16446 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins
> Support HadoopCatalog for Iceberg table
> ---------------------------------------
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
> Issue Type: Improvement
> Reporter: WangSheng
> Assignee: WangSheng
> Priority: Minor
> Labels: impala-iceberg
>
> Impala currently only supports the HadoopTables API for creating Iceberg tables, which is apparently not enough, so we are preparing to support HadoopCatalog. The main design is to add a new table property named 'iceberg.catalog' with a default value of 'hadoop.tables'; we implement 'hadoop.catalog' to support the HadoopCatalog API. We may even support 'hive.catalog' in the future.
[jira] [Commented] (IMPALA-10193) Limit the memory usage of the whole mini-cluster
[ https://issues.apache.org/jira/browse/IMPALA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205706#comment-17205706 ] ASF subversion and git services commented on IMPALA-10193: -- Commit a0a25a61c302d864315daa7f09827b37a37419d5 in impala's branch refs/heads/master from fifteencai [ https://gitbox.apache.org/repos/asf?p=impala.git;h=a0a25a6 ] IMPALA-10193: Limit the memory usage for the whole test cluster
This patch introduces a new approach to limiting the memory usage of both the mini-cluster and the CDH cluster. Without this limit, clusters are prone to getting killed when running in docker containers that have a lower memory limit than the host's memory size, e.g. the mini-cluster may run in a container limited to 32GB by CGROUPS while the host machine has 128GB. Under this circumstance, if the container is started with the '--privileged' command argument, both the mini and CDH clusters compute their mem_limit from 128GB rather than 32GB, and they get killed when attempting to claim the extra resources.
Currently, the mem-limit estimating algorithms for Impalad and Node Manager are different:
for Impalad: mem_limit = 0.7 * sys_mem / cluster_size (default is 3)
for Node Manager:
1. Leave aside 24GB, then fit what is left into the thresholds below.
2. The bare limit is 4GB and the maximum limit is 48GB.
To hedge against over-consumption, we
- Added a new environment variable IMPALA_CLUSTER_MAX_MEM_GB
- Modified the algorithm in 'bin/start-impala-cluster.py', making it take IMPALA_CLUSTER_MAX_MEM_GB rather than sys_mem into account.
- Modified the logic in 'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py', similarly substituting IMPALA_CLUSTER_MAX_MEM_GB for sys_mem.
Testing: this patch worked in a 32GB docker container running on a 128GB host machine. All 1188 unit tests passed.
Change-Id: I8537fd748e279d5a0e689872aeb4dbfd0c84dc93 Reviewed-on: http://gerrit.cloudera.org:8080/16522 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins
> Limit the memory usage of the whole mini-cluster
> ------------------------------------------------
>
> Key: IMPALA-10193
> URL: https://issues.apache.org/jira/browse/IMPALA-10193
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.4.0
> Reporter: Fifteen
> Assignee: Fifteen
> Priority: Minor
> Attachments: image-2020-09-28-17-18-15-358.png
>
> The mini-cluster contains 3 virtual nodes, all of which run in a single 'Machine'. By quoting, it implies the machine can be a docker container. If the container is started with `--privileged` and the actual memory is limited by CGROUPS, then the total memory in `htop` and the actual available memory can be different!
>
> For example, in the container below, `htop` tells us the total memory is 128GB, while the total memory set in CGROUPS is actually 32GB. If the actual memory usage exceeds 32GB, processes (such as impalad, hiveserver2, etc.) get killed.
> !image-2020-09-28-17-18-15-358.png!
>
> So we may need a way to limit the whole mini-cluster's memory usage.
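The Impalad formula in the commit message (mem_limit = 0.7 * sys_mem / cluster_size, capped by the new environment variable) can be paraphrased as a small function. This is an illustrative sketch under assumptions: the function and variable names are made up and do not match the exact code in bin/start-impala-cluster.py.

```python
import os

def impalad_mem_limit_gb(sys_mem_gb, cluster_size=3, env=os.environ):
    # Sketch of the patched logic: cap the detected system memory at
    # IMPALA_CLUSTER_MAX_MEM_GB when it is set, then give each impalad
    # 70% of the usable memory divided by the cluster size (default 3).
    cap_gb = float(env.get("IMPALA_CLUSTER_MAX_MEM_GB", sys_mem_gb))
    usable_gb = min(sys_mem_gb, cap_gb)
    return 0.7 * usable_gb / cluster_size
```

In the 128GB-host / 32GB-container scenario above, exporting IMPALA_CLUSTER_MAX_MEM_GB=32 drops each impalad's limit from roughly 29.9GB to roughly 7.5GB, keeping the cluster inside the CGROUPS cap.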
[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files
Sahil Takiar created IMPALA-10202: - Summary: Enable file handle cache for ABFS files Key: IMPALA-10202 URL: https://issues.apache.org/jira/browse/IMPALA-10202 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar We should enable the file handle cache for ABFS; we have already seen it benefit jobs that read data from S3A.
[jira] [Updated] (IMPALA-10201) WebUI CSP best practice
[ https://issues.apache.org/jira/browse/IMPALA-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10201: --- Labels: newbie ramp-up (was: )
> WebUI CSP best practice
> -----------------------
>
> Key: IMPALA-10201
> URL: https://issues.apache.org/jira/browse/IMPALA-10201
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 4.0
> Reporter: Tamas Mate
> Priority: Minor
> Labels: newbie, ramp-up
>
> The Debug WebUI currently supports only the {{X-Frame-Options}} header, which is kept for backward compatibility; in the future it will be replaced by the Content Security Policy’s {{frame-ancestors}} directive:
> {quote}Content Security Policy’s frame-ancestors directive obsoletes the X-Frame-Options header. If a resource has both policies, the frame-ancestors policy SHOULD be enforced and the X-Frame-Options policy SHOULD be ignored [[w3.org]|https://www.w3.org/TR/CSP2/#frame-ancestors-and-frame-options].
> {quote}
> {quote}As described in Section 2.3.2.2, not all browsers implement X-Frame-Options in exactly the same way, which can lead to unintended results. And, given that the "X-" construction is deprecated [RFC6648], the X-Frame-Options header field will be replaced in the future by the Frame-Options directive in the Content Security Policy (CSP) version 1.1 [CSP-1-1]. [[RFC 7034]|https://www.ietf.org/rfc/rfc7034.txt]
> {quote}
> CSP's {{frame-ancestors}} directive should be implemented to adhere to current security best practices and to avoid depending on a deprecated feature in the future.
[jira] [Updated] (IMPALA-10189) Avoid unnecessarily loading metadata for drop stats DDL
[ https://issues.apache.org/jira/browse/IMPALA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10189: --- Summary: Avoid unnecessarily loading metadata for drop stats DDL (was: Avoid unnecessarily loading metadata for compute/drop stats DDLs) > Avoid unnecessarily loading metadata for drop stats DDL > --- > > Key: IMPALA-10189 > URL: https://issues.apache.org/jira/browse/IMPALA-10189 > Project: IMPALA > Issue Type: Sub-task > Components: Catalog >Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, > Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0 >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Critical > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10203) Avoid unnecessarily loading metadata for compute stats DDL
Tim Armstrong created IMPALA-10203: -- Summary: Avoid unnecessarily loading metadata for compute stats DDL Key: IMPALA-10203 URL: https://issues.apache.org/jira/browse/IMPALA-10203 Project: IMPALA Issue Type: Sub-task Components: Catalog Reporter: Tim Armstrong Assignee: Kurt Deschler -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-3335) Allow single-node optimization with joins.
[ https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-3335: Assignee: Sahil Takiar
> Allow single-node optimization with joins.
> ------------------------------------------
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.5.0
> Reporter: Alexander Behm
> Assignee: Sahil Takiar
> Priority: Minor
> Labels: ramp-up
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that disables our single-node optimization for any plan with joins. See MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}
[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.
[ https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-3335. -- Fix Version/s: Impala 4.0 Resolution: Fixed
> Allow single-node optimization with joins.
> ------------------------------------------
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.5.0
> Reporter: Alexander Behm
> Assignee: Sahil Takiar
> Priority: Minor
> Labels: ramp-up
> Fix For: Impala 4.0
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that disables our single-node optimization for any plan with joins. See MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}
[jira] [Created] (IMPALA-10204) Evaluate AdmitQuery params for efficiency
Thomas Tauber-Marshall created IMPALA-10204: --- Summary: Evaluate AdmitQuery params for efficiency Key: IMPALA-10204 URL: https://issues.apache.org/jira/browse/IMPALA-10204 Project: IMPALA Issue Type: Sub-task Components: Distributed Exec Reporter: Thomas Tauber-Marshall In the first version of the AdmissionControlService, we're sending the entire TQueryExecRequest/TQueryOptions as a sidecar to the admission controller. Various things contained in the TQueryExecRequest/TQueryOptions are not actually needed by the admission controller, and sending them increases network load and query running time unnecessarily. We should evaluate how much of a perf impact this causes and how much could actually be removed. Some small things may be non-trivial to remove and ultimately not worth it; for example, the tree of TPlanNodes contains some info needed by the admission controller (e.g. memory estimates) and some things that are not (e.g. runtime filter descriptors). Making two parallel trees, one with only admission-required data (which would require extensive refactoring in the planner, or wasted work in the coordinator copying out the required parts from what the planner returns), may be too complicated or introduce too much other overhead to be worth it.
[jira] [Created] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable
Wenzhe Zhou created IMPALA-10205: Summary: Replace MD5 hash with SHA-512 for data file path of IcebergTable Key: IMPALA-10205 URL: https://issues.apache.org/jira/browse/IMPALA-10205 Project: IMPALA Issue Type: Improvement Components: Frontend Affects Versions: Impala 4.0 Reporter: Wenzhe Zhou Fix For: Impala 4.0 To support FIPS, all new code has to use FIPS-approved algorithms. The current code generates the data file path hash with MD5 for Iceberg tables, but MD5 is one of the forbidden algorithms. We have to replace MD5 with a FIPS-approved algorithm, such as SHA-512.
[jira] [Assigned] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable
[ https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-10205: Assignee: Wenzhe Zhou
> Replace MD5 hash with SHA-512 for data file path of IcebergTable
> ----------------------------------------------------------------
>
> Key: IMPALA-10205
> URL: https://issues.apache.org/jira/browse/IMPALA-10205
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
> Fix For: Impala 4.0
>
> To support FIPS, all new code has to use FIPS-approved algorithms. The current code generates the data file path hash with MD5 for Iceberg tables, but MD5 is one of the forbidden algorithms. We have to replace MD5 with a FIPS-approved algorithm, such as SHA-512.
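The substitution the ticket asks for is mechanical. As a minimal sketch (Python's hashlib standing in for the actual Java frontend code; the helper name and path are hypothetical), hashing a data file path with SHA-512 instead of MD5 looks like this:

```python
import hashlib

def data_file_path_hash(path):
    # Hypothetical helper, not Impala's actual frontend code: hash an
    # Iceberg data file path with SHA-512 (FIPS 180-4 approved) rather
    # than the forbidden MD5. Returns a 128-character hex digest.
    return hashlib.sha512(path.encode("utf-8")).hexdigest()

digest = data_file_path_hash("/warehouse/iceberg_tbl/data/00000-0-data.parq")
# SHA-512 digests are 64 bytes, i.e. 128 hex characters.
assert len(digest) == 128
```

The digest is deterministic for a given path, so it can serve the same deduplication/identification role the MD5 hash served, just with a longer output.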
[jira] [Commented] (IMPALA-9606) ABFS reads should use hdfsPreadFully
[ https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205896#comment-17205896 ] ASF subversion and git services commented on IMPALA-9606: - Commit 8e9cf51f6b328f500acf7c577289c5b888fd15d2 in impala's branch refs/heads/master from Sahil Takiar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8e9cf51 ] IMPALA-9606: ABFS reads should use hdfsPreadFully Similar to IMPALA-8525, but for ABFS, instead of S3A. I don't expect this to make a major improvement in performance, like it did for S3A, although I am still seeing a marginal improvement during some ad-hoc testing (about 5% scan perf improvement). The reason is that the implementation of the ABFS and S3A clients are very different, ABFS already reads all data requested in a single hdfsRead call. I ran the query 'select * from abfs_test_store_sales order by ss_net_profit limit 10;' several times to validate that perf does not regress. In fact, it does improve slightly for this query. The table 'abfs_test_store_sales' is just a copy of the mini-cluster's tpcds_parquet.store_sales, although it is not partitioned. Testing: * Tested against a ABFS storage account I have access to * Ran several queries to validate there are no functional or perf regressions. Change-Id: I994ea30cf31abc66f5d82d9b3c8e185d2bd06147 Reviewed-on: http://gerrit.cloudera.org:8080/16531 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins > ABFS reads should use hdfsPreadFully > > > Key: IMPALA-9606 > URL: https://issues.apache.org/jira/browse/IMPALA-9606 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > In IMPALA-8525, hdfs preads were enabled by default when reading data from > S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't > significantly improve performance. 
After some more investigation into the > ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS > reads. > The ABFS client uses a different model for fetching data compared to S3A. > Details are beyond the scope of this JIRA, but it is related to a feature in > ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will > be required by the client. By default, it pre-fetches # cores * 4 MB of data. > If the requested data exists in the client cache, it is read from the cache. > However, there is no real drawback to using {{hdfsPreadFully}} for ABFS > reads. It's definitely safer, because while the current implementation of > ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} > API makes that guarantee. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
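The read-versus-readFully distinction above is the crux: a plain positioned read may return fewer bytes than requested, so callers must loop, which is exactly what a preadFully-style API guarantees internally. A language-neutral sketch in Python (`read_at` is a stand-in for a positioned-read call, not a real libhdfs binding):

```python
def pread_fully(read_at, offset, length):
    # Loop until exactly `length` bytes have been read starting at
    # `offset`, mirroring the readFully contract: the full amount or an
    # error. `read_at(offset, n)` returns up to n bytes (like #read).
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = read_at(offset, remaining)
        if not chunk:
            raise EOFError("reached EOF before reading %d bytes" % length)
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With hdfsPreadFully this loop lives inside the library call, so the caller issues one request per scan range instead of retrying short reads itself.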
[jira] [Commented] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread
[ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205897#comment-17205897 ] ASF subversion and git services commented on IMPALA-8525: - Commit 8e9cf51f6b328f500acf7c577289c5b888fd15d2 in impala's branch refs/heads/master from Sahil Takiar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8e9cf51 ] IMPALA-9606: ABFS reads should use hdfsPreadFully Similar to IMPALA-8525, but for ABFS, instead of S3A. I don't expect this to make a major improvement in performance, like it did for S3A, although I am still seeing a marginal improvement during some ad-hoc testing (about 5% scan perf improvement). The reason is that the implementation of the ABFS and S3A clients are very different, ABFS already reads all data requested in a single hdfsRead call. I ran the query 'select * from abfs_test_store_sales order by ss_net_profit limit 10;' several times to validate that perf does not regress. In fact, it does improve slightly for this query. The table 'abfs_test_store_sales' is just a copy of the mini-cluster's tpcds_parquet.store_sales, although it is not partitioned. Testing: * Tested against a ABFS storage account I have access to * Ran several queries to validate there are no functional or perf regressions. 
Change-Id: I994ea30cf31abc66f5d82d9b3c8e185d2bd06147 Reviewed-on: http://gerrit.cloudera.org:8080/16531 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins > preads should use hdfsPreadFully rather than hdfsPread > -- > > Key: IMPALA-8525 > URL: https://issues.apache.org/jira/browse/IMPALA-8525 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Impala preads (only enabled if {{use_hdfs_pread}} is true) use the > {{hdfsPread}} API from libhdfs, which ultimately invokes > {{PositionedReadable#read(long position, byte[] buffer, int offset, int > length)}} in the HDFS-client. > {{PositionedReadable}} also exposes the method {{readFully(long position, > byte[] buffer, int offset, int length)}}. The difference is that {{#read}} > will "Read up to the specified number of bytes" whereas {{#readFully}} will > "Read the specified number of bytes". So there is no guarantee that {{#read}} > will read *all* of the request bytes. > Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it > inside a while loop until all the requested bytes have been read from the > file. This can cause a few performance issues: > (1) if the underlying {{FileSystem}} does not support ByteBuffer reads > (HDFS-2834) (e.g. 
S3A does not support this feature) then {{hdfsPread}} will > allocate a Java array equal in size to specified length of the buffer; the > call to {{PositionedReadable#read}} may only fill up the buffer partially; > Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, > which will cause another large array allocation; this can result in a lot of > wasted time doing unnecessary array allocations > (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point > in continuously calling {{hdfsPread}} when a single call to > {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect > performance much, but is unnecessary) > Prior solutions to this problem have been to introduce a "chunk-size" to > Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related > changes for S3). However, with the migration to {{hdfsPreadFully}} the > chunk-size is no longer necessary. > Furthermore, preads are most effective when the data is read all at once > (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller > chunks (typically 128K). For example, {{DFSInputStream#read(long position, > byte[] buffer, int offset, int length)}} opens up remote block readers with a > byte range determined by the value of {{length}} passed into the {{#read}} > call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request > with the size of the read specified by the given {{length}} (although fadvise > must be set to RANDOM for this to work). > This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
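The short-read loop described above can be modeled with a small Python sketch (an editor's illustration using `os.pread` as a stand-in for `hdfsPread`; this is not Impala's actual C++ code in hdfs-file-reader.cc):

```python
import os
import tempfile

def pread_fully(fd, length, offset):
    """Model of the hdfsPreadFully contract: keep issuing positional
    reads until exactly `length` bytes are returned. os.pread, like
    hdfsPread, may legally return fewer bytes than requested, so a
    caller that needs the full range must loop."""
    chunks = []
    remaining = length
    pos = offset
    while remaining > 0:
        chunk = os.pread(fd, remaining, pos)  # may be a short read
        if not chunk:
            raise EOFError("hit EOF before reading the requested bytes")
        chunks.append(chunk)
        pos += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()  # push Python's buffer to the OS so pread sees the data
    assert pread_fully(f.fileno(), 4, 3) == b"3456"
```

Moving this retry loop below the libhdfs boundary, as `hdfsPreadFully` does, is what avoids the repeated full-size Java array allocations on each short read when ByteBuffer reads are unsupported.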
[jira] [Commented] (IMPALA-10200) WebUI static directory cleanup
[ https://issues.apache.org/jira/browse/IMPALA-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205895#comment-17205895 ] ASF subversion and git services commented on IMPALA-10200: -- Commit fd00efb32342b8201f20fd6d38ab773adebd3621 in impala's branch refs/heads/master from Tamas Mate [ https://gitbox.apache.org/repos/asf?p=impala.git;h=fd00efb ] IMPALA-10200: WebUI static directory cleanup This change removes the unused index.html file from the static directory. The page only contained a simple "Impala Webserver" header but it was renderable and therefore could have been misleading. After removal, the static www/ endpoint returns the expected "Directory Listing Denied" error. Testing: - Manually tested the WebUI without www/index.html Change-Id: I108bb4c6a371b6d1ec157d54ac078604b243ecc2 Reviewed-on: http://gerrit.cloudera.org:8080/16528 Reviewed-by: Csaba Ringhofer Tested-by: Impala Public Jenkins > WebUI static directory cleanup > -- > > Key: IMPALA-10200 > URL: https://issues.apache.org/jira/browse/IMPALA-10200 > Project: IMPALA > Issue Type: Task > Components: Backend >Affects Versions: Impala 3.0, Impala 4.0 >Reporter: Tamas Mate >Assignee: Tamas Mate >Priority: Trivial > > There is an unused index.html file under the default /www/ WebUI static file > directory which only contains the {{Impala Webserver}} text: > {code:html} > > > Impala Webserver > > > {code} > This file could be removed as there is no need for an index.html in the > static directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully
[ https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9606. -- Fix Version/s: Impala 4.0 Resolution: Fixed > ABFS reads should use hdfsPreadFully > > > Key: IMPALA-9606 > URL: https://issues.apache.org/jira/browse/IMPALA-9606 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 4.0 > > > In IMPALA-8525, hdfs preads were enabled by default when reading data from > S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't > significantly improve performance. After some more investigation into the > ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS > reads. > The ABFS client uses a different model for fetching data compared to S3A. > Details are beyond the scope of this JIRA, but it is related to a feature in > ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will > be required by the client. By default, it pre-fetches # cores * 4 MB of data. > If the requested data exists in the client cache, it is read from the cache. > However, there is no real drawback to using {{hdfsPreadFully}} for ABFS > reads. It's definitely safer, because while the current implementation of > ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} > API makes that guarantee. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel
Wenzhe Zhou created IMPALA-10206: Summary: Replace MD5 and SHA1 hash with SHA-512 for Squeasel Key: IMPALA-10206 URL: https://issues.apache.org/jira/browse/IMPALA-10206 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 4.0 Reporter: Wenzhe Zhou To support FIPS, we have to use FIPS-approved algorithms. Squeasel uses the MD5 and SHA-1 hash algorithms, but MD5 is one of the forbidden algorithms and SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 with FIPS-approved algorithms, like SHA-512.
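As a rough illustration of the requested change (an editor's sketch using Python's hashlib, not Squeasel's actual C code), note that swapping MD5 for SHA-512 also grows the digest from 128 to 512 bits, which matters for any fixed-width buffer or field that stores the hash:

```python
import hashlib

data = b"example-digest-input"  # hypothetical input, for illustration only

# MD5 is forbidden under FIPS (and may even be unavailable on
# FIPS-enabled builds); shown here only for size comparison.
md5_hex = hashlib.md5(data).hexdigest()
# SHA-512 is a FIPS-approved replacement.
sha512_hex = hashlib.sha512(data).hexdigest()

assert len(md5_hex) == 32      # 128-bit digest, 32 hex chars
assert len(sha512_hex) == 128  # 512-bit digest, 128 hex chars
```

The four-fold larger hex digest is the main mechanical consequence callers would see after such a migration.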
[jira] [Updated] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable
[ https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-10205: - Labels: FIPS (was: ) > Replace MD5 hash with SHA-512 for data file path of IcebergTable > > > Key: IMPALA-10205 > URL: https://issues.apache.org/jira/browse/IMPALA-10205 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Labels: FIPS > Fix For: Impala 4.0 > > > To support FIPS, all new code have to use FIPS-approved algorithms. Current > code generate data path hash in MD5 hash for Iceberg Table. But MD5 is one of > forbidden algorithms. We have to replace MD5 with FIPS-approved algorithm, > like SHA-512. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel
[ https://issues.apache.org/jira/browse/IMPALA-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou reassigned IMPALA-10206: Assignee: Wenzhe Zhou > Replace MD5 and SHA1 hash with SHA-512 for Squeasel > --- > > Key: IMPALA-10206 > URL: https://issues.apache.org/jira/browse/IMPALA-10206 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Labels: FIPS > > To support FIPS, we have to use FIPS-approved algorithms. Squeasel use MD5 > and SHA-1 hash algorithms. But MD5 is one of forbidden algorithms, and SHA-1 > is soon to be deprecated. We have to replace MD5 and SHA-1 with FIPS-approved > algorithms, like SHA-512. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10207) Replace MD5 hash with SHA-512 for lineage graph
Wenzhe Zhou created IMPALA-10207: Summary: Replace MD5 hash with SHA-512 for lineage graph Key: IMPALA-10207 URL: https://issues.apache.org/jira/browse/IMPALA-10207 Project: IMPALA Issue Type: Improvement Components: Frontend Affects Versions: Impala 4.0 Reporter: Wenzhe Zhou Assignee: Wenzhe Zhou To support FIPS, we have to use FIPS-approved algorithms. We use the MD5 hash algorithm for the lineage graph, but MD5 is one of the forbidden algorithms. We have to replace MD5 with a FIPS-approved algorithm, like SHA-512. We might need to figure out whether there are external dependencies on the hash, e.g. whether the lineage graph is consumed by component services that somehow rely on the hash being consistent across versions (seems unlikely).
[jira] [Updated] (IMPALA-10207) Replace MD5 hash with SHA-512 for lineage graph
[ https://issues.apache.org/jira/browse/IMPALA-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-10207: - Description: To support FIPS, we have to use FIPS-approved algorithms. We use MD5 hash algorithms for lineage graph. But MD5 is one of forbidden algorithms for FIPS. We have to replace MD5 with FIPS-approved algorithm, like SHA-512. We might need to figure out if there are external dependencies on the hash. e.g. the lineage graph was consumed by component services, and maybe they somehow rely on the hash being consistent across versions (seems unlikely). was: To support FIPS, we have to use FIPS-approved algorithms. We use MD5 hash algorithms for lineage graph. But MD5 is one of forbidden algorithms. We have to replace MD5 with FIPS-approved algorithms, like SHA-512. We might need to figure out if there are external dependencies on the hash. e.g. the lineage graph was consumed by component services, and maybe they somehow rely on the hash being consistent across versions (seems unlikely). > Replace MD5 hash with SHA-512 for lineage graph > --- > > Key: IMPALA-10207 > URL: https://issues.apache.org/jira/browse/IMPALA-10207 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Labels: FIPS > > To support FIPS, we have to use FIPS-approved algorithms. We use MD5 hash > algorithms for lineage graph. But MD5 is one of forbidden algorithms for > FIPS. We have to replace MD5 with FIPS-approved algorithm, like SHA-512. > We might need to figure out if there are external dependencies on the hash. > e.g. the lineage graph was consumed by component services, and maybe they > somehow rely on the hash being consistent across versions (seems unlikely). 
[jira] [Updated] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable
[ https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-10205: - Description: To support FIPS, all new code have to use FIPS-approved algorithms. Current code generate data path hash in MD5 hash for Iceberg Table. But MD5 is one of forbidden algorithms for FIPS. We have to replace MD5 with FIPS-approved algorithm, like SHA-512. (was: To support FIPS, all new code have to use FIPS-approved algorithms. Current code generate data path hash in MD5 hash for Iceberg Table. But MD5 is one of forbidden algorithms. We have to replace MD5 with FIPS-approved algorithm, like SHA-512. ) > Replace MD5 hash with SHA-512 for data file path of IcebergTable > > > Key: IMPALA-10205 > URL: https://issues.apache.org/jira/browse/IMPALA-10205 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Labels: FIPS > Fix For: Impala 4.0 > > > To support FIPS, all new code have to use FIPS-approved algorithms. Current > code generate data path hash in MD5 hash for Iceberg Table. But MD5 is one of > forbidden algorithms for FIPS. We have to replace MD5 with FIPS-approved > algorithm, like SHA-512. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel
[ https://issues.apache.org/jira/browse/IMPALA-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenzhe Zhou updated IMPALA-10206: - Description: To support FIPS, we have to use FIPS-approved algorithms. Squeasel use MD5 and SHA-1 hash algorithms. But MD5 is one of forbidden algorithms for FIPS, and SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 with FIPS-approved algorithms, like SHA-512. (was: To support FIPS, we have to use FIPS-approved algorithms. Squeasel use MD5 and SHA-1 hash algorithms. But MD5 is one of forbidden algorithms, and SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 with FIPS-approved algorithms, like SHA-512. ) > Replace MD5 and SHA1 hash with SHA-512 for Squeasel > --- > > Key: IMPALA-10206 > URL: https://issues.apache.org/jira/browse/IMPALA-10206 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Wenzhe Zhou >Assignee: Wenzhe Zhou >Priority: Major > Labels: FIPS > > To support FIPS, we have to use FIPS-approved algorithms. Squeasel use MD5 > and SHA-1 hash algorithms. But MD5 is one of forbidden algorithms for FIPS, > and SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 with > FIPS-approved algorithms, like SHA-512. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS
Venkat Sambath created IMPALA-10208: --- Summary: Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS Key: IMPALA-10208 URL: https://issues.apache.org/jira/browse/IMPALA-10208 Project: IMPALA Issue Type: Bug Reporter: Venkat Sambath Attachments: image-2020-10-02-10-38-16-153.png, image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png Steps to replicate the issue: Step1: CREATE TABLE impala_partition_test1 ( a INT ) PARTITIONED BY ( b STRING ); alter table impala_partition_test1 add partition(b="part1"); alter table impala_partition_test1 add partition(b="part2"); alter table impala_partition_test1 add partition(b="part3"); alter table impala_partition_test1 add partition(b="part4"); Step2: Populating the partitions for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; done Step3: Run compute incremental stats impala_partition_test1; Step4: In HMS DB when you run the below query {code:java} select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; {code} You will be noticing !image-2020-10-02-10-39-18-642.png! 
Step5: After you drop the stats [drop stats impala_partition_test1] you will still notice impala_intermediate_stats_num_chunks left behind. !image-2020-10-02-10-38-48-144.png! With a million partitions this could contribute to roughly 37 MB. Requesting that impala_intermediate_stats_num_chunks be removed when stats are dropped from the table.
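The requested cleanup can be sketched against a toy copy of the HMS table (a hedged illustration using sqlite3 with a simplified, hypothetical subset of the PARTITION_PARAMS schema; a real fix would go through the Hive Metastore API, not raw SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PARTITION_PARAMS (PART_ID INT, PARAM_KEY TEXT, PARAM_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO PARTITION_PARAMS VALUES (?, ?, ?)",
    [
        (1, "impala_intermediate_stats_chunk1", "chunkdata"),
        (1, "impala_intermediate_stats_num_chunks", "1"),
        (1, "numRows", "100"),  # unrelated parameter; must survive
    ],
)

# A complete drop-stats cleanup should match every intermediate-stats
# key, including impala_intermediate_stats_num_chunks, which the report
# shows being left behind today.
conn.execute(
    "DELETE FROM PARTITION_PARAMS "
    "WHERE PARAM_KEY LIKE 'impala_intermediate_stats%'"
)
left = [row[0] for row in conn.execute("SELECT PARAM_KEY FROM PARTITION_PARAMS")]
assert left == ["numRows"]
```

The point of the sketch is only that the deletion predicate must cover the num_chunks key as well as the chunk keys.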
[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS
[ https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkat Sambath updated IMPALA-10208: Description: Steps to replicate the issue: Step1: {code:java} CREATE TABLE impala_partition_test1 ( a INT ) PARTITIONED BY ( b STRING ); alter table impala_partition_test1 add partition(b="part1"); alter table impala_partition_test1 add partition(b="part2"); alter table impala_partition_test1 add partition(b="part3"); alter table impala_partition_test1 add partition(b="part4"); {code} Step2: Populating the partitions {code:java} for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; done {code} Step3: Run compute incremental stats impala_partition_test1; Step4: In HMS DB when you run the below query {code:java} select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; {code} You will be noticing !image-2020-10-02-10-39-18-642.png! Step5: After you drop the stats [drop stats impala_partition_test1 ] you still be noticing impala_intermediate_stats_num_chunks left unremoved. !image-2020-10-02-10-38-48-144.png! 
When you have million partitions this could contribute to 37mb I suppose. Requesting you to remove impala_intermediate_stats_num_chunks while we drop stats from table. was: Steps to replicate the issue: Step1: CREATE TABLE impala_partition_test1 ( a INT ) PARTITIONED BY ( b STRING ); alter table impala_partition_test1 add partition(b="part1"); alter table impala_partition_test1 add partition(b="part2"); alter table impala_partition_test1 add partition(b="part3"); alter table impala_partition_test1 add partition(b="part4"); Step2: Populating the partitions for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; done for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; done Step3: Run compute incremental stats impala_partition_test1; Step4: In HMS DB when you run the below query {code:java} select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; {code} You will be noticing !image-2020-10-02-10-39-18-642.png! Step5: After you drop the stats [drop stats impala_partition_test1 ] you still be noticing impala_intermediate_stats_num_chunks left unremoved. !image-2020-10-02-10-38-48-144.png! 
When you have a million partitions this could contribute to 37mb I suppose. Requesting you to remove impala_intermediate_stats_num_chunks while we drop stats from table.
[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS
[ https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10208: --- Component/s: Catalog > Drop stats doesnt remove impala_intermediate_stats_num_chunks from > PARTITION_PARAMS > --- > > Key: IMPALA-10208 > URL: https://issues.apache.org/jira/browse/IMPALA-10208 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Venkat Sambath >Priority: Minor > Labels: newbie, ramp-up > Attachments: image-2020-10-02-10-38-16-153.png, > image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png > > > Steps to replicate the issue: > Step1: > {code:java} > CREATE TABLE impala_partition_test1 ( > >a INT > > ) > > PARTITIONED BY ( > >b STRING > > ); alter table impala_partition_test1 add partition(b="part1"); > alter table impala_partition_test1 add partition(b="part2"); > alter table impala_partition_test1 add partition(b="part3"); > alter table impala_partition_test1 add partition(b="part4"); > {code} > Step2: Populating the partitions > {code:java} > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; > done > {code} > Step3: Run compute incremental stats impala_partition_test1; > Step4: In HMS DB when you run the below query > {code:java} > select A.TBL_NAME, 
B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + > length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C > on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like > "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; > {code} > You will be noticing > !image-2020-10-02-10-39-18-642.png! > Step5: After you drop the stats [drop stats impala_partition_test1 ] you > still be noticing impala_intermediate_stats_num_chunks left unremoved. > !image-2020-10-02-10-38-48-144.png! > When you have million partitions this could contribute to 37mb I suppose. > Requesting you to remove impala_intermediate_stats_num_chunks while we drop > stats from table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS
[ https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10208: --- Labels: newbie ramp-up (was: ) > Drop stats doesnt remove impala_intermediate_stats_num_chunks from > PARTITION_PARAMS > --- > > Key: IMPALA-10208 > URL: https://issues.apache.org/jira/browse/IMPALA-10208 > Project: IMPALA > Issue Type: Bug >Reporter: Venkat Sambath >Priority: Minor > Labels: newbie, ramp-up > Attachments: image-2020-10-02-10-38-16-153.png, > image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png > > > Steps to replicate the issue: > Step1: > {code:java} > CREATE TABLE impala_partition_test1 ( > >a INT > > ) > > PARTITIONED BY ( > >b STRING > > ); alter table impala_partition_test1 add partition(b="part1"); > alter table impala_partition_test1 add partition(b="part2"); > alter table impala_partition_test1 add partition(b="part3"); > alter table impala_partition_test1 add partition(b="part4"); > {code} > Step2: Populating the partitions > {code:java} > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; > done > {code} > Step3: Run compute incremental stats impala_partition_test1; > Step4: In HMS DB when you run the below query > {code:java} > select A.TBL_NAME, B.PART_NAME, 
C.PARAM_KEY, sum(length(C.PARAM_KEY) + > length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C > on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like > "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; > {code} > You will be noticing > !image-2020-10-02-10-39-18-642.png! > Step5: After you drop the stats [drop stats impala_partition_test1 ] you > still be noticing impala_intermediate_stats_num_chunks left unremoved. > !image-2020-10-02-10-38-48-144.png! > When you have million partitions this could contribute to 37mb I suppose. > Requesting you to remove impala_intermediate_stats_num_chunks while we drop > stats from table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS
[ https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205985#comment-17205985 ] Tim Armstrong commented on IMPALA-10208: Thanks for the bug report! This definitely looks like a bug but the severity doesn't seem too bad given the per-partition overhead is not too high. I labelled it as a newbie/ramp-up because it would be ideal for someone new to the project to pick up. > Drop stats doesnt remove impala_intermediate_stats_num_chunks from > PARTITION_PARAMS > --- > > Key: IMPALA-10208 > URL: https://issues.apache.org/jira/browse/IMPALA-10208 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Venkat Sambath >Priority: Minor > Labels: newbie, ramp-up > Attachments: image-2020-10-02-10-38-16-153.png, > image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png > > > Steps to replicate the issue: > Step1: > {code:java} > CREATE TABLE impala_partition_test1 ( > >a INT > > ) > > PARTITIONED BY ( > >b STRING > > ); alter table impala_partition_test1 add partition(b="part1"); > alter table impala_partition_test1 add partition(b="part2"); > alter table impala_partition_test1 add partition(b="part3"); > alter table impala_partition_test1 add partition(b="part4"); > {code} > Step2: Populating the partitions > {code:java} > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data && > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}; > done > for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data 
&& > hdfs dfs -put text_data > hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}; > done > {code} > Step3: Run compute incremental stats impala_partition_test1; > Step4: In HMS DB when you run the below query > {code:java} > select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + > length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C > on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like > "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY; > {code} > You will be noticing > !image-2020-10-02-10-39-18-642.png! > Step5: After you drop the stats [drop stats impala_partition_test1 ] you > still be noticing impala_intermediate_stats_num_chunks left unremoved. > !image-2020-10-02-10-38-48-144.png! > When you have million partitions this could contribute to 37mb I suppose. > Requesting you to remove impala_intermediate_stats_num_chunks while we drop > stats from table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org