[jira] [Created] (IMPALA-10200) WebUI static directory cleanup

2020-10-01 Thread Tamas Mate (Jira)
Tamas Mate created IMPALA-10200:
---

 Summary: WebUI static directory cleanup
 Key: IMPALA-10200
 URL: https://issues.apache.org/jira/browse/IMPALA-10200
 Project: IMPALA
  Issue Type: Task
  Components: Backend
Affects Versions: Impala 3.0, Impala 4.0
Reporter: Tamas Mate
Assignee: Tamas Mate


There is an unused index.html file under the default /www/ WebUI static file 
directory which only contains the {{Impala Webserver}} text:
{code:html}


Impala Webserver


{code}
This file could be removed as there is no need for an index.html in the static 
directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10143) TestAcid.test_full_acid_original_files

2020-10-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10143.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> TestAcid.test_full_acid_original_files
> --
>
> Key: IMPALA-10143
> URL: https://issues.apache.org/jira/browse/IMPALA-10143
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Tamas Mate
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: flaky
> Fix For: Impala 4.0
>
> Attachments: 
> https_^^jenkins.impala.io^job^ubuntu-16.04-dockerised-tests^3077^.log
>
>
> This test seems to be flaky.
>  
> [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3077/testReport/junit/query_test.test_acid/TestAcid/test_full_acid_original_files_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___5000___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__text_none_/]
> {code:java}
>  query_test/test_acid.py:153: in test_full_acid_original_files
> self.run_test_case('QueryTest/full-acid-original-file', vector, 
> unique_database)
> common/impala_test_suite.py:693: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:529: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,0,0 != 0,19,0
> E 0,1,1 != 0,20,1
> E 0,2,2 != 0,21,2
> E 0,3,3 != 0,22,3
> E 0,4,4 != 0,23,4
> {code}
> The test was added in IMPALA-9515.






[jira] [Created] (IMPALA-10201) WebUI CSP best practice

2020-10-01 Thread Tamas Mate (Jira)
Tamas Mate created IMPALA-10201:
---

 Summary: WebUI CSP best practice
 Key: IMPALA-10201
 URL: https://issues.apache.org/jira/browse/IMPALA-10201
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Impala 4.0
Reporter: Tamas Mate


The Debug WebUI currently supports only the {{X-Frame-Options}} header, which 
is kept for backward compatibility; in the future it will be replaced by the 
Content Security Policy’s {{frame-ancestors}} directive:
{quote}Content Security Policy’s frame-ancestors directive obsoletes the 
X-Frame-Options header. If a resource has both policies, the frame-ancestors 
policy SHOULD be enforced and the X-Frame-Options policy SHOULD be ignored 
[[w3.org]|https://www.w3.org/TR/CSP2/#frame-ancestors-and-frame-options].
{quote}
{quote}As described in Section 2.3.2.2, not all browsers implement 
X-Frame-Options in exactly the same way, which can lead to unintended results. 
And, given that the "X-" construction is deprecated [RFC6648], the 
X-Frame-Options header field will be replaced in the future by the 
Frame-Options directive in the Content Security Policy (CSP) version 1.1 
[CSP-1-1]. [[RFC 7034]|https://www.ietf.org/rfc/rfc7034.txt]
{quote}
CSP's {{frame-ancestors}} directive should be implemented to adhere to current 
security best practices and to avoid depending on a deprecated feature in the future.
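As a sketch of the transition described above (the header values are illustrative examples, not Impala's actual WebUI configuration), a server moving off {{X-Frame-Options}} typically emits both headers, so older browsers keep the legacy protection while CSP-aware browsers enforce {{frame-ancestors}} and ignore the legacy header:
{code:python}
# Illustrative only: example anti-framing headers for a transition period.
def frame_protection_headers(allow_framing=False):
    # CSP-aware browsers enforce frame-ancestors and ignore X-Frame-Options;
    # older browsers fall back to the legacy header.
    if allow_framing:
        return {}
    return {
        "X-Frame-Options": "DENY",
        "Content-Security-Policy": "frame-ancestors 'none'",
    }
{code}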






[jira] [Work started] (IMPALA-10200) WebUI static directory cleanup

2020-10-01 Thread Tamas Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10200 started by Tamas Mate.
---
> WebUI static directory cleanup
> --
>
> Key: IMPALA-10200
> URL: https://issues.apache.org/jira/browse/IMPALA-10200
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 4.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Trivial
>
> There is an unused index.html file under the default /www/ WebUI static file 
> directory which only contains the {{Impala Webserver}} text:
> {code:html}
> 
> 
> Impala Webserver
> 
> 
> {code}
> This file could be removed as there is no need for an index.html in the 
> static directory.






[jira] [Updated] (IMPALA-10181) Do a best-effort cleanup on failed INSERTs into Iceberg tables

2020-10-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10181:
---
Labels: impala-iceberg  (was: impala-acid)

> Do a best-effort cleanup on failed INSERTs into Iceberg tables
> --
>
> Key: IMPALA-10181
> URL: https://issues.apache.org/jira/browse/IMPALA-10181
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> If the Impala Coordinator doesn't fail, we can clean up most in-progress 
> files.
> Although files written by crashed executors can still remain.
> If the coordinator crashes, then unfortunately there's not much we can do 
> currently.






[jira] [Commented] (IMPALA-6686) Change the DESCRIBE DATABASE output to look more like Hive output

2020-10-01 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205636#comment-17205636
 ] 

Csaba Ringhofer commented on IMPALA-6686:
-

I have kept the current multi-row output when adding managedlocation in 
https://gerrit.cloudera.org/#/c/16529/2

My plan is to add a query option in another commit that can switch between the 
current solution and one that uses exactly the same output as Hive. The current 
solution can potentially be deprecated in the future. Creating this switch 
seems like less work than handling the potential issues that would come from 
breaking tools that rely on the current output.

> Change the DESCRIBE DATABASE output to look more like Hive output
> -
>
> Key: IMPALA-6686
> URL: https://issues.apache.org/jira/browse/IMPALA-6686
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Fredy Wijaya
>Priority: Minor
>  Labels: compatibility, incompatibility
>
> In Hive:
> {noformat}
> describe database functional;
> +--+--++-+-+-+
> | db_name  | comment  | location   | 
> owner_name  | owner_type  | parameters  |
> +--+--++-+-+-+
> | tpch |  | hdfs://localhost:20500/test-warehouse/tpch.db  | foo  
>| USER| |
> +--+--++-+-+-+{noformat}
> In Impala:
> {noformat}
> describe database extended functional;
> +-+---+-+
> | name| location  | comment |
> +-+---+-+
> | tpch| hdfs://localhost:20500/test-warehouse/tpch.db | |
> | Owner:  |   | |
> | | foo   | USER|
> +-+---+-+
> {noformat}






[jira] [Commented] (IMPALA-9952) Invalid offset index in Parquet file

2020-10-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205668#comment-17205668
 ] 

Zoltán Borók-Nagy commented on IMPALA-9952:
---

Hi [~guojingfeng], did you have a chance to try the patch?

>  Invalid offset index in Parquet file
> -
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: guojingfeng
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: Parquet
>
> When reading a Parquet file in Impala 3.4, we encountered the following error:
> {code:java}
> I0714 16:11:48.307806 1075820 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b018b] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.834901 1075838 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1748c41
> @  0x174e170
> @  0x1750e58
> @  0x17519f0
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> I0714 16:11:48.835763 1075838 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b02c0] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.893784 1075820 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page 
> filtering in file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1749104
> @  0x17494cc
> @  0x1751aee
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
>  Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
>     if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>       DCHECK(false);
>       return Status(Substitute(
>           "Top level rows aren't in sync during page filtering in file $0.",
>           filename()));
>     }
>   }
>   return Status::OK();
> }
> {code}






[jira] [Commented] (IMPALA-9180) Remove legacy ImpalaInternalService

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205705#comment-17205705
 ] 

ASF subversion and git services commented on IMPALA-9180:
-

Commit 6bb3b88d05f89fb7a1a54f302b4d329cbf4f69ec in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6bb3b88 ]

IMPALA-9180 (part 1): Remove legacy ImpalaInternalService

The legacy Thrift-based Impala internal service has been deprecated
and can be removed now.

This patch removes ImpalaInternalService. All infrastructure around it
is cleaned up, except for one use of the be_port flag:
StatestoreSubscriber::subscriber_id contains be_port, and we cannot
change the format of subscriber_id now. This remaining be_port issue will
be fixed in a succeeding patch (part 4).
TQueryCtx.coord_address is changed to TQueryCtx.coord_hostname, since the
port in TQueryCtx.coord_address was set to be_port and is now unused.
TQueryCtx.coord_krpc_address is also renamed to TQueryCtx.coord_ip_address.

Testing:
 - Passed the exhaustive test.
 - Passed Quasar-L0 test.

Change-Id: I5fa83c8009590124dded4783f77ef70fa30119e6
Reviewed-on: http://gerrit.cloudera.org:8080/16291
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 


> Remove legacy ImpalaInternalService
> ---
>
> Key: IMPALA-9180
> URL: https://issues.apache.org/jira/browse/IMPALA-9180
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Assignee: Wenzhe Zhou
>Priority: Minor
>
> Now that IMPALA-7984 is done, the legacy Thrift based Impala internal service 
> can now be removed. The port 22000 can also be freed up. In addition to code 
> change, the doc probably needs to be updated to reflect the fact that 22000 
> is no longer in use.






[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205707#comment-17205707
 ] 

ASF subversion and git services commented on IMPALA-10164:
--

Commit 5b720a4d18cc2f2ade54ab223663521a3822343f in impala's branch 
refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5b720a4 ]

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch adds support for creating Iceberg tables through HadoopCatalog.
Previously only the HadoopTables API was supported, but now we can also
use HadoopCatalog to create Iceberg tables. When creating a managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
Two values ('hadoop.catalog', 'hadoop.tables') are now supported for
'iceberg.catalog'. If you don't specify this property in your SQL, the
default catalog type is 'hadoop.catalog'.
As for external Iceberg tables, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
    'iceberg.table_identifier'='default.iceberg_test');
The table location cannot be set for managed or external Iceberg tables
with 'hadoop.catalog', and 'SHOW CREATE TABLE' does not display the
table location yet. Use 'DESCRIBE FORMATTED/EXTENDED' to get the
location info.
'iceberg.catalog_location' is required for a 'hadoop.catalog' table; it
is the location that stores the Iceberg table metadata and data, and we
use it to load the table metadata from Iceberg.
'iceberg.table_identifier' is used for the Iceberg TableIdentifier. If this
property is not specified in the SQL, Impala uses the database and table
name to load the Iceberg table, which is 'default.iceberg_test_external'
in the SQL above. The property value is split on '.', so you can also set
a value like 'org.my_db.my_tbl'. This property is valid for both managed
and external tables.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Reviewed-on: http://gerrit.cloudera.org:8080/16446
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently only supports the HadoopTables API to create Iceberg tables, 
> which is not enough, so we are preparing to support HadoopCatalog. The main 
> design is to add a new table property named 'iceberg.catalog' with a default 
> value of 'hadoop.tables'; we will implement 'hadoop.catalog' to support the 
> HadoopCatalog API. We may even support 'hive.catalog' in the future.






[jira] [Commented] (IMPALA-10193) Limit the memory usage of the whole mini-cluster

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205706#comment-17205706
 ] 

ASF subversion and git services commented on IMPALA-10193:
--

Commit a0a25a61c302d864315daa7f09827b37a37419d5 in impala's branch 
refs/heads/master from fifteencai
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a0a25a6 ]

IMPALA-10193: Limit the memory usage for the whole test cluster

This patch introduces a new approach to limiting the memory usage
of both the mini-cluster and the CDH cluster.

Without this limit, clusters are prone to getting killed when running
in docker containers with a lower memory limit than the host's memory
size, i.e. the mini-cluster may be running in a container limited to
32GB by cgroups while the host machine has 128GB. Under this
circumstance, if the container is started with the '--privileged'
command argument, both the mini and CDH clusters compute their
mem_limit from 128GB rather than 32GB. They will be killed when
attempting to claim extra resources.

Currently, the mem-limit estimating algorithms for Impalad and the Node
Manager are different:

for Impalad:  mem_limit = 0.7 * sys_mem / cluster_size (default is 3)

for Node Manager:
1. Leave aside 24GB, then fit the rest into the thresholds below.
2. The bare limit is 4GB and the maximum limit is 48GB.

To hedge against over-consumption, we

- Added a new environment variable IMPALA_CLUSTER_MAX_MEM_GB
- Modified the algorithm in 'bin/start-impala-cluster.py', making it
  take IMPALA_CLUSTER_MAX_MEM_GB rather than sys_mem into account.
- Modified the logic in
 'testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py',
  similarly making IMPALA_CLUSTER_MAX_MEM_GB substitute for sys_mem.

Testing: this patch worked in a 32GB docker container running on a 128GB
 host machine. All 1188 unit tests passed.

Change-Id: I8537fd748e279d5a0e689872aeb4dbfd0c84dc93
Reviewed-on: http://gerrit.cloudera.org:8080/16522
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
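The two estimating algorithms described in the commit message can be sketched as follows (a sketch derived from the description above; the function names are illustrative, not those in 'bin/start-impala-cluster.py'):
{code:python}
def impalad_mem_limit_gb(available_mem_gb, cluster_size=3):
    # Impalad: 70% of the available memory, split across the cluster nodes.
    return 0.7 * available_mem_gb / cluster_size

def node_manager_mem_limit_gb(available_mem_gb):
    # Node Manager: leave 24GB aside, then clamp to the 4GB..48GB thresholds.
    return min(max(available_mem_gb - 24, 4), 48)
{code}
With IMPALA_CLUSTER_MAX_MEM_GB=32, both limits are computed from 32GB rather than from the host's 128GB, which is the point of the patch.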


> Limit the memory usage of the whole mini-cluster
> 
>
> Key: IMPALA-10193
> URL: https://issues.apache.org/jira/browse/IMPALA-10193
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Fifteen
>Assignee: Fifteen
>Priority: Minor
> Attachments: image-2020-09-28-17-18-15-358.png
>
>
> The mini-cluster contains 3 virtual nodes, and all of them run in a single 
> 'Machine'. The quotes imply that the machine can be a docker container. If 
> the container is started with `--privileged` and the actual memory is limited 
> by cgroups, then the total memory shown in `htop` and the actual available 
> memory can be different! 
>  
> For example, in the container below, `htop` tells us the total memory is 
> 128GB, while the total memory set in cgroups is actually 32GB. If the actual 
> memory usage exceeds 32GB, processes (such as impalad, hiveserver2 etc.) get 
> killed.
>   !image-2020-09-28-17-18-15-358.png!
>  
> So we may need a way to limit the whole mini-cluster's memory usage.
>    






[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10202:
-

 Summary: Enable file handle cache for ABFS files
 Key: IMPALA-10202
 URL: https://issues.apache.org/jira/browse/IMPALA-10202
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should enable the file handle cache for ABFS; we have already seen it 
benefit jobs that read data from S3A.






[jira] [Updated] (IMPALA-10201) WebUI CSP best practice

2020-10-01 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10201:
---
Labels: newbie ramp-up  (was: )

> WebUI CSP best practice
> ---
>
> Key: IMPALA-10201
> URL: https://issues.apache.org/jira/browse/IMPALA-10201
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 4.0
>Reporter: Tamas Mate
>Priority: Minor
>  Labels: newbie, ramp-up
>
> The Debug WebUI currently supports only the {{X-Frame-Options}} header, which 
> is kept for backward compatibility; in the future it will be replaced by the 
> Content Security Policy’s {{frame-ancestors}} directive:
> {quote}Content Security Policy’s frame-ancestors directive obsoletes the 
> X-Frame-Options header. If a resource has both policies, the frame-ancestors 
> policy SHOULD be enforced and the X-Frame-Options policy SHOULD be ignored 
> [[w3.org]|https://www.w3.org/TR/CSP2/#frame-ancestors-and-frame-options].
> {quote}
> {quote}As described in Section 2.3.2.2, not all browsers implement 
> X-Frame-Options in exactly the same way, which can lead to unintended 
> results. And, given that the "X-" construction is deprecated [RFC6648], the 
> X-Frame-Options header field will be replaced in the future by the 
> Frame-Options directive in the Content Security Policy (CSP) version 1.1 
> [CSP-1-1]. [[RFC 7034]|https://www.ietf.org/rfc/rfc7034.txt]
> {quote}
> CSP's {{frame-ancestors}} directive should be implemented to adhere to current 
> security best practices and to avoid depending on a deprecated feature in the future.






[jira] [Updated] (IMPALA-10189) Avoid unnecessarily loading metadata for drop stats DDL

2020-10-01 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10189:
---
Summary: Avoid unnecessarily loading metadata for drop stats DDL  (was: 
Avoid unnecessarily loading metadata for compute/drop stats DDLs)

> Avoid unnecessarily loading metadata for drop stats DDL
> ---
>
> Key: IMPALA-10189
> URL: https://issues.apache.org/jira/browse/IMPALA-10189
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, 
> Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>







[jira] [Created] (IMPALA-10203) Avoid unnecessarily loading metadata for compute stats DDL

2020-10-01 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-10203:
--

 Summary: Avoid unnecessarily loading metadata for compute stats DDL
 Key: IMPALA-10203
 URL: https://issues.apache.org/jira/browse/IMPALA-10203
 Project: IMPALA
  Issue Type: Sub-task
  Components: Catalog
Reporter: Tim Armstrong
Assignee: Kurt Deschler









[jira] [Assigned] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-3335:


Assignee: Sahil Takiar

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}






[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3335.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0
>
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}






[jira] [Created] (IMPALA-10204) Evaluate AdmitQuery params for efficiency

2020-10-01 Thread Thomas Tauber-Marshall (Jira)
Thomas Tauber-Marshall created IMPALA-10204:
---

 Summary: Evaluate AdmitQuery params for efficiency
 Key: IMPALA-10204
 URL: https://issues.apache.org/jira/browse/IMPALA-10204
 Project: IMPALA
  Issue Type: Sub-task
  Components: Distributed Exec
Reporter: Thomas Tauber-Marshall


In the first version of the AdmissionControlService, we're sending the entire 
TQueryExecRequest/TQueryOptions as a sidecar to the admission controller. The 
TQueryExecRequest/TQueryOptions contain various things that the admission 
controller does not actually need, and sending them unnecessarily increases 
network load and query running time.

We should evaluate how much of a perf impact this has and how much could 
actually be removed.

Some small things may be non-trivial to remove and ultimately not worth it. For 
example, the tree of TPlanNodes contains some info needed by the admission 
controller (e.g. memory estimates) and some that is not (e.g. runtime filter 
descriptors). Making two parallel trees, one with only admission-required 
data, would require extensive refactoring in the planner, or wasted work in 
the coordinator copying the required parts out of what the planner returns, 
and may be too complicated or introduce too much other overhead to be worth it.
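One possible shape for such trimming (a sketch only; the field names below are hypothetical, not the actual Thrift fields) is to project the full request down to an admission-only view before serializing the sidecar:
{code:python}
# Hypothetical sketch: field names are illustrative, not Impala's Thrift schema.
ADMISSION_FIELDS = {"per_host_mem_estimate", "query_options", "pool_name"}

def admission_view(query_exec_request):
    # Keep only the fields the admission controller consumes, shrinking
    # the sidecar before it goes over the network.
    return {k: v for k, v in query_exec_request.items() if k in ADMISSION_FIELDS}
{code}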






[jira] [Created] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable

2020-10-01 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10205:


 Summary: Replace MD5 hash with SHA-512 for data file path of 
IcebergTable
 Key: IMPALA-10205
 URL: https://issues.apache.org/jira/browse/IMPALA-10205
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 4.0
Reporter: Wenzhe Zhou
 Fix For: Impala 4.0


To support FIPS, all new code has to use FIPS-approved algorithms. The current 
code generates the data file path hash with MD5 for Iceberg tables, but MD5 is 
one of the forbidden algorithms. We have to replace MD5 with a FIPS-approved 
algorithm, such as SHA-512.
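As a sketch of the intended swap (the function name is illustrative, not Impala's actual frontend code), Python's hashlib shows SHA-512 replacing MD5 for hashing a path:
{code:python}
import hashlib

def data_file_path_hash(path):
    # FIPS-approved SHA-512 in place of the forbidden MD5.
    return hashlib.sha512(path.encode("utf-8")).hexdigest()
{code}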






[jira] [Assigned] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-10205:


Assignee: Wenzhe Zhou

> Replace MD5 hash with SHA-512 for data file path of IcebergTable
> 
>
> Key: IMPALA-10205
> URL: https://issues.apache.org/jira/browse/IMPALA-10205
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
> Fix For: Impala 4.0
>
>
> To support FIPS, all new code has to use FIPS-approved algorithms. The current 
> code generates the data file path hash with MD5 for Iceberg tables, but MD5 is 
> one of the forbidden algorithms. We have to replace MD5 with a FIPS-approved 
> algorithm, such as SHA-512.






[jira] [Commented] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205896#comment-17205896
 ] 

ASF subversion and git services commented on IMPALA-9606:
-

Commit 8e9cf51f6b328f500acf7c577289c5b888fd15d2 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8e9cf51 ]

IMPALA-9606: ABFS reads should use hdfsPreadFully

Similar to IMPALA-8525, but for ABFS instead of S3A.
I don't expect this to make a major performance improvement
like it did for S3A, although I am still seeing a marginal
improvement during some ad-hoc testing (about a 5% scan perf
improvement). The reason is that the implementations of the ABFS
and S3A clients are very different: ABFS already reads all the
requested data in a single hdfsRead call.

I ran the query 'select * from abfs_test_store_sales order by
ss_net_profit limit 10;' several times to validate that perf
does not regress. In fact, it does improve slightly for this query.
The table 'abfs_test_store_sales' is just a copy of the mini-cluster's
tpcds_parquet.store_sales, although it is not partitioned.

Testing:
* Tested against a ABFS storage account I have access to
* Ran several queries to validate there are no functional
  or perf regressions.

Change-Id: I994ea30cf31abc66f5d82d9b3c8e185d2bd06147
Reviewed-on: http://gerrit.cloudera.org:8080/16531
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.






[jira] [Commented] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205897#comment-17205897
 ] 

ASF subversion and git services commented on IMPALA-8525:
-

Commit 8e9cf51f6b328f500acf7c577289c5b888fd15d2 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8e9cf51 ]

IMPALA-9606: ABFS reads should use hdfsPreadFully

Similar to IMPALA-8525, but for ABFS, instead of S3A.
I don't expect this to make a major improvement in performance,
like it did for S3A, although I am still seeing a marginal
improvement during some ad-hoc testing (about 5% scan perf
improvement). The reason is that the implementation of the ABFS
and S3A clients are very different, ABFS already reads all data
requested in a single hdfsRead call.

I ran the query 'select * from abfs_test_store_sales order by
ss_net_profit limit 10;' several times to validate that perf
does not regress. In fact, it does improve slightly for this query.
The table 'abfs_test_store_sales' is just a copy of the mini-cluster's
tpcds_parquet.store_sales, although it is not partitioned.

Testing:
* Tested against a ABFS storage account I have access to
* Ran several queries to validate there are no functional
  or perf regressions.

Change-Id: I994ea30cf31abc66f5d82d9b3c8e185d2bd06147
Reviewed-on: http://gerrit.cloudera.org:8080/16531
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> preads should use hdfsPreadFully rather than hdfsPread
> --
>
> Key: IMPALA-8525
> URL: https://issues.apache.org/jira/browse/IMPALA-8525
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
> {{hdfsPread}} API from libhdfs, which ultimately invokes 
> {{PositionedReadable#read(long position, byte[] buffer, int offset, int 
> length)}} in the HDFS-client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, 
> byte[] buffer, int offset, int length)}}. The difference is that {{#read}} 
> will "Read up to the specified number of bytes" whereas {{#readFully}} will 
> "Read the specified number of bytes". So there is no guarantee that {{#read}} 
> will read *all* of the requested bytes.
> Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it 
> inside a while loop until all the requested bytes have been read from the 
> file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
> (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will 
> allocate a Java array equal in size to specified length of the buffer; the 
> call to {{PositionedReadable#read}} may only fill up the buffer partially; 
> Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, 
> which will cause another large array allocation; this can result in a lot of 
> wasted time doing unnecessary array allocations
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point 
> in continuously calling {{hdfsPread}} when a single call to 
> {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect 
> performance much, but is unnecessary)
> Prior solutions to this problem have been to introduce a "chunk-size" to 
> Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related 
> changes for S3). However, with the migration to {{hdfsPreadFully}} the 
> chunk-size is no longer necessary.
> Furthermore, preads are most effective when the data is read all at once 
> (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller 
> chunks (typically 128K). For example, {{DFSInputStream#read(long position, 
> byte[] buffer, int offset, int length)}} opens up remote block readers with a 
> byte range determined by the value of {{length}} passed into the {{#read}} 
> call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request 
> with the size of the read specified by the given {{length}} (although fadvise 
> must be set to RANDOM for this to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564
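The read-versus-readFully contract described above can be illustrated with plain java.io streams (a sketch, not libhdfs itself; ChunkedStream is a made-up helper that caps how many bytes a single read returns):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // Returns at most 'cap' bytes per read() call, mimicking a positioned
    // read that fills the caller's buffer only partially.
    static class ChunkedStream extends FilterInputStream {
        final int cap;
        ChunkedStream(InputStream in, int cap) { super(in); this.cap = cap; }
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, cap));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024];
        byte[] buf = new byte[1024];

        // A single read() call may return fewer bytes than requested,
        // so the caller has to loop (what Impala did around hdfsPread).
        int n = new ChunkedStream(new ByteArrayInputStream(data), 128)
                .read(buf, 0, buf.length);
        System.out.println("read() returned " + n + " bytes");  // n is 128

        // readFully() loops internally until the whole buffer is filled,
        // analogous to the guarantee hdfsPreadFully provides.
        new DataInputStream(new ChunkedStream(new ByteArrayInputStream(data), 128))
                .readFully(buf);
        System.out.println("readFully() filled all " + buf.length + " bytes");
    }
}
```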






[jira] [Commented] (IMPALA-10200) WebUI static directory cleanup

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205895#comment-17205895
 ] 

ASF subversion and git services commented on IMPALA-10200:
--

Commit fd00efb32342b8201f20fd6d38ab773adebd3621 in impala's branch 
refs/heads/master from Tamas Mate
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fd00efb ]

IMPALA-10200: WebUI static directory cleanup

This change removes the unused index.html file from the static
directory. The page only contained a simple "Impala Webserver" header
but it was renderable and therefore could have been misleading.
After removal, the static www/ endpoint returns the expected "Directory
Listing Denied" error.

Testing:
 - Manually tested the WebUI without www/index.html

Change-Id: I108bb4c6a371b6d1ec157d54ac078604b243ecc2
Reviewed-on: http://gerrit.cloudera.org:8080/16528
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> WebUI static directory cleanup
> --
>
> Key: IMPALA-10200
> URL: https://issues.apache.org/jira/browse/IMPALA-10200
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 4.0
>Reporter: Tamas Mate
>Assignee: Tamas Mate
>Priority: Trivial
>
> There is an unused index.html file under the default /www/ WebUI static file 
> directory which only contains the {{Impala Webserver}} text:
> {code:html}
> 
> 
> Impala Webserver
> 
> 
> {code}
> This file could be removed as there is no need for an index.html in the 
> static directory.






[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9606.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.






[jira] [Created] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel

2020-10-01 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10206:


 Summary: Replace MD5 and SHA1 hash with SHA-512 for Squeasel
 Key: IMPALA-10206
 URL: https://issues.apache.org/jira/browse/IMPALA-10206
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.0
Reporter: Wenzhe Zhou


To support FIPS, we have to use FIPS-approved algorithms. Squeasel uses the MD5 
and SHA-1 hash algorithms, but MD5 is one of the forbidden algorithms and SHA-1 
is soon to be deprecated. We have to replace MD5 and SHA-1 with FIPS-approved 
algorithms, such as SHA-512.






[jira] [Updated] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10205:
-
Labels: FIPS  (was: )

> Replace MD5 hash with SHA-512 for data file path of IcebergTable
> 
>
> Key: IMPALA-10205
> URL: https://issues.apache.org/jira/browse/IMPALA-10205
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: FIPS
> Fix For: Impala 4.0
>
>
> To support FIPS, all new code has to use FIPS-approved algorithms. The 
> current code generates the data file path hash with MD5 for Iceberg tables, 
> but MD5 is one of the forbidden algorithms. We have to replace MD5 with a 
> FIPS-approved algorithm, such as SHA-512.






[jira] [Assigned] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou reassigned IMPALA-10206:


Assignee: Wenzhe Zhou

> Replace MD5 and SHA1 hash with SHA-512 for Squeasel
> ---
>
> Key: IMPALA-10206
> URL: https://issues.apache.org/jira/browse/IMPALA-10206
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: FIPS
>
> To support FIPS, we have to use FIPS-approved algorithms. Squeasel uses the 
> MD5 and SHA-1 hash algorithms, but MD5 is one of the forbidden algorithms and 
> SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 with 
> FIPS-approved algorithms, such as SHA-512.






[jira] [Created] (IMPALA-10207) Replace MD5 hash with SHA-512 for lineage graph

2020-10-01 Thread Wenzhe Zhou (Jira)
Wenzhe Zhou created IMPALA-10207:


 Summary: Replace MD5 hash with SHA-512 for lineage graph
 Key: IMPALA-10207
 URL: https://issues.apache.org/jira/browse/IMPALA-10207
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 4.0
Reporter: Wenzhe Zhou
Assignee: Wenzhe Zhou


To support FIPS, we have to use FIPS-approved algorithms. We use the MD5 hash 
algorithm for the lineage graph, but MD5 is one of the forbidden algorithms. We 
have to replace MD5 with a FIPS-approved algorithm, such as SHA-512. 

We might need to figure out if there are external dependencies on the hash, 
e.g. if the lineage graph is consumed by other services that somehow rely on 
the hash being consistent across versions (seems unlikely).






[jira] [Updated] (IMPALA-10207) Replace MD5 hash with SHA-512 for lineage graph

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10207:
-
Description: 
To support FIPS, we have to use FIPS-approved algorithms. We use MD5 hash 
algorithms for lineage graph. But MD5 is one of forbidden algorithms for FIPS. 
We have to replace MD5 with FIPS-approved algorithm, like SHA-512. 

We might need to figure out if there are external dependencies on the hash. 
e.g. the lineage graph was consumed by component services, and maybe they 
somehow rely on the hash being consistent across versions (seems unlikely).

  was:
To support FIPS, we have to use FIPS-approved algorithms. We use MD5 hash 
algorithms for lineage graph. But MD5 is one of forbidden algorithms. We have 
to replace MD5 with FIPS-approved algorithms, like SHA-512. 

We might need to figure out if there are external dependencies on the hash. 
e.g. the lineage graph was consumed by component services, and maybe they 
somehow rely on the hash being consistent across versions (seems unlikely).


> Replace MD5 hash with SHA-512 for lineage graph
> ---
>
> Key: IMPALA-10207
> URL: https://issues.apache.org/jira/browse/IMPALA-10207
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: FIPS
>
> To support FIPS, we have to use FIPS-approved algorithms. We use the MD5 hash 
> algorithm for the lineage graph, but MD5 is one of the forbidden algorithms 
> for FIPS. We have to replace MD5 with a FIPS-approved algorithm, such as 
> SHA-512. 
> We might need to figure out if there are external dependencies on the hash, 
> e.g. if the lineage graph is consumed by other services that somehow rely on 
> the hash being consistent across versions (seems unlikely).






[jira] [Updated] (IMPALA-10205) Replace MD5 hash with SHA-512 for data file path of IcebergTable

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10205:
-
Description: To support FIPS, all new code have to use FIPS-approved 
algorithms. Current code generate data path hash in MD5 hash for Iceberg Table. 
But MD5 is one of forbidden algorithms for FIPS. We have to replace MD5 with 
FIPS-approved algorithm, like SHA-512.    (was: To support FIPS, all new code 
have to use FIPS-approved algorithms. Current code generate data path hash in 
MD5 hash for Iceberg Table. But MD5 is one of forbidden algorithms. We have to 
replace MD5 with FIPS-approved algorithm, like SHA-512.  )

> Replace MD5 hash with SHA-512 for data file path of IcebergTable
> 
>
> Key: IMPALA-10205
> URL: https://issues.apache.org/jira/browse/IMPALA-10205
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: FIPS
> Fix For: Impala 4.0
>
>
> To support FIPS, all new code has to use FIPS-approved algorithms. The 
> current code generates the data file path hash with MD5 for Iceberg tables, 
> but MD5 is one of the forbidden algorithms for FIPS. We have to replace MD5 
> with a FIPS-approved algorithm, such as SHA-512.






[jira] [Updated] (IMPALA-10206) Replace MD5 and SHA1 hash with SHA-512 for Squeasel

2020-10-01 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou updated IMPALA-10206:
-
Description: To support FIPS, we have to use FIPS-approved algorithms. 
Squeasel use MD5 and SHA-1 hash algorithms. But MD5 is one of forbidden 
algorithms for FIPS, and SHA-1 is soon to be deprecated. We have to replace MD5 
and SHA-1 with FIPS-approved algorithms, like SHA-512.   (was: To support FIPS, 
we have to use FIPS-approved algorithms. Squeasel use MD5 and SHA-1 hash 
algorithms. But MD5 is one of forbidden algorithms, and SHA-1 is soon to be 
deprecated. We have to replace MD5 and SHA-1 with FIPS-approved algorithms, 
like SHA-512. )

> Replace MD5 and SHA1 hash with SHA-512 for Squeasel
> ---
>
> Key: IMPALA-10206
> URL: https://issues.apache.org/jira/browse/IMPALA-10206
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: FIPS
>
> To support FIPS, we have to use FIPS-approved algorithms. Squeasel uses the 
> MD5 and SHA-1 hash algorithms, but MD5 is one of the forbidden algorithms for 
> FIPS, and SHA-1 is soon to be deprecated. We have to replace MD5 and SHA-1 
> with FIPS-approved algorithms, such as SHA-512.






[jira] [Created] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

2020-10-01 Thread Venkat Sambath (Jira)
Venkat Sambath created IMPALA-10208:
---

 Summary: Drop stats doesnt remove 
impala_intermediate_stats_num_chunks from PARTITION_PARAMS
 Key: IMPALA-10208
 URL: https://issues.apache.org/jira/browse/IMPALA-10208
 Project: IMPALA
  Issue Type: Bug
Reporter: Venkat Sambath
 Attachments: image-2020-10-02-10-38-16-153.png, 
image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png

Steps to replicate the issue:

Step1: 
CREATE TABLE impala_partition_test1 (
  a INT
)
PARTITIONED BY (
  b STRING
);
alter table impala_partition_test1 add partition(b="part1");
alter table impala_partition_test1 add partition(b="part2");
alter table impala_partition_test1 add partition(b="part3");
alter table impala_partition_test1 add partition(b="part4");

Step2: Populating the partitions
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i};
 done

Step3: Run compute incremental stats impala_partition_test1;

Step4: In HMS DB when you run the below query 

{code:java}
select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + 
length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on 
A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like 
"%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY;  
{code}

You will notice:
 !image-2020-10-02-10-39-18-642.png! 

Step5: After you drop the stats [drop stats impala_partition_test1] you will 
still see impala_intermediate_stats_num_chunks left behind.

 !image-2020-10-02-10-38-48-144.png! 

With a million partitions this could contribute roughly 37 MB. Please remove 
impala_intermediate_stats_num_chunks when stats are dropped from the table.
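As a rough sanity check on the 37 MB estimate, the parameter key alone is 36 bytes per partition, so a million partitions carry about 37 MB of key/value bytes (illustrative arithmetic only):

```java
public class StatsKeyOverhead {
    public static void main(String[] args) {
        String key = "impala_intermediate_stats_num_chunks";
        long partitions = 1_000_000L;
        // ~36 bytes of key plus a small value per partition adds up to
        // roughly 37 MB of PARTITION_PARAMS data at a million partitions.
        long bytes = partitions * (key.length() + 1);
        System.out.println(bytes / 1_000_000 + " MB");  // prints "37 MB"
    }
}
```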












[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

2020-10-01 Thread Venkat Sambath (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkat Sambath updated IMPALA-10208:

Description: 
Steps to replicate the issue:

Step1: 
{code:java}
CREATE TABLE impala_partition_test1 (  
   a INT
   
 )  
   
 PARTITIONED BY (   
   
   b STRING 
   
 ); alter table impala_partition_test1 add partition(b="part1");
 alter table impala_partition_test1 add partition(b="part2");
 alter table impala_partition_test1 add partition(b="part3");
 alter table impala_partition_test1 add partition(b="part4");
{code}


Step2: Populating the partitions
{code:java}
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i};
 done
{code}


Step3: Run compute incremental stats impala_partition_test1;

Step4: In HMS DB when you run the below query 

{code:java}
select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + 
length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on 
A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like 
"%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY;  
{code}

You will notice:
 !image-2020-10-02-10-39-18-642.png! 

Step5: After you drop the stats [drop stats impala_partition_test1] you will 
still see impala_intermediate_stats_num_chunks left behind.

 !image-2020-10-02-10-38-48-144.png! 

With a million partitions this could contribute roughly 37 MB. Please remove 
impala_intermediate_stats_num_chunks when stats are dropped from the table.







  was:
Steps to replicate the issue:

Step1: 
CREATE TABLE impala_partition_test1 (  
   a INT
   
 )  
   
 PARTITIONED BY (   
   
   b STRING 
   
 ); alter table impala_partition_test1 add partition(b="part1");
 alter table impala_partition_test1 add partition(b="part2");
 alter table impala_partition_test1 add partition(b="part3");
 alter table impala_partition_test1 add partition(b="part4");

Step2: Populating the partitions
for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i};
 done
 for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
hdfs dfs -put text_data 
hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i};
 done

Step3: Run compute incremental stats impala_partition_test1;

Step4: In HMS DB when you run the below query 

{code:java}
select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + 
length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C on 
A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like 
"%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY;  
{code}

You will be noticing
 !image-2020-10-02-10-39-18-642.png! 

Step5: After you drop the stats [drop stats impala_partition_test1 ] you still 
be noticing impala_intermediate_stats_num_chunks left unremoved.

 !image-2020-10-02-10-38-48-144.png! 

When you have million partitions this could contribute to 37mb I suppose. 
Requesting you to remove impala_intermediate_stats_num_chunks while we drop 
stats from table.

[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

2020-10-01 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10208:
---
Component/s: Catalog

> Drop stats doesnt remove impala_intermediate_stats_num_chunks from 
> PARTITION_PARAMS
> ---
>
> Key: IMPALA-10208
> URL: https://issues.apache.org/jira/browse/IMPALA-10208
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Venkat Sambath
>Priority: Minor
>  Labels: newbie, ramp-up
> Attachments: image-2020-10-02-10-38-16-153.png, 
> image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png
>
>
> Steps to replicate the issue:
> Step1: 
> {code:java}
> CREATE TABLE impala_partition_test1 ( 
>  
>a INT  
>  
>  )
>  
>  PARTITIONED BY ( 
>  
>b STRING   
>  
>  ); alter table impala_partition_test1 add partition(b="part1");
>  alter table impala_partition_test1 add partition(b="part2");
>  alter table impala_partition_test1 add partition(b="part3");
>  alter table impala_partition_test1 add partition(b="part4");
> {code}
> Step2: Populating the partitions
> {code:java}
> for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
> hdfs dfs -put text_data 
> hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i};
>  done
>  for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
> hdfs dfs -put text_data 
> hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i};
>  done
>  for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
> hdfs dfs -put text_data 
> hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i};
>  done
>  for i in `seq 1 10`; do base64 /dev/urandom | head -c 5000K > text_data  && 
> hdfs dfs -put text_data 
> hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i};
>  done
> {code}
> Step3: Run compute incremental stats impala_partition_test1;
> Step4: In HMS DB when you run the below query 
> {code:java}
> select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY, sum(length(C.PARAM_KEY) + 
> length(C.PARAM_VALUE)) from TBLS A join PARTITIONS B join PARTITION_PARAMS C 
> on A.TBL_ID = B.TBL_ID and C.PART_ID=B.PART_ID and C.PARAM_KEY like 
> "%impala_intermediate_stats%" group by A.TBL_NAME,B.PART_NAME,C.PARAM_KEY;
> {code}
> You will notice:
>  !image-2020-10-02-10-39-18-642.png! 
> Step5: After you drop the stats [drop stats impala_partition_test1] you will 
> still see impala_intermediate_stats_num_chunks left behind.
>  !image-2020-10-02-10-38-48-144.png! 
> With a million partitions this could contribute roughly 37 MB. Please remove 
> impala_intermediate_stats_num_chunks when stats are dropped from the table.






[jira] [Updated] (IMPALA-10208) Drop stats doesnt remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

2020-10-01 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10208:
---
Labels: newbie ramp-up  (was: )

> Drop stats doesnt remove impala_intermediate_stats_num_chunks from 
> PARTITION_PARAMS
> ---
>
> Key: IMPALA-10208
> URL: https://issues.apache.org/jira/browse/IMPALA-10208
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Venkat Sambath
>Priority: Minor
>  Labels: newbie, ramp-up
> Attachments: image-2020-10-02-10-38-16-153.png, 
> image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png
>
>
> Steps to replicate the issue:
> Step1: 
> {code:java}
> CREATE TABLE impala_partition_test1 ( 
>  
>a INT  
>  
>  )
>  
>  PARTITIONED BY ( 
>  
>b STRING   
>  
>  ); alter table impala_partition_test1 add partition(b="part1");
>  alter table impala_partition_test1 add partition(b="part2");
>  alter table impala_partition_test1 add partition(b="part3");
>  alter table impala_partition_test1 add partition(b="part4");
> {code}
> Step2: Populating the partitions
> {code:java}
> for i in `seq 1 10`; do
>   base64 /dev/urandom | head -c 5000K > text_data
>   hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part1/test_${i}
> done
> for i in `seq 1 10`; do
>   base64 /dev/urandom | head -c 5000K > text_data
>   hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part2/test_${i}
> done
> for i in `seq 1 10`; do
>   base64 /dev/urandom | head -c 5000K > text_data
>   hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part3/test_${i}
> done
> for i in `seq 1 10`; do
>   base64 /dev/urandom | head -c 5000K > text_data
>   hdfs dfs -put text_data hdfs://nameservice1/user/hive/warehouse/impala_partition_test1/b=part4/test_${i}
> done
> {code}
> Step3: Run compute incremental stats impala_partition_test1;
> Step4: In the HMS DB, run the query below:
> {code:java}
> select A.TBL_NAME, B.PART_NAME, C.PARAM_KEY,
>        sum(length(C.PARAM_KEY) + length(C.PARAM_VALUE))
> from TBLS A join PARTITIONS B join PARTITION_PARAMS C
>   on A.TBL_ID = B.TBL_ID and C.PART_ID = B.PART_ID
>   and C.PARAM_KEY like "%impala_intermediate_stats%"
> group by A.TBL_NAME, B.PART_NAME, C.PARAM_KEY;
> {code}
> You will notice:
>  !image-2020-10-02-10-39-18-642.png! 
> Step5: After you drop the stats (drop stats impala_partition_test1), you
> will still see impala_intermediate_stats_num_chunks left behind.
>  !image-2020-10-02-10-38-48-144.png! 
> With a million partitions this could add up to roughly 37 MB. Please
> remove impala_intermediate_stats_num_chunks when stats are dropped from
> the table.
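Until a fix lands, the leftover rows could in principle be purged from the metastore's backing database directly. The sketch below is an editor's illustration against an in-memory SQLite stand-in for PARTITION_PARAMS (the three-column layout is assumed from the diagnostic query above; a real HMS table has more columns, and on a real metastore the equivalent DELETE would be run on the backing RDBMS only after taking a backup).

```python
import sqlite3

# Stand-in for the HMS PARTITION_PARAMS table (layout assumed from the
# diagnostic query in the report; a real metastore has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PARTITION_PARAMS (PART_ID INT, PARAM_KEY TEXT, PARAM_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO PARTITION_PARAMS VALUES (?, ?, ?)",
    [
        (1, "impala_intermediate_stats_num_chunks", "1"),
        (1, "numFiles", "10"),  # unrelated key that must survive the cleanup
        (2, "impala_intermediate_stats_num_chunks", "1"),
    ],
)
# The cleanup itself: delete the key that `drop stats` leaves behind.
conn.execute(
    "DELETE FROM PARTITION_PARAMS WHERE PARAM_KEY LIKE '%impala_intermediate_stats%'"
)
print(conn.execute("SELECT PART_ID, PARAM_KEY FROM PARTITION_PARAMS").fetchall())
# [(1, 'numFiles')]
```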






[jira] [Commented] (IMPALA-10208) Drop stats doesn't remove impala_intermediate_stats_num_chunks from PARTITION_PARAMS

2020-10-01 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205985#comment-17205985
 ] 

Tim Armstrong commented on IMPALA-10208:


Thanks for the bug report! This definitely looks like a bug, but the severity 
doesn't seem too bad given that the per-partition overhead is low. I labelled 
it newbie/ramp-up because it would be ideal for someone new to the project to 
pick up.
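The reporter's ~37 MB figure checks out against the key length alone; a quick back-of-the-envelope sketch (the PARAM_VALUE of "1" is an assumption, the real value is just a small integer string):

```python
# The leftover PARAM_KEY is 36 bytes; a short PARAM_VALUE (assumed "1")
# brings the per-partition overhead to about 37 bytes, i.e. about 37 MB
# per million partitions, matching the estimate in the report.
key = "impala_intermediate_stats_num_chunks"
value = "1"  # hypothetical; the real value is a small chunk count
per_partition = len(key) + len(value)
print(per_partition)                              # 37
print(per_partition * 1_000_000 // 10**6, "MB")   # 37 MB
```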

> Drop stats doesn't remove impala_intermediate_stats_num_chunks from 
> PARTITION_PARAMS
> ---
>
> Key: IMPALA-10208
> URL: https://issues.apache.org/jira/browse/IMPALA-10208
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Venkat Sambath
>Priority: Minor
>  Labels: newbie, ramp-up
> Attachments: image-2020-10-02-10-38-16-153.png, 
> image-2020-10-02-10-38-48-144.png, image-2020-10-02-10-39-18-642.png
>


