[jira] [Created] (IMPALA-10406) Query with analytic function doesn't need to materialize the predicate pushed down to kudu

2020-12-23 Thread Xianqing He (Jira)
Xianqing He created IMPALA-10406:


 Summary: Query with analytic function doesn't need to materialize 
the predicate pushed down to kudu
 Key: IMPALA-10406
 URL: https://issues.apache.org/jira/browse/IMPALA-10406
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 4.0
Reporter: Xianqing He
Assignee: Xianqing He
 Fix For: Impala 4.0


A query with an analytic function does not need to materialize slots that are referenced only by predicates fully pushed down to Kudu.

E.g.
{code:java}
select min(n_nationkey) over(partition by n_regionkey) from tpch_kudu.nation t1 
where t1.n_name in ('ALGERIA', 'ARGENTINA');
{code}
The plan:
{code:java}
PLAN-ROOT SINK
|
02:ANALYTIC
|  functions: min(n_nationkey)
|  partition by: n_regionkey
|  row-size=25B cardinality=2
|
01:SORT
|  order by: n_regionkey ASC NULLS LAST
|  row-size=23B cardinality=2
|
00:SCAN KUDU [tpch_kudu.nation t1]
   kudu predicates: t1.n_name IN ('ALGERIA', 'ARGENTINA')
   row-size=27B cardinality=2
{code}
We don't need to materialize the slot 'n_name', since the IN predicate is evaluated entirely by the Kudu scan.
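The intended optimization can be sketched with a small model (hypothetical names; this is not Impala's actual planner API): a scan must materialize the slots its parent nodes need, plus the slots of any conjunct Impala has to re-evaluate itself, and it can skip slots referenced only by conjuncts fully evaluated by Kudu.

```python
from dataclasses import dataclass, field

@dataclass
class Predicate:
    slots: set            # slot names the predicate references
    pushed_to_kudu: bool  # True if the Kudu scan fully evaluates it

@dataclass
class ScanNode:
    output_slots: set     # slots required by parent plan nodes
    predicates: list = field(default_factory=list)

    def materialized_slots(self) -> set:
        """Slots the scan must materialize: everything the parent needs,
        plus the slots of predicates Impala must re-evaluate itself.
        Slots used only by fully pushed-down predicates are skipped."""
        needed = set(self.output_slots)
        for p in self.predicates:
            if not p.pushed_to_kudu:
                needed |= p.slots
        return needed
```

For the example query, the parent nodes need only n_nationkey and n_regionkey, and the IN conjunct on n_name is fully pushed down, so n_name drops out of the materialized set.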

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10406) Query with analytic function doesn't need to materialize the predicate pushed down to kudu

2020-12-23 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-10406:
-
Description: 
A query with an analytic function does not need to materialize slots that are referenced only by predicates fully pushed down to Kudu.

E.g.
{code:java}
select min(n_nationkey) over(partition by n_regionkey) from tpch_kudu.nation t1 
where t1.n_name in ('ALGERIA', 'ARGENTINA');
{code}
The plan:
{code:java}
PLAN-ROOT SINK
|
02:ANALYTIC
|  functions: min(n_nationkey)
|  partition by: n_regionkey
|  row-size=25B cardinality=2
|
01:SORT
|  order by: n_regionkey ASC NULLS LAST
|  row-size=23B cardinality=2
|
00:SCAN KUDU [tpch_kudu.nation t1]
   kudu predicates: t1.n_name IN ('ALGERIA', 'ARGENTINA')
   row-size=27B cardinality=2
{code}
We don't need to materialize the slot 'n_name', since the IN predicate is evaluated entirely by the Kudu scan.

 

  was:
The query with analytic function doesn't need to materialize the predicate 
pushed down to kudu.

E.g.

 
{code:java}
select min(n_nationkey) over(partition by n_regionkey) from tpch_kudu.nation t1 
where t1.n_name in ('ALGERIA', 'ARGENTINA');
{code}
The plan

 

 
{code:java}
PLAN-ROOT SINK
|
02:ANALYTIC
|  functions: min(n_nationkey)
|  partition by: n_regionkey
|  row-size=25B cardinality=2
|
01:SORT
|  order by: n_regionkey ASC NULLS LAST
|  row-size=23B cardinality=2
|
00:SCAN KUDU [tpch_kudu.nation t1]
   kudu predicates: t1.n_name IN ('ALGERIA', 'ARGENTINA')
   row-size=27B cardinality=2
{code}
We don't need to materialize the slot 'n_name'.

 


> Query with analytic function doesn't need to materialize the predicate pushed 
> down to kudu
> --
>
> Key: IMPALA-10406
> URL: https://issues.apache.org/jira/browse/IMPALA-10406
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 4.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Minor
> Fix For: Impala 4.0
>
>
> The query with analytic function doesn't need to materialize the predicate 
> pushed down to kudu.
> E.g.
>  
> {code:java}
> select min(n_nationkey) over(partition by n_regionkey) from tpch_kudu.nation 
> t1 where t1.n_name in ('ALGERIA', 'ARGENTINA');
> {code}
> The plan
> {code:java}
> PLAN-ROOT SINK
> |
> 02:ANALYTIC
> |  functions: min(n_nationkey)
> |  partition by: n_regionkey
> |  row-size=25B cardinality=2
> |
> 01:SORT
> |  order by: n_regionkey ASC NULLS LAST
> |  row-size=23B cardinality=2
> |
> 00:SCAN KUDU [tpch_kudu.nation t1]
>kudu predicates: t1.n_name IN ('ALGERIA', 'ARGENTINA')
>row-size=27B cardinality=2
> {code}
> We don't need to materialize the slot 'n_name'.
>  









[jira] [Resolved] (IMPALA-3898) Ensure LZO is not required for test

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-3898.
---
Fix Version/s: Not Applicable
   Resolution: Fixed

LZO support was removed in Impala 4.

> Ensure LZO is not required for test
> ---
>
> Key: IMPALA-3898
> URL: https://issues.apache.org/jira/browse/IMPALA-3898
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: Jim Apple
>Assignee: David Knupp
>Priority: Minor
>  Labels: asf
> Fix For: Not Applicable
>
>
> https://github.com/cloudera/impala-lzo/blob/cdh5-trunk/lzo-header.h is GPL2 
> (or higher). Impala is Apache2. We should ensure that Impala builds and its 
> tests pass even if LZO is not available.






[jira] [Commented] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2

2020-12-23 Thread Wenzhe Zhou (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254356#comment-17254356
 ] 

Wenzhe Zhou commented on IMPALA-10259:
--

The issue can be reproduced by adding an artificial delay (a 3-second sleep) at the beginning of QueryState::ErrorDuringExecute() and then repeatedly running the test case 
test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj against an 
Impala build with the -asan option.

Shell script for running the unit test:
{code:bash}
#!/bin/bash

set -euo pipefail

for iter in $(seq 1 100); do
  ${IMPALA_HOME}/bin/impala-py.test \
    tests/query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj
done
{code}
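The same repeat-until-failure harness can be sketched in Python (a hypothetical helper, not part of the Impala tree); unlike the shell loop, it records which iteration first failed instead of relying on set -e to abort:

```python
import subprocess
from typing import Callable, Tuple

def run_until_failure(run_once: Callable[[], int],
                      max_iters: int = 100) -> Tuple[int, bool]:
    """Call run_once() up to max_iters times, stopping at the first
    nonzero exit code. Returns (iterations executed, failure seen)."""
    for i in range(1, max_iters + 1):
        if run_once() != 0:
            return i, True
    return max_iters, False

def run_spilling_test() -> int:
    # Mirrors the shell loop above; assumes it is run from $IMPALA_HOME.
    return subprocess.call(
        ["bin/impala-py.test",
         "tests/query_test/test_spilling.py::"
         "TestSpillingDebugActionDimensions::test_spilling_naaj"])
```

Running run_until_failure(run_spilling_test) then reports on which iteration the DCHECK was first hit.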

> Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
> ---
>
> Key: IMPALA-10259
> URL: https://issues.apache.org/jira/browse/IMPALA-10259
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Wenzhe Zhou
>Priority: Blocker
>  Labels: broken-build, crash
>
> TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN 
> build:
> {code:java}
> F1016 17:08:54.728466 19955 query-state.cc:877] 
> 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 
> vs. 1)  {code}
> The test is:
> {code:java}
> shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension:
>  ('textfile', '.txt') | protocol: hs2] {code}
> The query is:
> {code:java}
> I1016 17:08:49.026532 19947 Frontend.java:1522] 
> 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from 
> functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code}
> Query options:
> {code:java}
> I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] 
> TClientRequest.queryOptions: TQueryOptions {
>   01: abort_on_error (bool) = true,
>   02: max_errors (i32) = 100,
>   03: disable_codegen (bool) = false,
>   04: batch_size (i32) = 0,
>   05: num_nodes (i32) = 0,
>   06: max_scan_range_length (i64) = 0,
>   07: num_scanner_threads (i32) = 0,
>   11: debug_action (string) = "",
>   12: mem_limit (i64) = 0,
>   15: hbase_caching (i32) = 0,
>   16: hbase_cache_blocks (bool) = false,
>   17: parquet_file_size (i64) = 0,
>   18: explain_level (i32) = 1,
>   19: sync_ddl (bool) = false,
>   24: disable_outermost_topn (bool) = false,
>   26: query_timeout_s (i32) = 0,
>   28: appx_count_distinct (bool) = false,
>   29: disable_unsafe_spills (bool) = false,
>   31: exec_single_node_rows_threshold (i32) = 100,
>   32: optimize_partition_key_scans (bool) = false,
>   33: replica_preference (i32) = 0,
>   34: schedule_random_replica (bool) = false,
>   36: disable_streaming_preaggregations (bool) = false,
>   37: runtime_filter_mode (i32) = 2,
>   38: runtime_bloom_filter_size (i32) = 1048576,
>   39: runtime_filter_wait_time_ms (i32) = 0,
>   40: disable_row_runtime_filtering (bool) = false,
>   41: max_num_runtime_filters (i32) = 10,
>   42: parquet_annotate_strings_utf8 (bool) = false,
>   43: parquet_fallback_schema_resolution (i32) = 0,
>   45: s3_skip_insert_staging (bool) = true,
>   46: runtime_filter_min_size (i32) = 1048576,
>   47: runtime_filter_max_size (i32) = 16777216,
>   48: prefetch_mode (i32) = 1,
>   49: strict_mode (bool) = false,
>   50: scratch_limit (i64) = -1,
>   51: enable_expr_rewrites (bool) = true,
>   52: decimal_v2 (bool) = true,
>   53: parquet_dictionary_filtering (bool) = true,
>   54: parquet_array_resolution (i32) = 0,
>   55: parquet_read_statistics (bool) = true,
>   56: default_join_distribution_mode (i32) = 0,
>   57: disable_codegen_rows_threshold (i32) = 5,
>   58: default_spillable_buffer_size (i64) = 2097152,
>   59: min_spillable_buffer_size (i64) = 65536,
>   60: max_row_size (i64) = 524288,
>   61: idle_session_timeout (i32) = 0,
>   62: compute_stats_min_sample_size (i64) = 1073741824,
>   63: exec_time_limit_s (i32) = 0,
>   64: shuffle_distinct_exprs (bool) = true,
>   65: max_mem_estimate_for_admission (i64) = 0,
>   66: thread_reservation_limit (i32) = 3000,
>   67: thread_reservation_aggregate_limit (i32) = 0,
>   68: kudu_read_mode (i32) = 0,
>   69: allow_erasure_coded_files (bool) = false,
>   70: timezone (string) = "",
>   71: scan_bytes_limit (i64) = 0,
>   72: cpu_limit_s (i64) = 0,
>   73: topn_bytes_limit (i64) = 536870912,
>   74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) 
> built on Fri Oct 16 13:26:18 PDT 2020",
>   75: resource_trace_ratio (double) = 0,
>   76: num_remote_executor_candidates (i32) = 3,
>   77: num_rows_produced_limit (i64) = 0,
>   78: planner_testcase_mode (bool) = false,
>   79: default_file_format (i32) = 0,
>   80: parquet_timestamp_type (i32) = 0,
>   81: parquet_read_page_index 

[jira] [Resolved] (IMPALA-4186) Extra characters in web site header

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4186.
---
Fix Version/s: Not Applicable
   Resolution: Fixed

> Extra characters in web site header
> ---
>
> Key: IMPALA-4186
> URL: https://issues.apache.org/jira/browse/IMPALA-4186
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: John Russell
>Priority: Trivial
> Fix For: Not Applicable
>
>
> The HTML page source at:
> http://impala.apache.org/
> has a metatag with some spurious characters after the project name:
> {code}
> 
> {code}
> I saw the "gg" appear in a snippet shown in Google search results for "impala 
> sql".
> I don't know enough about how the website is put together to fix it myself.






[jira] [Resolved] (IMPALA-4203) Tests should run in a pipeline, with each stage dependent upon the success of the former stage.

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4203.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Tests should run in a pipeline, with each stage dependent upon the success of 
> the former stage.
> ---
>
> Key: IMPALA-4203
> URL: https://issues.apache.org/jira/browse/IMPALA-4203
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: David Knupp
>Priority: Major
> Fix For: Not Applicable
>
>
> Currently, our test execution proceeds in the order of:
> * backend tests
> * frontend tests
> * system, or end-to-end tests
> If any of these logical stages produces a test failure, the Jenkins job 
> should stop. However, we seem to continue to execute subsequent stages even 
> when previous stages have failed. At the very least, we know that end-to-end 
> tests run, even after BE tests have failed.
> It's a basic tenet of software testing that if unit tests fail, those need to 
> be triaged and fixed before any further testing can be done.






[jira] [Resolved] (IMPALA-4375) Confirm the specific HDFS permissions for loading data to a remote cluster.

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4375.
---
Fix Version/s: Not Applicable
   Resolution: Later

> Confirm the specific HDFS permissions for loading data to a remote cluster.
> ---
>
> Key: IMPALA-4375
> URL: https://issues.apache.org/jira/browse/IMPALA-4375
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
>  Labels: remote_cluster_test
> Fix For: Not Applicable
>
>
> We need to confirm the exact permissions required for loading data to a 
> remote cluster, and if we make any changes during the data load phase, we 
> need to return them to their defaults so as not to affect the later running 
> of any tests.






[jira] [Resolved] (IMPALA-4367) Refactor remote_data_load.py to make the various bits of functionality accessible from QAINFRA code, e.g., quasar.

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4367.
---
Fix Version/s: Not Applicable
   Resolution: Invalid

> Refactor remote_data_load.py to make the various bits of functionality 
> accessible from QAINFRA code, e.g., quasar.
> --
>
> Key: IMPALA-4367
> URL: https://issues.apache.org/jira/browse/IMPALA-4367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
>  Labels: remote_cluster_test
> Fix For: Not Applicable
>
>
> Move common functionality -- such as for downloading client configs -- from 
> {{remote_data_load.py}} to someplace where it can be more easily shared by 
> other parts of the QA infrastructure code, e.g. tests.common, and refactor 
> according to comments in https://gerrit.cloudera.org/#/c/4769.






[jira] [Resolved] (IMPALA-4376) Refactor the actual running of tests out of the remote_data_load.py script.

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4376.
---
Fix Version/s: Not Applicable
   Resolution: Invalid

> Refactor the actual running of tests out of the remote_data_load.py script.
> ---
>
> Key: IMPALA-4376
> URL: https://issues.apache.org/jira/browse/IMPALA-4376
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
>  Labels: remote_cluster_test
> Fix For: Not Applicable
>
>
> The remote_data_load.py script includes a function for running tests, but 
> this seems misplaced. If running remote cluster tests relies on the ephemeral 
> environment changes made during the data load phases, we should find another 
> way to persist/propagate those changes to py.test.






[jira] [Resolved] (IMPALA-4399) create-load-data.sh has bitrotted to some extent, and needs to be cleaned up

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4399.
---
Fix Version/s: Not Applicable
   Resolution: Cannot Reproduce

> create-load-data.sh has bitrotted to some extent, and needs to be cleaned up
> 
>
> Key: IMPALA-4399
> URL: https://issues.apache.org/jira/browse/IMPALA-4399
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: David Knupp
>Priority: Major
> Fix For: Not Applicable
>
>
> {{create-load-data.sh}} needs to be refactored. Some things that were noticed 
> recently when working on getting remote cluster testing set up:
> {{-exploration_strategy}} is an apparent parameter one can pass in, but it 
> has no effect.
> Trying to skip the snapshot load produces this error:
> {noformat}
> $ ./testdata/bin/create-load-data.sh -skip_snapshot_load 
> -exploration_strategy core
> Executing: create-load-data.sh -skip_snapshot_load -exploration_strategy core
> Loading Hive Builtins (logging to 
> /home/dknupp/Impala/logs/data_loading/load-hive-builtins.log)... OK
> Generating HBase data (logging to 
> /home/dknupp/Impala/logs/data_loading/create-hbase.log)... OK
> Creating /test-warehouse HDFS directory (logging to 
> /home/dknupp/Impala/logs/data_loading/create-test-warehouse-dir.log)... FAILED
> 'hadoop fs -mkdir /test-warehouse' failed. Tail of log:
> Log for command 'hadoop fs -mkdir /test-warehouse'
> mkdir: `/test-warehouse': File exists
> Error in ./testdata/bin/create-load-data.sh at line 46: SKIP_METADATA_LOAD=0
> {noformat}






[jira] [Resolved] (IMPALA-4398) Impala-Kudu upgrade test

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4398.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Impala-Kudu upgrade test
> 
>
> Key: IMPALA-4398
> URL: https://issues.apache.org/jira/browse/IMPALA-4398
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Kudu_Impala
>Reporter: Matthew Jacobs
>Priority: Major
>  Labels: kudu, test, usability
> Fix For: Not Applicable
>
>
> Given all of the changes to Kudu tables in 2.8. We should have tests that 
> ensure that Kudu tables created before Impala 2.8 will continue to work after.
> We should at least test both internal and external tables can be read after 
> an upgrade.






[jira] [Resolved] (IMPALA-4405) Make it possible to do backwards/forwards compatibility testing of client binaries against a remote cluster

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4405.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Make it possible to do backwards/forwards compatibility testing of client 
> binaries against a remote cluster
> ---
>
> Key: IMPALA-4405
> URL: https://issues.apache.org/jira/browse/IMPALA-4405
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
> Fix For: Not Applicable
>
>
> Remote cluster testing currently is set up to run with whatever client 
> binaries are installed on the client/test runner machine. It would be useful 
> to be able to quickly test with various version of client binaries (by which 
> is meant beeline, hbase shell, impala-shell, jdbc client, hadoop cmd line, 
> and so on.)
> Options for getting different client binaries include:
> * Pull client binaries from an S3 bucket
> * Pull the client binaries from a repo directly
> * Add a feature to the Quasar test framework to run commands remotely. E.g. 
> if we ran impala-shell on a remote host (via ssh), then we could test that 
> impala shell worked OK on different platforms more easily, and then the local 
> hadoop config wouldn't be necessary. This may be too slow for the bulk of 
> tests, but some compatibility tests could be run this way.






[jira] [Resolved] (IMPALA-4482) Loading Impala's test data produces different TPCDS data on local vs. remote clusters for partitioned tables

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4482.
---
Fix Version/s: Not Applicable
   Resolution: Later

> Loading Impala's test data produces different TPCDS data on local vs. remote 
> clusters for partitioned tables
> 
>
> Key: IMPALA-4482
> URL: https://issues.apache.org/jira/browse/IMPALA-4482
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>  Labels: remote_cluster_test, test-infra
> Fix For: Not Applicable
>
>
> For example:
> *Local mini-cluster*
> {noformat}
> [localhost:21000] > show table stats store_sales;
> Query: show table stats store_sales
> +-+++--+--+---++---+-+
> | ss_sold_date_sk | #Rows  | #Files | Size | Bytes Cached | Cache 
> Replication | Format | Incremental stats | Location   
>  |
> +-+++--+--+---++---+-+
> | 2450829 | 1071   | 1  | 127.20KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450829
>  |
> | 2450846 | 839| 1  | 99.89KB  | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450846
>  |
> | 2450860 | 747| 1  | 88.97KB  | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450860
>  |
> | 2450874 | 922| 1  | 109.33KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450874
>  |
> | 2450888 | 856| 1  | 102.36KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450888
>  |
> | 2450905 | 969| 1  | 115.28KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://localhost:20500/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450905
>  |
> [...]
> | Total   | 183592 | 120| 21.31MB  | 0B   |   
> ||   |
>  |
> {noformat}
> *Same table, remote cluster*
> {noformat}
> [impala-new-test-cluster-4.gce.cloudera.com:21000] > show table stats 
> store_sales;
> Query: show table stats store_sales
> +-+++--+--+---++---+-+
> | ss_sold_date_sk | #Rows  | #Files | Size | Bytes Cached | Cache 
> Replication | Format | Incremental stats | Location   
>  |
> +-+++--+--+---++---+-+
> | 2450829 | 2142   | 2  | 254.39KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450829
>  |
> | 2450846 | 1678   | 2  | 199.79KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450846
>  |
> | 2450860 | 1494   | 2  | 177.94KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450860
>  |
> | 2450874 | 922| 1  | 109.33KB | NOT CACHED   | NOT CACHED
> | TEXT   | false | 
> hdfs://impala-new-test-cluster-1.gce.cloudera.com:8020/test-warehouse/tpcds.store_sales/ss_sold_date_sk=2450874
>  |
> | 2450888 | 1712   | 2  | 204.72KB | NOT CACHED 

[jira] [Resolved] (IMPALA-4486) deploy.py provides no options for connecting to a TLS-enabled CM

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4486.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> deploy.py provides no options for connecting to a TLS-enabled CM
> 
>
> Key: IMPALA-4486
> URL: https://issues.apache.org/jira/browse/IMPALA-4486
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Kudu_Impala
>Reporter: Adar Dembo
>Priority: Major
> Fix For: Not Applicable
>
>
> (no idea what the right "Affects Version" is; it's clearly an issue on trunk).
> See [this community 
> post|http://community.cloudera.com/t5/Beta-Releases-Apache-Kudu/Impala-Kudu-install-fails-with-amp-quot-Error-401-amp-quot/td-p/47317/]
>  for more details.






[jira] [Resolved] (IMPALA-4508) Move RC and HBase testing into exhaustive tests

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4508.
---
Fix Version/s: Not Applicable
   Resolution: Later

> Move RC and HBase testing into exhaustive tests
> ---
>
> Key: IMPALA-4508
> URL: https://issues.apache.org/jira/browse/IMPALA-4508
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: Henry Robinson
>Priority: Major
> Fix For: Not Applicable
>
>
> Let's move HBase and RC-compressed tests (and data loading) out of core runs 
> into exhaustive. They are relatively stable, and contribute significantly to 
> build times, which have recently crept up over our 4 hour threshold.






[jira] [Resolved] (IMPALA-4520) Use RECOVER PARTITIONS in our data load process where possible

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4520.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Use RECOVER PARTITIONS in our data load process where possible
> --
>
> Key: IMPALA-4520
> URL: https://issues.apache.org/jira/browse/IMPALA-4520
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
>  Labels: test-infra
> Fix For: Not Applicable
>
>
> The various scripts employed in our data load process were written before the 
> {{RECOVER PARTITIONS}} clause was added (in Impala 2.3.) But now that we have 
> that feature available, we should refactor our data load process to make use 
> of it.
> The places where we should add {{ALTER TABLE t RECOVER PARTITIONS}} are 
> anywhere in the various schema template .sql files in testdata/datasests/* 
> where we partition a table.






[jira] [Resolved] (IMPALA-4534) Not all of the data load files follow the accepted format when partitioning test data tables

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4534.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Not all of the data load files follow the accepted format when partitioning 
> test data tables
> 
>
> Key: IMPALA-4534
> URL: https://issues.apache.org/jira/browse/IMPALA-4534
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: David Knupp
>Priority: Major
>  Labels: test-infra
> Fix For: Not Applicable
>
>
> When loading Impala test data, we "generally" partition tables in our data 
> load process by adding PARTITION_COLUMNS and ALTER sections to the schema 
> template files, e.g. from functional_schema_template.sql:
> {noformat}
>  DATASET
> functional
>  BASE_TABLE_NAME
> alltypessmall
>  COLUMNS
> id int
> bool_col boolean
> tinyint_col tinyint
> smallint_col smallint
> int_col int
> bigint_col bigint
> float_col float
> double_col double
> date_string_col string
> string_col string
> timestamp_col timestamp
>  PARTITION_COLUMNS
> year int
> month int
>  ALTER
> ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=1);
> ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=2);
> ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=3);
> ALTER TABLE {table_name} ADD IF NOT EXISTS PARTITION(year=2009, month=4);
> {noformat}
> However, some tables forego this, and combine the PARTITION BY clause with 
> the CREATE TABLE clause, and they may or may not include an ALTER section. 
> This sidesteps logic in generate-schema-statements.py that specifically 
> branches based upon whether the PARTITION_COLUMNS and/or ALTER sections have 
> been defined.
> We should investigate what effect the omission of these sections has on our 
> data load process for those tables.
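To spot the tables that sidestep the convention, something like the following sketch could scan one table's template block. The '---- ' section-header marker and the parsing logic are assumptions for illustration, not the actual generate-schema-statements.py code:

```python
def parse_sections(template_block):
    """Split one table's template block into {section name: body lines}.
    Assumes sections are introduced by a '---- NAME' header line."""
    sections, current = {}, None
    for line in template_block.splitlines():
        if line.startswith("---- "):
            current = line[5:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

def partitions_outside_convention(sections):
    """True when CREATE embeds PARTITIONED BY but the block lacks the
    PARTITION_COLUMNS and ALTER sections the loader branches on."""
    create = "\n".join(sections.get("CREATE", []))
    return "PARTITIONED BY" in create.upper() and not (
        "PARTITION_COLUMNS" in sections and "ALTER" in sections)
```

Running this over every block in a template file would produce the list of tables whose omission of the PARTITION_COLUMNS/ALTER sections needs investigating.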






[jira] [Resolved] (IMPALA-4621) Builds and all tests should work without access to the internet

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4621.
---
Fix Version/s: Not Applicable
   Resolution: Later

> Builds and all tests should work without access to the internet
> ---
>
> Key: IMPALA-4621
> URL: https://issues.apache.org/jira/browse/IMPALA-4621
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: Jim Apple
>Priority: Major
> Fix For: Not Applicable
>
>
> A complete build-and-test, from scratch, should work even if the user's 
> internet is down, as long as they have the pre-requisites installed.
> Getting this list of pre-requisites should be easy, and they should be 
> downloadable ahead of time.
> Instead, I just saw this:
> {noformat}
> 
> Running mvn clean
> Directory: /home/ubuntu/Impala/ext-data-source
> 
> [INFO] Apache Impala (Incubating) External Data Source Test Library
> [INFO] Apache Impala (Incubating) External Data Source ... FAILURE [2.283s]
> [INFO] Apache Impala (Incubating) External Data Source Test Library  SKIPPED
> [INFO] BUILD FAILURE
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on 
> project impala-data-source: Execution default-clean of goal 
> org.apache.maven.plugins:maven-clean-plugin:2.5:clean failed: Plugin 
> org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies 
> could not be resolved: Could not transfer artifact 
> org.codehaus.plexus:plexus-utils:jar:3.0 from/to central 
> (http://repo.maven.apache.org/maven2): GET request of: 
> org/codehaus/plexus/plexus-utils/3.0/plexus-utils-3.0.jar from central 
> failed: Connection reset -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
> mvn clean exited with code 0
> Error in ./bin/clean.sh at line 34: ${IMPALA_HOME}/bin/mvn-quiet.sh clean
> {noformat}
> http://jenkins.impala.io:8080/job/clang-tidy/83/console






[jira] [Resolved] (IMPALA-4661) Setting up HBase end-to-end tests does not work for remote clusters

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4661.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Setting up HBase end-to-end tests does not work for remote clusters
> ---
>
> Key: IMPALA-4661
> URL: https://issues.apache.org/jira/browse/IMPALA-4661
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: David Knupp
>Priority: Major
>  Labels: hbase, test, test-infra
> Fix For: Not Applicable
>
>
> When testing on our local dev environments, our HBase tests rely upon the 
> fact that we presplit the data by running {{./testdata/bin/split-hbase.sh}} 
> during the data load phase. As-is, this script doesn't work against a remote 
> cluster, and without it, remote cluster tests involving hbase will fail.
> To make hbase tests run against a remote cluster, we need to either:
> * Not rely on pre-splitting (although this has the problem of probably losing 
> coverage, e.g. for testing HBase region pruning)
> * Come up with a more reliable split method, or somehow synthesize the HBase 
> metadata






[jira] [Resolved] (IMPALA-4832) Set up email notifications for jenkins.impala.io

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4832.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

jenkins.impala.io now comments on Gerrit reviews. We can open a new issue if 
this is still a problem.

> Set up email notifications for jenkins.impala.io
> 
>
> Key: IMPALA-4832
> URL: https://issues.apache.org/jira/browse/IMPALA-4832
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: Lars Volker
>Assignee: Laszlo Gaal
>Priority: Minor
>  Labels: asf, jenkins
> Fix For: Not Applicable
>
>
> We may want to set up email notifications for the new jenkins.impala.io 
> instance. To do so we could use the [Amazon Simple Email Service 
> (SES)|https://aws.amazon.com/ses/]. However, this will require having a 
> validated email address to send the emails from.
> We could ask the ASF for something like jenk...@impala.apache.org. If that 
> does not work, we might also consider using an @cloudera.com address, since 
> Cloudera currently provides and administers the infrastructure. As yet 
> another alternative, we might try to find a way to receive emails for 
> jenk...@impala.io and use that one.






[jira] [Resolved] (IMPALA-4928) Install instructions for Nikola are not Linux compatible

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4928.
---
Fix Version/s: Not Applicable
   Resolution: Invalid

> Install instructions for Nikola are not Linux compatible
> 
>
> Key: IMPALA-4928
> URL: https://issues.apache.org/jira/browse/IMPALA-4928
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
> Environment: Ubuntu 14.04
>Reporter: Jim Apple
>Assignee: David Knupp
>Priority: Minor
> Fix For: Not Applicable
>
>
> {{requirements.txt}} includes {{MacFSEvents==0.7}}. Right now Impala only 
> supports building on Linux. It would be nice if developers could blog and 
> develop on the same machine.






[jira] [Resolved] (IMPALA-4959) Impala build may pick up system boost cmake module

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-4959.
---
Fix Version/s: Not Applicable
   Resolution: Cannot Reproduce

> Impala build may pick up system boost cmake module
> --
>
> Key: IMPALA-4959
> URL: https://issues.apache.org/jira/browse/IMPALA-4959
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: Matthew Jacobs
>Assignee: David Alves
>Priority: Minor
>  Labels: build
> Fix For: Not Applicable
>
>
> On some systems with an old boost installed system-wide, the impala build
> would fail with something like:
> CMake Error at /usr/lib64/boost/BoostConfig.cmake:64 (get_target_property):
>   get_target_property() called with non-existent target
>   "boost_thread-shared".
> Call Stack (most recent call first):
>   toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
> (find_package)
>   CMakeLists.txt:116 (find_package)
> CMake Error at /usr/lib64/boost/BoostConfig.cmake:72 (get_target_property):
>   get_target_property() called with non-existent target
>   "boost_thread-shared-debug".
> Call Stack (most recent call first):
>   toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/FindBoost.cmake:206 
> (find_package)
>   CMakeLists.txt:116 (find_package)
> This is because, if it exists, cmake's FindBoost.cmake will look for and
> use that module, even though boost's cmake build hasn't been maintained
> in years and the impala build is actually configured not to use the
> system's boost.






[jira] [Resolved] (IMPALA-5001) Allow bootstrap_toolchain.py to update dependencies if toolchain version changes

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5001.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> Allow bootstrap_toolchain.py to update dependencies if toolchain version 
> changes
> 
>
> Key: IMPALA-5001
> URL: https://issues.apache.org/jira/browse/IMPALA-5001
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Priority: Major
> Fix For: Not Applicable
>
>
> Currently, if you update a dependency without changing its version, and run 
> {{bin/bootstrap_toolchain.py}}, it won't download the new dependency.
> This is a problem for situations like IMPALA-4983 where we want to change the 
> build flags but don't have a new version number. 
> My suggestion is to use the toolchain build ID as a 'version' parameter, and 
> write that to disk. If the toolchain ID changes, {{bootstrap_toolchain.py}} 
> should detect that case and redownload all dependencies. Although it's 
> suboptimal to download *all* dependencies, it's better than not downloading 
> enough of them.






[jira] [Resolved] (IMPALA-5219) Configure python logging such that Impala e2e tests log output to files

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5219.
---
Fix Version/s: Not Applicable
   Resolution: Fixed

We have been preserving logs to JUnitXML for a while. I don't know exactly when 
this was fixed, but closing.

> Configure python logging such that Impala e2e tests log output to files
> ---
>
> Key: IMPALA-5219
> URL: https://issues.apache.org/jira/browse/IMPALA-5219
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>  Labels: test-infra
> Fix For: Not Applicable
>
>
> Currently, {{impala-py.test}} log output is not actually written anywhere 
> that's recoverable. This is probably because the logging module is not 
> configured properly.
> It would be helpful if tests would write log files. The ostensible location 
> would be to add test module specific logs to logs/ee_tests.






[jira] [Resolved] (IMPALA-5695) test_role_update test leaves the mini-cluster in a bad state after execution

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5695.
---
Fix Version/s: Not Applicable
   Resolution: Cannot Reproduce

> test_role_update test leaves the mini-cluster in a bad state after execution
> 
>
> Key: IMPALA-5695
> URL: https://issues.apache.org/jira/browse/IMPALA-5695
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Anuj Phadke
>Assignee: Anuj Phadke
>Priority: Major
> Fix For: Not Applicable
>
>
> test_role_update right now is possibly the last test that gets executed when 
> running all the tests.
> Adding any test after test_role_update that restarts the Impala mini-cluster 
> will hang.
> Ex: I ran test_role_privilege_case after test_role_update and it failed with 
> this error -
> {code}
> anuj@anuj-OptiPlex-9020:~/Impala$ ps auxw | grep impalad
> anuj 25516 49.8  0.0  0 0 pts/6Z+   12:21   0:04 [impalad] 
> << zombie process 
> anuj 25792  0.6  0.0  56356 12160 pts/6S+   12:21   0:00 
> /home/anuj/Impala/bin/../infra/python/env/bin/python 
> /home/anuj/Impala/bin/start-impala-cluster.py --cluster_size=3 
> --num_coordinators=3 --log_dir=/tmp/ --log_level=1 
> --impalad_args="--server_name=server1"  
> --state_store_args="--statestore_update_frequency_ms=300"  
> --catalogd_args="--sentry_config=/home/anuj/Impala/fe/src/test/resources/sentry-site.xml
>  --sentry_catalog_polling_frequency_s=1" 
> {code}
> Here is the stack trace -
> {code}
> Traceback (most recent call last):
>   File "/home/anuj/Impala/bin/start-impala-cluster.py", line 365, in 
> kill_cluster_processes(force=options.force_kill)
>   File "/home/anuj/Impala/bin/start-impala-cluster.py", line 143, in 
> kill_cluster_processes
> kill_matching_processes(binaries, force)
>   File "/home/anuj/Impala/bin/start-impala-cluster.py", line 163, in 
> kill_matching_processes
> process.pid, KILL_TIMEOUT_IN_SECONDS))
> RuntimeError: Unable to kill impalad (pid 25516) after 240 seconds.
> MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> ERROR
> = short test summary info 
> ==
> ERROR 
> authorization/test_grant_revoke.py::TestGrantRevoke::()::test_role_privilege_case[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> == ERRORS 
> ==
>  ERROR at setup of TestGrantRevoke.test_role_privilege_case[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> authorization/test_grant_revoke.py:46: in setup_method
> super(TestGrantRevoke, self).setup_method(method)
> common/custom_cluster_test_suite.py:103: in setup_method
> self._start_impala_cluster(cluster_args)
> common/custom_cluster_test_suite.py:129: in _start_impala_cluster
> check_call(cmd + options, close_fds=True)
> /usr/lib/python2.7/subprocess.py:540: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command 
> '['/home/anuj/Impala/bin/start-impala-cluster.py', '--cluster_size=3', 
> '--num_coordinators=3', '--log_dir=/tmp/', '--log_level=1', 
> '--impalad_args="--server_name=server1" ', 
> '--state_store_args="--statestore_update_frequency_ms=300" ', 
> '--catalogd_args="--sentry_config=/home/anuj/Impala/fe/src/test/resources/sentry-site.xml
>  --sentry_catalog_polling_frequency_s=1" ']' returned non-zero exit status 1
> !! Interrupted: stopping after 1 
> failures !!
> == 3 passed, 1 pytest-warnings, 1 error 
> in 364.06 seconds ==
> anuj@anuj-OptiPlex-9020:~/Impala/tests$ 
> {code}






[jira] [Resolved] (IMPALA-5791) bin/bootstrap_development.sh should be more resilient to failure

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5791.
---
Fix Version/s: Impala 2.10.0
   Resolution: Fixed

This has been pretty resilient, so I think we can close this for now.

> bin/bootstrap_development.sh should be more resilient to failure
> 
>
> Key: IMPALA-5791
> URL: https://issues.apache.org/jira/browse/IMPALA-5791
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Jim Apple
>Priority: Minor
> Fix For: Impala 2.10.0
>
>
> A number of the operations in {{bin/bootstrap_development.sh}} can fail due 
> to network issues. The script should be robust to transient failures.
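A generic sketch of that resilience, with hypothetical parameters; the real script's set of retriable failures may differ:

```python
import time

def retry(fn, attempts=3, delay=0.1, backoff=2.0, retriable=(OSError,)):
    """Call fn, re-trying on transient (e.g. network) errors with
    exponential backoff; re-raise once the attempt budget is exhausted."""
    for i in range(attempts):
        try:
            return fn()
        except retriable:
            if i == attempts - 1:
                raise
            time.sleep(delay)
            delay *= backoff
```

Wrapping each network-touching step of the bootstrap script in such a helper would let it survive the transient failures described above.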






[jira] [Resolved] (IMPALA-5793) bin/bootstrap_development.py should be able to use Oracle Java

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5793.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

> bin/bootstrap_development.py should be able to use Oracle Java
> --
>
> Key: IMPALA-5793
> URL: https://issues.apache.org/jira/browse/IMPALA-5793
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Jim Apple
>Priority: Minor
> Fix For: Not Applicable
>
>
> {{bin/bootstrap_development.sh}} uses OpenJDK. At least for Ubuntu 16.04, 
> Oracle Java 8 is available and can be installed via {{apt-get}}: 
> http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html
> In the Jenkins job, the machines must have Java installed for Jenkins to 
> communicate with them, so the installation happens when an instance is 
> created. This change will therefore have to propagate to that stage of the 
> installation for the from-scratch job(s).






[jira] [Resolved] (IMPALA-5959) jenkins.impala.io tests pass but job fails deleting workspace

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5959.
---
Fix Version/s: Not Applicable
   Resolution: Fixed

> jenkins.impala.io tests pass but job fails deleting workspace
> -
>
> Key: IMPALA-5959
> URL: https://issues.apache.org/jira/browse/IMPALA-5959
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Matthew Jacobs
>Priority: Major
> Fix For: Not Applicable
>
>
> On jenkins.impala.io, we have seen several cases where 
> ubuntu-16.04-from-scratch tests pass but the job ultimately fails when the 
> workspace cannot be cleaned up.
> E.g.
> {code}
> 21:18:24  66 passed, 31 skipped, 1 xfailed, 2 xpassed in 2353.76 
> seconds 
> 21:18:24 + onexit
> 21:18:24 + df -m
> 21:18:24 Filesystem 1M-blocks  Used Available Use% Mounted on
> 21:18:24 udev   32200 0 32200   0% /dev
> 21:18:24 tmpfs   6442 9  6433   1% /run
> 21:18:24 /dev/xvda1248106 88424159667  36% /
> 21:18:24 tmpfs  32208 1 32208   1% /dev/shm
> 21:18:24 tmpfs  5 0 5   0% /run/lock
> 21:18:24 tmpfs  32208 0 32208   0% /sys/fs/cgroup
> 21:18:24 tmpfs   6442 0  6442   0% /run/user/1000
> 21:18:24 + free -m
> 21:18:24   totalusedfree  shared  buff/cache  
>  available
> 21:18:24 Mem:  64414   16218   28154  59   20041  
>  47524
> 21:18:24 Swap: 0   0   0
> 21:18:24 + uptime -p
> 21:18:24 up 4 hours, 5 minutes
> 21:18:24 + rm -rf /home/ubuntu/Impala/logs_static
> 21:18:24 + mkdir -p /home/ubuntu/Impala/logs_static
> 21:18:24 + cp -r -L /home/ubuntu/Impala/logs /home/ubuntu/Impala/logs_static
> 21:18:48 Process leaked file descriptors. See 
> http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for 
> more information
> 21:18:48 Set build name.
> 21:18:48 New build name is '#295 refs/changes/35/8035/6'
> 21:18:48 Variable with name 'BUILD_DISPLAY_NAME' already exists, current 
> value: '#295 refs/changes/35/8035/6', new value: '#295 refs/changes/35/8035/6'
> 21:18:50 Archiving artifacts
> 21:19:29 [WS-CLEANUP] Deleting project workspace...Cannot delete workspace 
> :remote file operation failed: /home/ubuntu at 
> hudson.remoting.Channel@43c5313:ubuntu-16.04 (i-0f8cf68ee32aebe80): 
> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times (of a 
> maximum of 3) waiting 0.1 sec between attempts.
> 21:19:36 ERROR: Step ‘Delete workspace when build is done’ failed: Cannot 
> delete workspace: remote file operation failed: /home/ubuntu at 
> hudson.remoting.Channel@43c5313:ubuntu-16.04 (i-0f8cf68ee32aebe80): 
> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times (of a 
> maximum of 3) waiting 0.1 sec between attempts.
> 21:19:36 Finished: SUCCESS
> {code}
> The issue and some proposed solutions were discussed on the dev@ mailing list:
> https://mail-archives.apache.org/mod_mbox/incubator-impala-dev/201708.mbox/browser






[jira] [Resolved] (IMPALA-5968) load_nested.py fails on secure cluster: no write access to /user on HDFS

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-5968.
---
Fix Version/s: Not Applicable
   Resolution: Later

There is no plan to fix this, closing.

> load_nested.py fails on secure cluster: no write access to /user on HDFS
> 
>
> Key: IMPALA-5968
> URL: https://issues.apache.org/jira/browse/IMPALA-5968
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Matthew Mulder
>Priority: Minor
> Fix For: Not Applicable
>
>
> {code}testdata/bin/load_nested.py \
> -s tpch_100_parquet \
> -t tpch_100_nested_parquet \
> --cm-host=vc0332.test \
> -p 34 --use-kerberos --use-ssl
> Output: 2017-09-20 14:18:06,174 INFO:db_connection[234]:Creating database 
> tpch_100_nested_parquet
> 2017-09-20 14:18:06,737 INFO:load_nested[98]:Creating temp orders (chunk 1 of 
> 34)
> 2017-09-20 14:18:31,420 INFO:load_nested[98]:Creating temp orders (chunk 2 of 
> 34)
> 2017-09-20 14:18:52,146 INFO:load_nested[98]:Creating temp orders (chunk 3 of 
> 34)
> 2017-09-20 14:19:12,398 INFO:load_nested[98]:Creating temp orders (chunk 4 of 
> 34)
> 2017-09-20 14:19:32,109 INFO:load_nested[98]:Creating temp orders (chunk 5 of 
> 34)
> 2017-09-20 14:19:51,823 INFO:load_nested[98]:Creating temp orders (chunk 6 of 
> 34)
> 2017-09-20 14:20:12,521 INFO:load_nested[98]:Creating temp orders (chunk 7 of 
> 34)
> 2017-09-20 14:20:33,220 INFO:load_nested[98]:Creating temp orders (chunk 8 of 
> 34)
> 2017-09-20 14:20:53,417 INFO:load_nested[98]:Creating temp orders (chunk 9 of 
> 34)
> 2017-09-20 14:21:13,143 INFO:load_nested[98]:Creating temp orders (chunk 10 
> of 34)
> 2017-09-20 14:21:34,361 INFO:load_nested[98]:Creating temp orders (chunk 11 
> of 34)
> 2017-09-20 14:21:53,048 INFO:load_nested[98]:Creating temp orders (chunk 12 
> of 34)
> 2017-09-20 14:22:13,743 INFO:load_nested[98]:Creating temp orders (chunk 13 
> of 34)
> 2017-09-20 14:22:34,439 INFO:load_nested[98]:Creating temp orders (chunk 14 
> of 34)
> 2017-09-20 14:22:55,160 INFO:load_nested[98]:Creating temp orders (chunk 15 
> of 34)
> 2017-09-20 14:23:15,868 INFO:load_nested[98]:Creating temp orders (chunk 16 
> of 34)
> 2017-09-20 14:23:37,075 INFO:load_nested[98]:Creating temp orders (chunk 17 
> of 34)
> 2017-09-20 14:23:57,260 INFO:load_nested[98]:Creating temp orders (chunk 18 
> of 34)
> 2017-09-20 14:24:18,967 INFO:load_nested[98]:Creating temp orders (chunk 19 
> of 34)
> 2017-09-20 14:24:39,680 INFO:load_nested[98]:Creating temp orders (chunk 20 
> of 34)
> 2017-09-20 14:24:59,364 INFO:load_nested[98]:Creating temp orders (chunk 21 
> of 34)
> 2017-09-20 14:25:20,556 INFO:load_nested[98]:Creating temp orders (chunk 22 
> of 34)
> 2017-09-20 14:25:40,240 INFO:load_nested[98]:Creating temp orders (chunk 23 
> of 34)
> 2017-09-20 14:26:00,949 INFO:load_nested[98]:Creating temp orders (chunk 24 
> of 34)
> 2017-09-20 14:26:21,146 INFO:load_nested[98]:Creating temp orders (chunk 25 
> of 34)
> 2017-09-20 14:26:41,841 INFO:load_nested[98]:Creating temp orders (chunk 26 
> of 34)
> 2017-09-20 14:27:01,530 INFO:load_nested[98]:Creating temp orders (chunk 27 
> of 34)
> 2017-09-20 14:27:21,219 INFO:load_nested[98]:Creating temp orders (chunk 28 
> of 34)
> 2017-09-20 14:27:42,921 INFO:load_nested[98]:Creating temp orders (chunk 29 
> of 34)
> 2017-09-20 14:28:04,617 INFO:load_nested[98]:Creating temp orders (chunk 30 
> of 34)
> 2017-09-20 14:28:25,808 INFO:load_nested[98]:Creating temp orders (chunk 31 
> of 34)
> 2017-09-20 14:28:48,031 INFO:load_nested[98]:Creating temp orders (chunk 32 
> of 34)
> 2017-09-20 14:29:08,213 INFO:load_nested[98]:Creating temp orders (chunk 33 
> of 34)
> 2017-09-20 14:29:29,918 INFO:load_nested[98]:Creating temp orders (chunk 34 
> of 34)
> 2017-09-20 14:29:50,610 INFO:load_nested[128]:Creating temp customers (chunk 
> 1 of 34)
> 2017-09-20 14:30:18,747 INFO:load_nested[128]:Creating temp customers (chunk 
> 2 of 34)
> 2017-09-20 14:30:44,058 INFO:load_nested[128]:Creating temp customers (chunk 
> 3 of 34)
> 2017-09-20 14:31:07,818 INFO:load_nested[128]:Creating temp customers (chunk 
> 4 of 34)
> 2017-09-20 14:31:31,565 INFO:load_nested[128]:Creating temp customers (chunk 
> 5 of 34)
> 2017-09-20 14:31:54,799 INFO:load_nested[128]:Creating temp customers (chunk 
> 6 of 34)
> 2017-09-20 14:32:18,529 INFO:load_nested[128]:Creating temp customers (chunk 
> 7 of 34)
> 2017-09-20 14:32:41,762 INFO:load_nested[128]:Creating temp customers (chunk 
> 8 of 34)
> 2017-09-20 14:33:04,974 INFO:load_nested[128]:Creating temp customers (chunk 
> 9 of 34)
> 2017-09-20 14:33:28,713 INFO:load_nested[128]:Creating temp customers (chunk 
> 10 of 34)
> 2017-09-20 14:33:51,923 INFO:

[jira] [Resolved] (IMPALA-6000) Variable substitution in Hue

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6000.
---
Fix Version/s: Not Applicable
   Resolution: Invalid

This is Hue specific behavior that can be tracked by the Hue JIRA.

> Variable substitution in Hue
> ---
>
> Key: IMPALA-6000
> URL: https://issues.apache.org/jira/browse/IMPALA-6000
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
> Environment: Impala on Hue 4.0, chrome, windows 10
>Reporter: SUSHANT GUPTA
>Priority: Major
> Fix For: Not Applicable
>
>
> I can get the following to work in impala-shell, but the same doesn't work in 
> Hue 4.0 as part of an Impala query: 
> set var:variable_name:value;
> select * from table1 where table1.col1 = ${var:variable_name};






[jira] [Resolved] (IMPALA-6056) Dataload from snapshot should only compute statistics for new tables

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6056.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

We rarely use snapshots now, so closing this.

> Dataload from snapshot should only compute statistics for new tables
> 
>
> Key: IMPALA-6056
> URL: https://issues.apache.org/jira/browse/IMPALA-6056
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Joe McDonnell
>Priority: Major
> Fix For: Not Applicable
>
>
> When loading data from a snapshot, create-load-data.sh runs 
> compute-table-stats.sh, which will compute statistics for all of the tables. 
> However, the hive metastore snapshot already contains statistics from most of 
> those tables. Only the Kudu tables are created from scratch in the load from 
> snapshot.
> Computing the statistics for everything takes 11 minutes, whereas computing 
> statistics only for Kudu takes roughly 3 minutes. This is a meaningful 
> savings and hand tests show that only computing statistics for the Kudu 
> tables does not impact subsequent tests.






[jira] [Resolved] (IMPALA-3073) Verify if compressed data in avro, sequence file format could be multistream

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3073.
---
Resolution: Cannot Reproduce

I don't think we have an example of this happening so will close for now.

> Verify if compressed data in avro, sequence file format could be multistream
> 
>
> Key: IMPALA-3073
> URL: https://issues.apache.org/jira/browse/IMPALA-3073
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.0
>Reporter: Juan Yu
>Priority: Minor
>
> When generating a compressed text file, certain tools (like pbzip2) can 
> parallelize compression and create multistream compressed data. We need to 
> verify whether this also applies to other file formats, like parquet, avro, 
> and sequence, that use those codecs. If yes, Codec::ProcessBlock() should 
> support multistream compressed block data: decompression should not stop 
> when it reaches the end of a compressed stream, e.g. Z_STREAM_END, 
> BZ_STREAM_END.
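Python's bz2 module already exhibits the behavior being asked of Codec::ProcessBlock(): concatenated streams, as pbzip2 produces them, decompress as one payload. A small sketch of the data shape the ticket describes:

```python
import bz2

# Two independently compressed streams, concatenated the way a parallel
# compressor like pbzip2 would lay them out in a single file.
stream1 = bz2.compress(b"first chunk ")
stream2 = bz2.compress(b"second chunk")
multistream = stream1 + stream2

# Decompression must continue past the first BZ_STREAM_END marker and
# return the data from every stream, not just the first.
assert bz2.decompress(multistream) == b"first chunk second chunk"
```

Stopping at the first end-of-stream marker would silently drop everything after the first chunk, which is exactly the failure mode the ticket wants ruled out.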






[jira] [Updated] (IMPALA-3073) Verify if compressed data in avro, sequence file format could be multistream

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3073:
--
Priority: Minor  (was: Major)

> Verify if compressed data in avro, sequence file format could be multistream
> 
>
> Key: IMPALA-3073
> URL: https://issues.apache.org/jira/browse/IMPALA-3073
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.0
>Reporter: Juan Yu
>Priority: Minor
>
> When generating a compressed text file, certain tools (like pbzip2) can 
> parallelize compression and create multistream compressed data. We need to 
> verify whether this applies to other file formats like parquet, avro, and 
> sequence that use those codecs. If yes, Codec::ProcessBlock() should support 
> multistream compressed block data: decompression should not stop when it 
> reaches the end of a compressed stream, e.g. Z_STREAM_END or BZ_STREAM_END.






[jira] [Resolved] (IMPALA-3025) Add empty string test coverage

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3025.
---
Resolution: Later

> Add empty string test coverage
> --
>
> Key: IMPALA-3025
> URL: https://issues.apache.org/jira/browse/IMPALA-3025
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: ramp-up
>
> As revealed by IMPALA-3018, we have little to no test coverage for empty 
> strings. We should add more coverage.






[jira] [Updated] (IMPALA-2842) "SCAN HDFS" "hosts" doesn't account for num_nodes or unsplittable formats

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2842:
--
Fix Version/s: Impala 2.10.0

> "SCAN HDFS" "hosts" doesn't account for num_nodes or unsplittable formats
> -
>
> Key: IMPALA-2842
> URL: https://issues.apache.org/jira/browse/IMPALA-2842
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Juan Yu
>Priority: Minor
> Fix For: Impala 2.10.0
>
>
> According to the comments, "hosts" should be "number of nodes on which the 
> plan tree rooted at this node would execute".
> But for "SCAN HDFS", it always equals the number of backends where the data 
> resides.
> For example, for the query "select * from sc;":
> distributed plan
> {code}
> Query: explain select * from sc limit 1000
> +--+
> | Explain String   |
> +--+
> | Estimated Per-Host Requirements: Memory=32.00MB VCores=1 |
> |  |
> | F01:PLAN FRAGMENT [UNPARTITIONED]|
> |   01:EXCHANGE [UNPARTITIONED]|
> |  limit: 1000 |
> |  hosts=3 per-host-mem=unavailable|
> |  tuple-ids=0 row-size=58B cardinality=8  |
> |  |
> | F00:PLAN FRAGMENT [RANDOM]   |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED] |
> |   00:SCAN HDFS [default.sc, RANDOM]  |
> |  partitions=1/1 files=3 size=163B|
> |  table stats: 8 rows total   |
> |  column stats: all   |
> |  limit: 1000 |
> |  hosts=3 per-host-mem=32.00MB|
> |  tuple-ids=0 row-size=58B cardinality=8  |
> +--+
> {code}
> single node plan
> {code}
> Query: explain select * from sc
> +-+
> | Explain String  |
> +-+
> | Estimated Per-Host Requirements: Memory=0B VCores=0 |
> | |
> | F00:PLAN FRAGMENT [UNPARTITIONED]   |
> |   00:SCAN HDFS [default.sc] |
> |  partitions=1/1 files=3 size=163B   |
> |  table stats: 8 rows total  |
> |  column stats: all  |
> |  hosts=3 per-host-mem=unavailable   |
> |  tuple-ids=0 row-size=58B cardinality=8 |
> +-+
> {code}
> The query summary and profile do show the correct number of executing nodes.






[jira] [Resolved] (IMPALA-2842) "SCAN HDFS" "hosts" doesn't account for num_nodes or unsplittable formats

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2842.
---
Resolution: Fixed

This was fixed a while back when hosts= was moved to the FRAGMENT level in the 
explain plan.


commit 9a29dfc91b1ff8bbae3c94b53bf2b6ac81a271e0
Author: Tim Armstrong 
Date:   Wed Jan 25 15:19:35 2017 -0800

IMPALA-3748: minimum buffer requirements in planner

Compute the minimum buffer requirement for spilling nodes and
per-host estimates for the entire plan tree.

This builds on top of the existing resource estimation code, which
computes the sets of plan nodes that can execute concurrently. This is
cleaned up so that the process of producing resource requirements is
clearer. It also removes the unused VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* Fix the computation for mt_dop > 1 by distinguishing per-instance and
  per-host estimates.
* Always generate an estimate instead of unpredictably returning
  -1/"unavailable" in many circumstances - there was little rhyme or
  reason to when this happened.
* Remove the special "trivial plan" estimates. With the rest of the
  cleanup we generate estimates <= 10MB for those trivial plans through
  the normal code path.

I left one bug (IMPALA-4862) unfixed because it is subtle, will affect
estimates for many plans and will be easier to review once we have the
test infra in place.

Testing:
Added basic planner tests for resource requirements in both the MT and
non-MT cases.

Re-enabled the explain_level tests, which appears to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Reviewed-on: http://gerrit.cloudera.org:8080/5847
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins


> "SCAN HDFS" "hosts" doesn't account for num_nodes or unsplittable formats
> -
>
> Key: IMPALA-2842
> URL: https://issues.apache.org/jira/browse/IMPALA-2842
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Juan Yu
>Priority: Minor
>
> According to the comments, "hosts" should be "number of nodes on which the 
> plan tree rooted at this node would execute".
> But for "SCAN HDFS", it always equals the number of backends where the data 
> resides.
> For example, for the query "select * from sc;":
> distributed plan
> {code}
> Query: explain select * from sc limit 1000
> +--+
> | Explain String   |
> +--+
> | Estimated Per-Host Requirements: Memory=32.00MB VCores=1 |
> |  |
> | F01:PLAN FRAGMENT [UNPARTITIONED]|
> |   01:EXCHANGE [UNPARTITIONED]|
> |  limit: 1000 |
> |  hosts=3 per-host-mem=unavailable|
> |  tuple-ids=0 row-size=58B cardinality=8  |
> |  |
> | F00:PLAN FRAGMENT [RANDOM]   |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED] |
> |   00:SCAN HDFS [default.sc, RANDOM]  |
> |  partitions=1/1 files=3 size=163B|
> |  table stats: 8 rows total   |
> |  column stats: all   |
> |  limit: 1000 |
> |  hosts=3 per-host-mem=32.00MB|
> |  tuple-ids=0 row-size=58B cardinality=8  |
> +--+
> {code}
> single node plan
> {code}
> Query: explain select * from sc
> +-+
> | Explain String  |
> +-+
> | Estimated Per-Host Requirements: Memory=0B VCores=0 |
> | |
> | F00:PLAN FRAGMENT [UNPARTITIONED]   |
> |   00:SCAN HDFS [default.sc] |
> |  parti

[jira] [Resolved] (IMPALA-2787) Support de-duplicate records in Impala

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2787.
---
Resolution: Won't Fix

The use case is a little underspecified, but I don't know that we necessarily 
want to add a customized operation for this (as opposed to INSERT 
OVERWRITE .. SELECT DISTINCT).
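As a sketch of that workaround, the rewrite-with-DISTINCT pattern can be modeled with SQLite standing in for Impala (table names are hypothetical, and Impala's INSERT OVERWRITE syntax differs from the SQLite statements used here):

```python
# Hedged sketch of the "rewrite the table from a DISTINCT projection"
# workaround. Impala tables on HDFS have no in-place DELETE, so
# deduplication is done by rebuilding the table; SQLite is used here
# only so the example is runnable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "x"), (1, "x"), (2, "y")])

# Rough equivalent of: INSERT OVERWRITE t SELECT DISTINCT * FROM t;
conn.execute("CREATE TABLE t_dedup AS SELECT DISTINCT a, b FROM t")
conn.execute("DROP TABLE t")
conn.execute("ALTER TABLE t_dedup RENAME TO t")

rows = conn.execute("SELECT a, b FROM t ORDER BY a").fetchall()
print(rows)  # [(1, 'x'), (2, 'y')]
```

This covers Use Case 1; Use Case 2 (ignoring some columns) would replace `SELECT DISTINCT a, b` with a GROUP BY over the identifying columns plus an aggregate choice for the rest.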

> Support de-duplicate records in Impala
> --
>
> Key: IMPALA-2787
> URL: https://issues.apache.org/jira/browse/IMPALA-2787
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Eric Lin
>Priority: Minor
>
> Two use cases:
> Use Case 1: Remove duplicate rows where all data in the row is identical.
> Use Case 2: Remove duplicate rows where all data in the row is identical 
> except for a small number of columns.
> Rather than using SELECT DISTINCT from one table into another table, it would 
> be great if Impala could support this natively and remove duplicate records on 
> the table itself without creating a new table.






[jira] [Resolved] (IMPALA-3290) Catch std::bad_alloc from LLVM optimisation

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3290.
---
Resolution: Won't Fix

> Catch std::bad_alloc from LLVM optimisation
> ---
>
> Key: IMPALA-3290
> URL: https://issues.apache.org/jira/browse/IMPALA-3290
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Huaisi Xu
>Priority: Major
>
> I have a core dump locally; the stack trace looks like this:
> {code:java}
> #0  0x7f79a50dacc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f79a50de218 in __GI_abort () at abort.c:118
> #2  0x7f79a59ee6dd in __gnu_cxx::__verbose_terminate_handler() () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x7f79a59ec746 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x7f79a59ec791 in std::terminate() () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x7f79a59ec9a8 in __cxa_throw () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x023065c8 in (anonymous namespace)::handle_oom(void* (*)(void*), 
> void*, bool, bool) [clone .constprop.53] ()
> #7  0x02326fd0 in tc_newarray ()
> #8  0x021bbb94 in llvm::Instruction::setMetadata(unsigned int, 
> llvm::MDNode*) ()
> #9  0x021a38c0 in llvm::Instruction::clone() const ()
> #10 0x01f5e9c4 in (anonymous 
> namespace)::PruningFunctionCloner::CloneBlock(llvm::BasicBlock const*, 
> std::vector 
> >&) ()
> #11 0x01f5fd43 in llvm::CloneAndPruneFunctionInto(llvm::Function*, 
> llvm::Function const*, llvm::ValueMap llvm::ValueMapConfig >&, bool, 
> llvm::SmallVectorImpl&, char const*, 
> llvm::ClonedCodeInfo*, llvm::DataLayout const*, llvm::Instruction*) ()
> #12 0x01f647b6 in llvm::InlineFunction(llvm::CallSite, 
> llvm::InlineFunctionInfo&, bool) ()
> #13 0x01987f1f in llvm::Inliner::runOnSCC(llvm::CallGraphSCC&) ()
> #14 0x01fb9d6e in (anonymous 
> namespace)::CGPassManager::runOnModule(llvm::Module&) ()
> #15 0x021c8158 in llvm::MPPassManager::runOnModule(llvm::Module&) ()
> #16 0x021c8331 in llvm::PassManagerImpl::run(llvm::Module&) ()
> #17 0x015325f5 in impala::LlvmCodeGen::OptimizeModule 
> (this=0x7e209200) at /home/huaisi/Impala/be/src/codegen/llvm-codegen.cc:747
> #18 0x01531c55 in impala::LlvmCodeGen::FinalizeModule 
> (this=0x7e209200) at /home/huaisi/Impala/be/src/codegen/llvm-codegen.cc:673
> #19 0x01874587 in impala::PlanFragmentExecutor::OptimizeLlvmModule 
> (this=0x798f868) at 
> /home/huaisi/Impala/be/src/runtime/plan-fragment-executor.cc:284
> #20 0x01874f86 in impala::PlanFragmentExecutor::Open (this=0x798f868) 
> at /home/huaisi/Impala/be/src/runtime/plan-fragment-executor.cc:326
> #21 0x0144d79c in impala::FragmentMgr::FragmentExecState::Exec 
> (this=0x798f600) at 
> /home/huaisi/Impala/be/src/service/fragment-exec-state.cc:53
> #22 0x01444fa5 in impala::FragmentMgr::FragmentThread 
> (this=0x76a0d00, fragment_instance_id=...) at 
> /home/huaisi/Impala/be/src/service/fragment-mgr.cc:86
> #23 0x01448b02 in boost::_mfi::mf1 impala::TUniqueId>::operator() (this=0x127efc000, p=0x76a0d00, a1=...) at 
> /home/huaisi/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #24 0x014488bf in 
> boost::_bi::list2, 
> boost::_bi::value >::operator() impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> 
> (this=0x127efc010, f=..., a=...)
> at 
> /home/huaisi/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
> #25 0x014481e9 in boost::_bi::bind_t impala::FragmentMgr, impala::TUniqueId>, 
> boost::_bi::list2, 
> boost::_bi::value > >::operator() (this=0x127efc000)
> at 
> /home/huaisi/Impala/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
> #26 0x01447b52 in 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf1, 
> boost::_bi::list2, 
> boost::_bi::value > >, void>::invoke
> (function_obj_ptr=...) at 
> /home/huaisi/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
> #27 0x0125dd10 in boost::function0::operator() 
> (this=0x7f791a9f3d60) at 
> /home/huaisi/Impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
> #28 0x01509e3b in impala::Thread::SuperviseThread(std::string const&, 
> std::string const&, boost::function, impala::Promise*) 
> (name="exec-plan-fragment-e348c82278866d37:f5eb00d2bb759789", 
> category="fragment-mgr", functor=..., thread_started=0x7f791d28eaa0)
> at /home/huaisi/Impala/be/src/util/thread.cc:316
> #29 0x0151156c in boost::_bi::list4, 
> boost::_bi::value, boost::_bi::value >, 
> boost::_bi::value*> >::operator()

[jira] [Resolved] (IMPALA-3259) Codegen is not cancellable and can use a lot of CPU and memory

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3259.
---
Resolution: Fixed

This has been mitigated by the various fixes we've made to improve codegen. 
I'll call it done for now.

> Codegen is not cancellable and can use a lot of CPU and memory
> --
>
> Key: IMPALA-3259
> URL: https://issues.apache.org/jira/browse/IMPALA-3259
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Huaisi Xu
>Priority: Major
>  Labels: codegen
> Attachments: screenshot-1.png
>
>
> The memz page is stuck at
> {code:java}
> Memory Usage
> Memory consumption / limit: 10.43 GB / 20.00 GB
> tcmalloc
> 
> MALLOC:1048989 (10003.9 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +674292952 (  643.1 MiB) Bytes in central cache freelist
> MALLOC: + 15303600 (   14.6 MiB) Bytes in transfer cache freelist
> MALLOC: + 15594152 (   14.9 MiB) Bytes in thread cache freelists
> MALLOC: + 51245248 (   48.9 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =  11246325952 (10725.3 MiB) Actual memory used (physical + swap)
> MALLOC: +   6235471872 ( 5946.6 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =  17481797824 (16671.9 MiB) Virtual address space used
> MALLOC:
> MALLOC: 565828  Spans in use
> MALLOC:245  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
> {code}
> This query was cancelled because the impalad became unreachable while I had 
> gdb attached.
> I have flushed the query archiving cache by issuing ~30 "show tables" 
> queries (this worked in IMPALA-3254), but the memory in this case stayed high.
> I will try to reproduce and get a heap profile.
> There is no query running at this moment.






[jira] [Assigned] (IMPALA-2666) Investigate and probably codegen partitioned insert path

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2666:
-

Assignee: (was: Michael Ho)

> Investigate and probably codegen partitioned insert path
> 
>
> Key: IMPALA-2666
> URL: https://issues.apache.org/jira/browse/IMPALA-2666
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: codegen, performance
>
> The partitioned insert path is considerably slower than the unpartitioned 
> insert path ([~mmokhtar] can you comment more?). Codegen can likely fix this.






[jira] [Resolved] (IMPALA-2463) Measure and optimize SPLIT_PART and REGEXP_LIKE

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2463.
---
Resolution: Later

> Measure and optimize SPLIT_PART and REGEXP_LIKE
> ---
>
> Key: IMPALA-2463
> URL: https://issues.apache.org/jira/browse/IMPALA-2463
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Jim Apple
>Priority: Minor
>  Labels: performance
>
> IMPALA-2084: These haven't been measured yet, and they can't use some of the 
> more optimized string-searching functions like strstr(3), which only work on 
> null-terminated strings.






[jira] [Resolved] (IMPALA-2454) Add more parquet test files that use two-level array encoding

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2454.
---
Resolution: Won't Fix

This is a legacy encoding that we likely won't invest more effort into.

> Add more parquet test files that use two-level array encoding
> -
>
> Key: IMPALA-2454
> URL: https://issues.apache.org/jira/browse/IMPALA-2454
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: nested_types, parquet, test
>
> We're lacking coverage of more complicated cases where it's harder to 
> disambiguate two- and three-level arrays, e.g. array>.






[jira] [Updated] (IMPALA-4113) Catalog prints misleading error message when it cannot connect to HMS

2020-12-23 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-4113:

Labels: catalog-server ramp-up  (was: catalog-server)

> Catalog prints misleading error message when it cannot connect to HMS
> -
>
> Key: IMPALA-4113
> URL: https://issues.apache.org/jira/browse/IMPALA-4113
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.7.0
>Reporter: Sailesh Mukil
>Priority: Minor
>  Labels: catalog-server, ramp-up
>
> The catalog server prints the following error message when it cannot create 
> the catalog service:
> "Error initialializing Catalog. Please run 'invalidate metadata'"
> This shows up when the catalog fails to initialize, e.g. when the getDatabase 
> HMS call fails (probably because HMS is not ready yet), in which case none of 
> the impalads can run any query because they haven't received their initial 
> metadata update.
> {code:java}
> E0909 12:03:33.278311 96437 CatalogServiceCatalog.java:607] 
> NoSuchObjectException(message:cloudera_manager_metastore_canary_test_db_hive_1_hivemetastore_d58189da03eaa0773d6f03557662087b)
> E0909 12:03:33.282852 96437 JniCatalog.java:105] Error initialializing 
> Catalog. Please run 'invalidate metadata'
> Java exception follows:
> com.cloudera.impala.catalog.CatalogException: Error initializing Catalog. 
> Catalog may be empty.
>   at 
> com.cloudera.impala.catalog.CatalogServiceCatalog.reset(CatalogServiceCatalog.java:608)
>   at com.cloudera.impala.service.JniCatalog.(JniCatalog.java:103)
> Caused by: 
> NoSuchObjectException(message:cloudera_manager_metastore_canary_test_db_hive_1_hivemetastore_d58189da03eaa0773d6f03557662087b)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result$get_database_resultStandardScheme.read(ThriftHiveMetastore.java:15543)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result$get_database_resultStandardScheme.read(ThriftHiveMetastore.java:15520)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_database_result.read(ThriftHiveMetastore.java:15451)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:662)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:649)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1178)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:101)
>   at com.sun.proxy.$Proxy4.getDatabase(Unknown Source)
>   at 
> com.cloudera.impala.catalog.CatalogServiceCatalog.reset(CatalogServiceCatalog.java:573)
>   ... 1 more
> {code}






[jira] [Resolved] (IMPALA-2277) Investigate alternative hash functions

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2277.
---
Resolution: Later

We switched to FastHash in some places.

* CRC is very cheap to evaluate and it's hard to outdo for short data types in 
perf-critical places like hash join
* FastHash is good for mixed data where we want a higher quality hash function, 
e.g. data distribution in an exchange
* Many other hash functions only have benefits on long strings.
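As an illustration of that tradeoff (not Impala's C++ implementations), a cheap CRC can be compared with a simple 64-bit FNV-1a on a short key:

```python
# Illustration only: a cheap CRC32 vs. a simple 64-bit FNV-1a on a short
# fixed-width key. Impala's actual hash functions are C++ and differ,
# but the roles match the bullets above: CRC for cheap hashing of short
# keys, a better-mixing hash for data distribution.
import zlib

FNV64_OFFSET = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    # Classic FNV-1a: xor each byte into the state, then multiply.
    h = FNV64_OFFSET
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

key = (42).to_bytes(8, "little")   # a short fixed-width join key
crc = zlib.crc32(key)              # cheap; fine for hash-join bucketing
fnv = fnv1a_64(key)                # better mixing; e.g. exchange distribution
print(hex(crc), hex(fnv))
```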

> Investigate alternative hash functions
> --
>
> Key: IMPALA-2277
> URL: https://issues.apache.org/jira/browse/IMPALA-2277
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Tim Armstrong
>Priority: Minor
>
> Impala currently uses FNV, Murmur2 and CRC hashes in different places 
> depending on requirements. There are additional, newer, hash functions 
> available including Murmur3, SpookyHash, CityHash, and others that may offer 
> benefits.






[jira] [Resolved] (IMPALA-2370) Impala on YARN integration

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2370.
---
Resolution: Later

> Impala on YARN integration
> --
>
> Key: IMPALA-2370
> URL: https://issues.apache.org/jira/browse/IMPALA-2370
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.0
>Reporter: Matthew Jacobs
>Priority: Major
>  Labels: resource-management
>
> This is a placeholder JIRA to track overall progress of Impala on YARN 
> integration.






[jira] [Updated] (IMPALA-2384) Check memory limits in RowBatch constructor and RowBatch::Reset().

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2384:
--
Labels: ramp-up resource-management  (was: resource-management)

> Check memory limits in RowBatch constructor and RowBatch::Reset().
> --
>
> Key: IMPALA-2384
> URL: https://issues.apache.org/jira/browse/IMPALA-2384
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: ramp-up, resource-management
>
> RowBatch does not check memory limits before allocating tuple pointers and 
> other data. We should switch to a separate Init() function and add a Status 
> return to Reset().
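The proposed split can be sketched as follows; the names (MemTracker, try_consume) are illustrative stand-ins, not Impala's actual C++ classes:

```python
# Sketch of the two-phase construction pattern proposed above: the
# constructor does no fallible allocation; a separate init() checks the
# memory limit first and returns a status the caller must inspect.
class MemTracker:
    def __init__(self, limit: int):
        self.limit, self.consumed = limit, 0

    def try_consume(self, n: int) -> bool:
        if self.consumed + n > self.limit:
            return False
        self.consumed += n
        return True

class RowBatch:
    def __init__(self, tracker: MemTracker, capacity: int):
        self.tracker, self.capacity = tracker, capacity
        self.tuple_ptrs = None  # allocated in init(), not here

    def init(self) -> bool:
        """Returns False (a failed status) if the limit would be exceeded."""
        nbytes = self.capacity * 8  # one tuple pointer per row
        if not self.tracker.try_consume(nbytes):
            return False
        self.tuple_ptrs = [None] * self.capacity
        return True

tracker = MemTracker(limit=1024)
ok = RowBatch(tracker, capacity=100).init()      # 800 bytes: fits
failed = RowBatch(tracker, capacity=100).init()  # would exceed 1024
print(ok, failed)  # True False
```

The same shape would apply to Reset(): re-acquire from the tracker and surface a Status instead of assuming the allocation succeeds.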






[jira] [Resolved] (IMPALA-2306) Investigate hiding internal agg functions

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2306.
---
Resolution: Won't Fix

> Investigate hiding internal agg functions
> -
>
> Key: IMPALA-2306
> URL: https://issues.apache.org/jira/browse/IMPALA-2306
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 2.3.0
>Reporter: casey
>Priority: Trivial
>  Labels: usability
>
> DISTINCTPC, DISTINCTPCSA, and NDV_NO_FINALIZE were most likely added as a 
> means to implement other functionality (compute stats?) and probably not 
> intended for general use. If possible, those functions should be hidden so 
> users don't rely on them (and so Impala is able to redefine them as needed).






[jira] [Assigned] (IMPALA-2268) implicit casting of string to timestamp for functions

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2268:
-

Assignee: Tim Armstrong

> implicit casting of string to timestamp for functions
> -
>
> Key: IMPALA-2268
> URL: https://issues.apache.org/jira/browse/IMPALA-2268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2
>Reporter: Bharath Vissapragada
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: newbie, usability
>
> Consider the date_add() builtin: a string is automatically cast to a timestamp.
> {code}
> select date_add( "1900-01-01", 1 ) ; 
> Query: select date_add( "1900-01-01", 1 ) 
> +---+ 
> | date_add('1900-01-01', 1) | 
> +---+ 
> | 1900-01-02 00:00:00 | 
> +---+ 
> Fetched 1 row(s) in 0.12s 
> {code}
> However with an "interval"
> {code}
> select date_add( '1900-01-01', interval 72 days ) ; 
> Query: select date_add( '1900-01-01', interval 72 days ) 
> ERROR: AnalysisException: Operand ''1900-01-01'' of timestamp arithmetic 
> expression 'DATE_ADD('1900-01-01', INTERVAL 72 days)' returns type 'STRING'. 
> Expected type 'TIMESTAMP'. 
> {code}
> We need to manually cast it to a timestamp, something like,
> {code}
> select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) ; 
> Query: select date_add(cast("1900-01-01" as TIMESTAMP), interval 10 days ) 
> +-+ 
> | date_add(cast('1900-01-01' as timestamp), interval 10 days) | 
> +-+ 
> | 1900-01-11 00:00:00 | 
> +-+ 
> Fetched 1 row(s) in 0.02s 
> {code}
> It would be convenient to make this behavior consistent across all builtins.






[jira] [Resolved] (IMPALA-1503) Impala's date_sub and date_add functions are not automatically casting strings (in the correct format) to TIMESTAMP

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1503.
---
Resolution: Duplicate

> Impala's date_sub and date_add functions are not automatically casting 
> strings (in the correct format) to TIMESTAMP
> ---
>
> Key: IMPALA-1503
> URL: https://issues.apache.org/jira/browse/IMPALA-1503
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.0
>Reporter: Alex Spencer
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: planner, timestamp
>
> To reproduce:
> -- This query works:
> select date_sub(now(), interval 5 minutes);
> -- Get the string version of now and copy and place it below:
> select now() as now;
> -- Paste in string and run this, Impala will error:
> select date_sub('2014-11-21 15:49:05.31261', interval 5 minutes);
> -- It should not error, but rather cast the string, as per the documentation 
> (http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_timestamp.html?scroll=timestamp):
> "The first argument can be a string, which is automatically cast to 
> TIMESTAMP if it uses the recognized format, as described in TIMESTAMP Data 
> Type."






[jira] [Commented] (IMPALA-3796) Alter table recover partitions fails to detect changes in parent path

2020-12-23 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254334#comment-17254334
 ] 

Vihang Karajgaonkar commented on IMPALA-3796:
-

Recover partitions is probably not the right usage here, since it is used to 
discover new partitions. A refresh table command is more appropriate. We 
recently introduced a query option, {{REFRESH_UPDATED_HMS_PARTITIONS}}, which 
will refresh such partitions when their locations change. I am resolving this 
as a duplicate of IMPALA-4364.

> Alter table recover partitions fails to detect changes in parent path
> -
>
> Key: IMPALA-3796
> URL: https://issues.apache.org/jira/browse/IMPALA-3796
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.3.0
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up, usability
>
> If you have a table located at /db/table with partitions such as 
> /db/table/partition=1, and you then move the table location to /db2/table/ 
> along with the partitions and their data, the partition locations will be 
> incorrect when you list them, still pointing to /db/table/partition=1.
> Running "Alter table recover partitions" seems to check only for the 
> existence of partition=1 in the +current+ table location, and ignores that 
> the prior path has changed. So when you run a query it will not find those 
> partitions and will report 0 rows.
> It should be noted that Hive has the same behavior.






[jira] [Resolved] (IMPALA-3796) Alter table recover partitions fails to detect changes in parent path

2020-12-23 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-3796.
-
Resolution: Duplicate

> Alter table recover partitions fails to detect changes in parent path
> -
>
> Key: IMPALA-3796
> URL: https://issues.apache.org/jira/browse/IMPALA-3796
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.3.0
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up, usability
>
> If you have a table located at /db/table with partitions such as 
> /db/table/partition=1, and you then move the table location to /db2/table/ 
> along with the partitions and their data, the partition locations will be 
> incorrect when you list them, still pointing to /db/table/partition=1.
> Running "Alter table recover partitions" seems to check only for the 
> existence of partition=1 in the +current+ table location, and ignores that 
> the prior path has changed. So when you run a query it will not find those 
> partitions and will report 0 rows.
> It should be noted that Hive has the same behavior.






[jira] [Updated] (IMPALA-2262) Improve timestamp format parser to support more formats and make it consistent with Hive

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2262:
--
Description: 
Setup:
{noformat}
create table string_time_table (date_last_changed string);
insert into string_time_table values ('06/09/2006 01:25 PM');
{noformat}

Hive:
{noformat}
SELECT date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table; 
+---------------------+---------------------+
| date_last_changed   | _c1                 |
+---------------------+---------------------+
| 06/09/2006 01:25 PM | 2006-06-09 13:25:00 |
+---------------------+---------------------+
{noformat}
Impala:

{noformat}
SELECT date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table; 
Query: select date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table 
WARNINGS: Bad date/time conversion format: MM/dd/yyyy hh:mm a 
{noformat}

We don't support am/pm based formats [1]. I think we should make Impala's usage 
consistent with Hive's.

[1] 
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/timestamp-parse-util.h#L205

  was:
Hive:

SELECT date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table; 
+---------------------+---------------------+
| date_last_changed   | _c1                 |
+---------------------+---------------------+
| 06/09/2006 01:25 PM | 2006-06-09 13:25:00 |
+---------------------+---------------------+

Impala:

SELECT date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table; 
Query: select date_last_changed, 
FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
string_time_table 
WARNINGS: Bad date/time conversion format: MM/dd/yyyy hh:mm a 


We don't support am/pm based formats [1]. I think we should make Impala's usage 
consistent with Hive's.

[1] 
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/timestamp-parse-util.h#L205


> Improve timestamp format parser to support more formats and make it 
> consistent with Hive
> 
>
> Key: IMPALA-2262
> URL: https://issues.apache.org/jira/browse/IMPALA-2262
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Bharath Vissapragada
>Priority: Minor
>
> Setup:
> {noformat}
> create table string_time_table (date_last_changed string);
> insert into string_time_table values ('06/09/2006 01:25 PM');
> {noformat}
> Hive:
> {noformat}
> SELECT date_last_changed, 
> FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
> string_time_table; 
> +---------------------+---------------------+
> | date_last_changed   | _c1                 |
> +---------------------+---------------------+
> | 06/09/2006 01:25 PM | 2006-06-09 13:25:00 |
> +---------------------+---------------------+
> {noformat}
> Impala:
> {noformat}
> SELECT date_last_changed, 
> FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
> string_time_table; 
> Query: select date_last_changed, 
> FROM_UNIXTIME(UNIX_TIMESTAMP(date_last_changed,'MM/dd/yyyy hh:mm a')) FROM 
> string_time_table 
> WARNINGS: Bad date/time conversion format: MM/dd/yyyy hh:mm a 
> {noformat}
> We don't support am/pm based formats [1]. I think we should make Impala's 
> usage consistent with Hive's.
> [1] 
> https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/runtime/timestamp-parse-util.h#L205
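For reference, the Hive-side conversion above can be reproduced with a standard strptime-style parser. This is only an illustration of the semantics, not Hive's or Impala's code; Java SimpleDateFormat's 'MM/dd/yyyy hh:mm a' maps roughly to strptime's '%m/%d/%Y %I:%M %p', where '%p' is the am/pm marker Impala's parser rejects:

```python
from datetime import datetime

# Parse the am/pm-formatted string and print it in the 24-hour layout
# Hive's FROM_UNIXTIME(UNIX_TIMESTAMP(...)) round-trip produces.
parsed = datetime.strptime('06/09/2006 01:25 PM', '%m/%d/%Y %I:%M %p')
print(parsed.strftime('%Y-%m-%d %H:%M:%S'))  # 2006-06-09 13:25:00
```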






[jira] [Resolved] (IMPALA-1880) pmod() sometimes returns negative values

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1880.
---
Resolution: Won't Fix

Looks like we've documented the behaviour, and Hive has the same behaviour, so 
probably not worth breaking compatibility here.

> pmod() sometimes returns negative values
> 
>
> Key: IMPALA-1880
> URL: https://issues.apache.org/jira/browse/IMPALA-1880
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: John Russell
>Priority: Trivial
>  Labels: sql-language
>
> I noticed when constructing examples to compare/contrast fmod(), pmod(), and 
> the new mod() function that pmod() sometimes returns a negative value, 
> despite its being stated to be the "positive" modulus:
> {code}
> [localhost:21000] > select pmod(5,-2);
> +-------------+
> | pmod(5, -2) |
> +-------------+
> | -1          |
> +-------------+
> {code}
> I see that's the same return value as Hive:
> {code}
> hive> select pmod(5,-2) from t1 limit 1;
> Total MapReduce jobs = 1
> ...
> OK
> -1
> Time taken: 29.269 seconds
> {code}
> This seems to be the origin of the Hive pmod() function, it isn't spec'ed out 
> in detail: https://issues.apache.org/jira/browse/HIVE-656
> I thought we should verify that our behavior is working as intended, and if 
> so let's firm up the explanation of the function a bit. ("Returns the 
> positive modulus except if the second argument is negative", or something like that.)
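For what it's worth, the observed result matches the usual "sign follows the divisor" definition of pmod; a small Python sketch of that definition (my own illustration, not either engine's implementation):

```python
import math

def pmod(a, b):
    # ((a % b) + b) % b over a truncated (C/Java-style) remainder:
    # the result takes the sign of the divisor b, which is why
    # pmod(5, -2) comes out negative.
    r = math.fmod(a, b)
    return math.fmod(r + b, b)

print(pmod(5, -2))  # -1.0, matching Impala and Hive
print(pmod(-5, 2))  # 1.0
```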






[jira] [Resolved] (IMPALA-8301) Eliminate need for SYNC_DDL in local catalog mode

2020-12-23 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-8301.
-
Fix Version/s: Impala 4.0
   Resolution: Duplicate

> Eliminate need for SYNC_DDL in local catalog mode
> -
>
> Key: IMPALA-8301
> URL: https://issues.apache.org/jira/browse/IMPALA-8301
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Boris Gitline
>Priority: Major
>  Labels: catalog-v2
> Fix For: Impala 4.0
>
>
> In the following scenario it looks like the INSERT on coordinator 2 is gated 
> behind a long-running DDL on coordinator 1. That scenario still requires 
> SYNC_DDL even in metadata v2. We want to change the metadata handling design 
> so that coordinator 3 does not have to wait for the long-running DDL to 
> complete and can render the correct result for the query against the target 
> table t1.
> Step1. coordinator 1
> ##*say*, the following compute stats runs about 100 seconds
> compute stats tao_ddl_contention;
> [steps 2 and 3 are performed while COMPUTE STATS is running]
> Step2. coordinator 2
> create another new table.
> create table t1(c1 int);
> insert into t1 select 1 ;
> select * from t1;
>  [can see the inserted rows]
> Step3. coordinator 3
> query the newly inserted rows in t1 while the COMPUTE STATS is still running:
> select * from t1;
>  [see no rows]
> ## The query in Step3 won't show the row inserted by Step2 until the Step1 
> compute stats has completed, unless:
>  * SYNC_DDL is set before the INSERT on coordinator 2, or
>  * the Step1 compute stats has completed, or
>  * the query is run from the same Impala session as the INSERT, or
>  * the t1 table is manually refreshed.






[jira] [Commented] (IMPALA-8301) Eliminate need for SYNC_DDL in local catalog mode

2020-12-23 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254331#comment-17254331
 ] 

Vihang Karajgaonkar commented on IMPALA-8301:
-

This is resolved after the fix for IMPALA-6671. The main issue in the example 
given above was that the coordinator 3 was waiting on topic updates which were 
blocked by the compute stats statement from coordinator 1. The fix for 
IMPALA-6671 skips such locked tables from the topic updates to unblock queries 
on unrelated operations.

> Eliminate need for SYNC_DDL in local catalog mode
> -
>
> Key: IMPALA-8301
> URL: https://issues.apache.org/jira/browse/IMPALA-8301
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Boris Gitline
>Priority: Major
>  Labels: catalog-v2
>
> In the following scenario it looks like the INSERT on coordinator 2 is gated 
> behind a long-running DDL on coordinator 1. That scenario still requires 
> SYNC_DDL even in metadata v2. We want to change the metadata handling design 
> so that coordinator 3 does not have to wait for the long-running DDL to 
> complete and can render the correct result for the query against the target 
> table t1.
> Step1. coordinator 1
> ##*say*, the following compute stats runs about 100 seconds
> compute stats tao_ddl_contention;
> [steps 2 and 3 are performed while COMPUTE STATS is running]
> Step2. coordinator 2
> create another new table.
> create table t1(c1 int);
> insert into t1 select 1 ;
> select * from t1;
>  [can see the inserted rows]
> Step3. coordinator 3
> query the newly inserted rows in t1 while the COMPUTE STATS is still running:
> select * from t1;
>  [see no rows]
> ## The query in Step3 won't show the row inserted by Step2 until the Step1 
> compute stats has completed, unless:
>  * SYNC_DDL is set before the INSERT on coordinator 2, or
>  * the Step1 compute stats has completed, or
>  * the query is run from the same Impala session as the INSERT, or
>  * the t1 table is manually refreshed.






[jira] [Resolved] (IMPALA-2118) Add better documentation for end to end python tests.

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2118.
---
Resolution: Won't Fix

> Add better documentation for end to end python tests.
> -
>
> Key: IMPALA-2118
> URL: https://issues.apache.org/jira/browse/IMPALA-2118
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.2
>Reporter: Ishaan Joshi
>Priority: Minor
>  Labels: test-infra
>
> In general, we need to better document end to end tests for Impala. I'll add 
> more information to this jira once I have a better idea of what we should do.






[jira] [Resolved] (IMPALA-6671) Metadata operations that modify a table blocks topic updates for other unrelated operations

2020-12-23 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-6671.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

Resolving this. We may still have some tuning to do for these configurations. 
Those will be taken up as separate JIRAs based on feedback from real-world use cases.

> Metadata operations that modify a table blocks topic updates for other 
> unrelated operations
> ---
>
> Key: IMPALA-6671
> URL: https://issues.apache.org/jira/browse/IMPALA-6671
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: catalog-server, perfomance
> Fix For: Impala 4.0
>
>
> Metadata operations that mutate the state of a table, like "compute stats foo" 
> or "alter recover partitions", block topic updates for read-only operations 
> against unrelated tables, such as "describe bar".
> Thread for blocked operation
> {code:java}
> "Thread-7" prio=10 tid=0x11613000 nid=0x21b3b waiting on condition 
> [0x7f5f2ef52000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f6f57ff0240> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:639)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:611)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:567)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:449)
> at 
> org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:126)
> {code}
> Thread for blocking operation
> {code:java}
> "Thread-130" prio=10 tid=0x113d5800 nid=0x2499d runnable 
> [0x7f5ef80d]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x7f5fffcd9f18> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
> at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
> at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> at 
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1639)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1626)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(

[jira] [Resolved] (IMPALA-2108) Improve partition pruning by extracting partition-column filters from non-trivial disjunctions.

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2108.
---
  Assignee: (was: Bharath Vissapragada)
Resolution: Duplicate

> Improve partition pruning by extracting partition-column filters from 
> non-trivial disjunctions.
> ---
>
> Key: IMPALA-2108
> URL: https://issues.apache.org/jira/browse/IMPALA-2108
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.4, Impala 1.3, Impala 1.4, Impala 2.1, Impala 
> 2.2
>Reporter: Alexander Behm
>Priority: Minor
>  Labels: newbie, performance
>
> *Problem Statement*
> Impala fails to prune partitions if the partition-column filters are part of 
> a "non-trivial" disjunction where each disjunct itself consists of conjuncts 
> referencing both partition and non-partition columns.
> Consider the following example:
> {code}
> create table test_table (c1 INT, c2 STRING) PARTITIONED BY (pc INT);
> [localhost.localdomain:21000] > explain select c1 from test_table where (pc=1 
> and c2='a') or (pc=2 and c2='b') or (pc=3 and c2='c');
> Query: explain select c1 from test_table where (pc=1 and c2='a') or (pc=2 and 
> c2='b') or (pc=3 and c2='c') <-- Partition-column filters inside non-trivial 
> disjunctions
> +--------------------------------------------------------------------------------------+
> | Explain String                                                                       |
> +--------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                            |
> | WARNING: The following tables are missing relevant table and/or column statistics.   |
> | default.test_table                                                                   |
> |                                                                                      |
> | 01:EXCHANGE [UNPARTITIONED]                                                          |
> | |                                                                                    |
> | 00:SCAN HDFS [default.test_table]                                                    |
> |    partitions=5/5 files=9 size=36B                                                   |
> |    predicates: (pc = 1 AND c2 = 'a') OR (pc = 2 AND c2 = 'b') OR (pc = 3 AND c2 = 'c') |
> +--------------------------------------------------------------------------------------+
> Fetched 9 row(s) in 0.04s
> [localhost.localdomain:21000] > 
> {code}
> *Cause*
> This is a limitation in how Impala filters partitions.
> *Workaround*
> The above example can be fixed by manually rewriting the predicate as follows:
> {code}
> select c1 from test_table where ((pc=1 and c2='a') or (pc=2 and c2='b') or 
> (pc=3 and c2='c')) and (pc=1 OR pc=2 OR pc=3);
> {code}
> *Proposed fix*
> The proposed fix is for Impala to automatically do what is stated in the 
> workaround above:
> Extract the partition-column filters from the disjunctions, create a new 
> predicate with all those partition-column filters connected with OR, and add 
> the new predicate to the original one with AND.
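The extraction step described in the proposed fix can be sketched as follows. The predicate representation and function name here are a toy model for illustration, not Impala's planner API:

```python
# A predicate in disjunctive normal form is modeled as a list of disjuncts,
# each disjunct a list of (column, value) equality conjuncts.
def extract_partition_predicate(disjuncts, partition_cols):
    extracted = []
    for conjuncts in disjuncts:
        pc_filters = [c for c in conjuncts if c[0] in partition_cols]
        if not pc_filters:
            # A disjunct with no partition-column filter can match any
            # partition, so nothing can be pruned.
            return None
        extracted.append(pc_filters)
    # The caller would OR these filters together and AND the result
    # onto the original predicate.
    return extracted

pred = [[('pc', 1), ('c2', 'a')], [('pc', 2), ('c2', 'b')], [('pc', 3), ('c2', 'c')]]
print(extract_partition_predicate(pred, {'pc'}))
# [[('pc', 1)], [('pc', 2)], [('pc', 3)]]
```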






[jira] [Work started] (IMPALA-6671) Metadata operations that modify a table blocks topic updates for other unrelated operations

2020-12-23 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6671 started by Vihang Karajgaonkar.
---
> Metadata operations that modify a table blocks topic updates for other 
> unrelated operations
> ---
>
> Key: IMPALA-6671
> URL: https://issues.apache.org/jira/browse/IMPALA-6671
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: catalog-server, perfomance
>
> Metadata operations that mutate the state of a table, like "compute stats foo" 
> or "alter recover partitions", block topic updates for read-only operations 
> against unrelated tables, such as "describe bar".
> Thread for blocked operation
> {code:java}
> "Thread-7" prio=10 tid=0x11613000 nid=0x21b3b waiting on condition 
> [0x7f5f2ef52000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f6f57ff0240> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:639)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:611)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:567)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:449)
> at 
> org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:126)
> {code}
> Thread for blocking operation
> {code:java}
> "Thread-130" prio=10 tid=0x113d5800 nid=0x2499d runnable 
> [0x7f5ef80d]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x7f5fffcd9f18> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
> at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
> at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> at 
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1639)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1626)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:609)
> at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.refl

[jira] [Resolved] (IMPALA-2066) Traverse expression trees iteratively

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2066.
---
Resolution: Later

> Traverse expression trees iteratively
> -
>
> Key: IMPALA-2066
> URL: https://issues.apache.org/jira/browse/IMPALA-2066
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2
>Reporter: Henry Robinson
>Priority: Minor
>
> In several places, for example in {{Expr.analyze()}} in the frontend, we 
> recursively traverse trees of expression objects. Doing so puts us at risk of 
> blowing through the stack limit, and therefore having a much lower limit on 
> the number of expressions than memory would actually allow (this is not just 
> a theoretical improvement - users have been bumping up against this limit).
> We should transform our tree-walking code to iteratively traverse the exprs. 
> For example, {{analyze()}} could be transformed into a post-order iterative 
> traversal. Then the expr tree size is bounded by the size of stack we can 
> reasonably allocate, which will be much larger than the call stack.
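The rewrite the issue asks for is essentially a standard explicit-stack post-order traversal; a minimal Python sketch with illustrative names (`Expr` and `analyze_iteratively` are not the frontend's actual API):

```python
class Expr:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def analyze_iteratively(root, analyze):
    # Reverse pre-order trick: record each node before its children, then
    # replay the record backwards so every child is analyzed before its
    # parent. No recursion, so tree depth is bounded by heap, not call stack.
    stack, order = [root], []
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)
    for node in reversed(order):
        analyze(node)

# A chain deep enough to overflow a naive recursive analyze().
root = Expr('leaf')
for i in range(100000):
    root = Expr('n%d' % i, [root])
seen = []
analyze_iteratively(root, lambda n: seen.append(n.name))
print(seen[0])  # 'leaf' -- deepest child first
```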






[jira] [Commented] (IMPALA-2840) Avoid storing redundant information about partitions in the catalog

2020-12-23 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254327#comment-17254327
 ] 

Vihang Karajgaonkar commented on IMPALA-2840:
-

We already deduplicate the partition locations using this 
{{HdfsPartitionLocationCompressor}}. 

> Avoid storing redundant information about partitions in the catalog
> ---
>
> Key: IMPALA-2840
> URL: https://issues.apache.org/jira/browse/IMPALA-2840
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Affects Versions: Impala 2.2.4
>Reporter: Dimitris Tsirogiannis
>Priority: Major
>  Labels: catalog-server, memory, performance, ramp-up
>
> For each partition we store the entire path in a string. For tables with 
> large number of partitions,   there is lots of redundancy that we should try 
> to avoid in order to reduce the catalog's memory footprint.






[jira] [Updated] (IMPALA-1880) pmod() sometimes returns negative values

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1880:
--
Priority: Trivial  (was: Minor)

> pmod() sometimes returns negative values
> 
>
> Key: IMPALA-1880
> URL: https://issues.apache.org/jira/browse/IMPALA-1880
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: John Russell
>Priority: Trivial
>  Labels: sql-language
>
> I noticed when constructing examples to compare/contrast fmod(), pmod(), and 
> the new mod() function that pmod() sometimes returns a negative value, 
> despite its being stated to be the "positive" modulus:
> {code}
> [localhost:21000] > select pmod(5,-2);
> +-------------+
> | pmod(5, -2) |
> +-------------+
> | -1          |
> +-------------+
> {code}
> I see that's the same return value as Hive:
> {code}
> hive> select pmod(5,-2) from t1 limit 1;
> Total MapReduce jobs = 1
> ...
> OK
> -1
> Time taken: 29.269 seconds
> {code}
> This seems to be the origin of the Hive pmod() function, it isn't spec'ed out 
> in detail: https://issues.apache.org/jira/browse/HIVE-656
> I thought we should verify that our behavior is working as intended, and if 
> so let's firm up the explanation of the function a bit. ("Returns the 
> positive modulus except if the second argument is negative", or something like that.)






[jira] [Resolved] (IMPALA-1876) add memory leak checking for cluster processes

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1876.
---
Resolution: Won't Fix

> add memory leak checking for cluster processes
> --
>
> Key: IMPALA-1876
> URL: https://issues.apache.org/jira/browse/IMPALA-1876
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.2
>Reporter: Silvius Rus
>Priority: Major
>  Labels: test-infra
>
> Taras, can you make sure we have an active test that prevents memory leak 
> regressions?
> With HEAPCHECK=normal, impalad should produce something like this to stderr 
> on exit:
> No leaks found for check "_main_"
> or
> Leak of 39996 bytes in  objects allocated from...






[jira] [Commented] (IMPALA-1948) Fix automatic storage Status and codegen

2020-12-23 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254326#comment-17254326
 ] 

Tim Armstrong commented on IMPALA-1948:
---

The invoke instructions are generated when a destructor needs to be run (i.e. 
the status destructor) when exiting a scope *and* that scope has some code that 
may throw an exception. In practice any function call that is not marked 
noexcept could throw an exception, so any function call can trigger the 
insertion of an invoke. In some cases if the function is in the same module 
LLVM might be able to infer noexcept.

Note that, even aside from the issue with replacing functions, the 
invoke/cleanup stuff does result in more complex IR because a bunch of cleanup 
blocks get inserted to run the destructors.

> Fix automatic storage Status and codegen
> 
>
> Key: IMPALA-1948
> URL: https://issues.apache.org/jira/browse/IMPALA-1948
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Daniel Hecht
>Priority: Minor
>  Labels: codegen
>
> Codegen IR and Status with automatic storage duration don't work well 
> together.  That's because Status has a destructor that needs to be executed 
> if an exception is thrown, and so the IR will emit INVOKE instructions rather 
> than CALL instructions (to unwind the stack).  And our IR fixup to replace 
> indirect calls with direct calls doesn't know how to deal with INVOKE.
> We should fix this so that writing IR is simpler and more robust. Currently 
> we work around this, for example, by passing Status down IR callstacks as a 
> pointer to Status that lives as member in a heap allocated object (e.g. in 
> partition join/agg code).  Once this (and IMPALA-1916) are fixed, we should 
> remove these workarounds.






[jira] [Resolved] (IMPALA-1921) Cast string->*int with a decimal place always returns NULL

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1921.
---
Resolution: Not A Bug

I think the current behaviour is reasonable; there's no reason to be overly 
permissive here and drop data silently.

> Cast string->*int with a decimal place always returns NULL
> --
>
> Key: IMPALA-1921
> URL: https://issues.apache.org/jira/browse/IMPALA-1921
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.0
> Environment: Server version: impalad version 2.0.0-cdh5 RELEASE 
> (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
>Reporter: gpolaert
>Priority: Minor
>  Labels: sql-language
>
> cast ('-1.0' as int|bigint) should not return NULL but -1.
> Issue:
> {code}
> Query: select cast("-1.0" as int)
> +-+
> | cast('-1.0' as int) |
> +-+
> | NULL|
> +-+
> {code}
> Workaround: 
> {code}
> Query: select cast(cast ("-1.0" as double) as int)
> +-+
> | cast(cast('-1.0' as double) as int) |
> +-+
> | -1  |
> +-+
> Fetched 1 row(s) in 0.01s
> {code}
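The strict parse and the two-step workaround above can be sketched in Python (an illustration of the semantics only, not Impala's implementation; the function names are hypothetical):

```python
def cast_string_to_int(s):
    """Mimic Impala's strict CAST(string AS INT): only integer literals parse."""
    try:
        return int(s.strip())          # int() rejects "-1.0", like the strict cast
    except ValueError:
        return None                    # Impala returns NULL on a failed cast

def cast_via_double(s):
    """Mimic the workaround: CAST(CAST(s AS DOUBLE) AS INT)."""
    try:
        return int(float(s.strip()))   # float() accepts "-1.0"; int() truncates
    except ValueError:
        return None
```

With this model, `cast_string_to_int("-1.0")` yields None (NULL) while `cast_via_double("-1.0")` yields -1, matching the behaviour described in the report.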






[jira] [Resolved] (IMPALA-6921) AnalysisException: Failed to load metadata for table: 'tpch_kudu.ctas_cancel' during data load

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6921.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

There is no plan to fix this right now.

> AnalysisException: Failed to load metadata for table: 'tpch_kudu.ctas_cancel' 
> during data load
> --
>
> Key: IMPALA-6921
> URL: https://issues.apache.org/jira/browse/IMPALA-6921
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: David Knupp
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Not Applicable
>
>
> This exception seems to be consistently thrown during the data load phase. It 
> appears in compute-table-stats.log.
> {noformat}
> 2018-04-22 06:50:48,764 Thread-8:  Failed on table tpch_kudu.ctas_cancel
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-cdh6.0.0_beta1-core/repos/Impala/tests/util/compute_table_stats.py",
>  line 40, in compute_stats_table
> result = impala_client.execute(statement)
>   File 
> "/data/jenkins/workspace/impala-cdh6.0.0_beta1-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 173, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File 
> "/data/jenkins/workspace/impala-cdh6.0.0_beta1-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 339, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File 
> "/data/jenkins/workspace/impala-cdh6.0.0_beta1-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 335, in execute_query_async
> return self.__do_rpc(lambda: self.imp_service.query(query,))
>   File 
> "/data/jenkins/workspace/impala-cdh6.0.0_beta1-core/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 460, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: AnalysisException: Failed to load metadata for table: 
> 'tpch_kudu.ctas_cancel'
> CAUSED BY: TableLoadingException: Error loading metadata for Kudu table 
> impala::tpch_kudu.ctas_cancel
> CAUSED BY: ImpalaRuntimeException: Error opening Kudu table 
> 'impala::tpch_kudu.ctas_cancel', Kudu error: The table does not exist: 
> table_name: "impala::tpch_kudu.ctas_cancel"
> {noformat}
> ctas_cancel is a table that gets used by query_test/test_cancellation.py
> This doesn't seem to break anything (data loading completes and tests pass), 
> but it's vexing that part of our standard data load process produces 
> exceptions in any log file.
> Please feel free to mark this as invalid if this is not really an issue.






[jira] [Updated] (IMPALA-1749) Respect compression table property for Parquet files

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1749:
--
Labels: parquet ramp-up  (was: ramp-up)

> Respect compression table property for Parquet files
> 
>
> Key: IMPALA-1749
> URL: https://issues.apache.org/jira/browse/IMPALA-1749
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: parquet, ramp-up
>
> https://issues.apache.org/jira/browse/HIVE-7858 added support for a 
> {{parquet.compression}} table property to Hive. We should respect that 
> property if it exists if the client does not override {{COMPRESSION_CODEC}} 
> in the query options.
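The precedence described above (explicit query option, then table property, then default) can be sketched as follows; this is a hedged illustration, and the helper name and the default codec are assumptions, not Impala's actual code:

```python
def effective_parquet_codec(query_options, table_properties, default="SNAPPY"):
    """Pick the Parquet write codec: an explicit COMPRESSION_CODEC query
    option wins; otherwise fall back to Hive's parquet.compression table
    property; otherwise use the built-in default."""
    codec = query_options.get("COMPRESSION_CODEC")
    if codec:
        return codec.upper()
    prop = table_properties.get("parquet.compression")
    if prop:
        return prop.upper()
    return default
```

For example, a table created by Hive with `parquet.compression=gzip` would be written with GZIP unless the client sets COMPRESSION_CODEC explicitly.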






[jira] [Resolved] (IMPALA-1644) Move scan ranges for unexpected remote reads to the remote read disk queue

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1644.
---
Resolution: Won't Fix

The benefit relative to the complexity seems limited, since this only really 
comes into play if block locations are stale. It could be mitigated further by 
enabling the remote data and file handle caches too.

> Move scan ranges for unexpected remote reads to the remote read disk queue
> --
>
> Key: IMPALA-1644
> URL: https://issues.apache.org/jira/browse/IMPALA-1644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.0
>Reporter: Matthew Jacobs
>Priority: Minor
>  Labels: performance
>
> Move scan ranges for remote reads to a special remote read disk queue.
> This would help avoid hammering the cluster with remote reads when metadata 
> becomes outdated.






[jira] [Resolved] (IMPALA-6962) Cluster test API unable to (re)start processes

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6962.
---
Fix Version/s: Impala 3.0
   Resolution: Fixed

> Cluster test API unable to (re)start processes
> --
>
> Key: IMPALA-6962
> URL: https://issues.apache.org/jira/browse/IMPALA-6962
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Product Backlog
>Reporter: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 3.0
>
>
> The APIs to control cluster processes (tests/common/impala_cluster.py) do 
> not start processes correctly. When Process::start() is called from pytests, 
> the process dies immediately. For example:
> {noformat}
> ...
> self.cluster.catalogd.start()
> ...{noformat}
> Results in catalogd exiting with the following, indicating the env for Java 
> is not as expected:
> {noformat}
> I0502 14:50:05.983942 26535 status.cc:125] Failed to find JniUtil class. 
> @0x17600bd impala::Status::Status() 
> @ 0x1bda29a impala::JniUtil::Init() 
> @ 0x1759478 impala::InitCommonRuntime() 
> @ 0x1702da5 CatalogdMain() 
> @ 0x17028ef main 
> @ 0x7f503f139830 __libc_start_main 
> @ 0x1702739 _start
> {noformat}
> Fixing this will make it easier to test more scenarios that restart 
> processes.
> Note that starting statestored works fine (no Java).






[jira] [Resolved] (IMPALA-6965) Make the EE tests pass when HDFS erasure coding is enabled

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6965.
---
Fix Version/s: Impala 3.1.0
   Resolution: Fixed

These tests have been passing for a while.

> Make the EE tests pass when HDFS erasure coding is enabled
> --
>
> Key: IMPALA-6965
> URL: https://issues.apache.org/jira/browse/IMPALA-6965
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Taras Bobrovytsky
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> As a first step to regularly running tests on minicluster with HDFS erasure 
> coding enabled, we need the core and exhaustive tests to pass consistently. 
> This can be achieved by disabling the failing EE tests.






[jira] [Resolved] (IMPALA-6967) GVO should only allow patches that apply cleanly to both master and 2.x

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6967.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

The 2.x branch is no longer actively developed and cherrypicking is disabled.

> GVO should only allow patches that apply cleanly to both master and 2.x
> ---
>
> Key: IMPALA-6967
> URL: https://issues.apache.org/jira/browse/IMPALA-6967
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: jenkins
> Fix For: Not Applicable
>
>
> Following this thread:
> https://lists.apache.org/thread.html/bba3c5a87635ad3c70c40ac120de2ddb41c3d0e2f5db0b29bc0243ff@%3Cdev.impala.apache.org%3E
> It would take load off authors if the GVO could automatically tell if a patch 
> that's being pushed to master would cleanly cherry-pick to 2.x.
> At the beginning of the GVO, we should try to cherry-pick to 2.x and fail if 
> there are conflicts, unless the commit message has the line:
> "Cherry-picks: not for 2.x"
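The proposed gate can be sketched in Python; the marker check is taken from the description above, while the git invocation is a hypothetical sketch of how the job might verify a clean apply (the function names are assumptions):

```python
import subprocess

OPT_OUT_MARKER = "Cherry-picks: not for 2.x"

def requires_clean_cherry_pick(commit_message):
    """A patch must apply cleanly to 2.x unless the author opted out
    with the marker line in the commit message."""
    return OPT_OUT_MARKER not in commit_message

def applies_cleanly(commit_hash, repo_dir):
    """Try the cherry-pick without committing, then restore the checkout."""
    result = subprocess.run(
        ["git", "cherry-pick", "--no-commit", commit_hash],
        cwd=repo_dir, capture_output=True)
    # Leave the 2.x checkout as we found it either way.
    subprocess.run(["git", "cherry-pick", "--abort"],
                   cwd=repo_dir, capture_output=True)
    return result.returncode == 0
```

At the beginning of the GVO, the job would fail fast when `requires_clean_cherry_pick()` is true but `applies_cleanly()` is false.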






[jira] [Resolved] (IMPALA-7066) Provide option to build impalad with native-toolchain dependencies linked statically

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7066.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

I don't think we have any plans to do this.

> Provide option to build impalad with native-toolchain dependencies linked 
> statically
> 
>
> Key: IMPALA-7066
> URL: https://issues.apache.org/jira/browse/IMPALA-7066
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Priority: Major
> Fix For: Not Applicable
>
>
> Users can run into issues when upgrading Impala if they just replace the 
> impalad binary and jars but forget (or are unaware that they need) to replace 
> the dependent libraries like libkudu_client, libstdc++, libgcc, etc.
> It'd be better if we had a build option to link them statically.






[jira] [Resolved] (IMPALA-7129) Can't start catalogd in tests under UBSAN

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7129.
---
Fix Version/s: Impala 3.0
   Resolution: Fixed

This seems to have been fixed a while ago. UBSAN is functioning, so closing.

> Can't start catalogd in tests under UBSAN 
> --
>
> Key: IMPALA-7129
> URL: https://issues.apache.org/jira/browse/IMPALA-7129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.0
>Reporter: Jim Apple
>Assignee: Philip Martin
>Priority: Major
> Fix For: Impala 3.0
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/2377/testReport/junit/custom_cluster.test_admission_controller/TestAdmissionController/test_require_user/
> This custom cluster test failed
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/2377/artifact/Impala/logs_static/logs/custom_cluster_tests/catalogd-error.log/*view*/
> {{UndefinedBehaviorSanitizer: failed to read suppressions file 
> '/home/ubuntu/Impala/be/build/debug/service/./bin/ubsan-suppressions.txt'}}
> A number of other tests failed, too, and I suspect it's 
> https://github.com/apache/impala/commit/48625335d220566a1d69e65fb34bfca9a7dc3cff
>  that broke it. I guess maybe {{IMPALA_HOME}} is not set in some tests?






[jira] [Resolved] (IMPALA-7149) test_mem_usage_scaling.test_low_mem_limit_q7 sometimes fails in the erasure coding build

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7149.
---
Fix Version/s: Impala 3.1.0
   Resolution: Fixed

The original issue is fixed. Reenabling tests on erasure coding configurations 
can be tracked separately.

> test_mem_usage_scaling.test_low_mem_limit_q7 sometimes fails in the erasure 
> coding build
> 
>
> Key: IMPALA-7149
> URL: https://issues.apache.org/jira/browse/IMPALA-7149
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Taras Bobrovytsky
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> The query fails with the following error message:
> {code:java}
> Memory limit exceeded: Failed to allocate tuple buffer{code}






[jira] [Resolved] (IMPALA-7153) filesystem_client methods expect unusual paths

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7153.
---
Fix Version/s: Impala 3.3.0
   Resolution: Fixed

The filesystem methods now take either type of path. This was fixed as part of 
handling S3Guard:

https://github.com/apache/impala/commit/6b09612e763aace6ec3ec22031e4e960b9a41e3d

> filesystem_client methods expect unusual paths
> --
>
> Key: IMPALA-7153
> URL: https://issues.apache.org/jira/browse/IMPALA-7153
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.0
>Reporter: Daniel Hecht
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 3.3.0
>
>
> In Impala end to end python tests, most paths are either absolute or fully 
> qualified (i.e. /test-warehouse/db/table or 
> hdfs://namenode:port/test-warehouse/db/table), yet the classes derived from 
> BaseFilesystem tend to expect paths that are absolute but without the leading 
> slash.
> This leads to awkward and error prone testing code. For example, see 
> IMPALA-7099 and {{test_unsupported_text_compression()}} which does e.g.:
> {code}
> 
> self.filesystem_client.create_file("{0}/fake.lz4".format(lz4_ym_partition_loc)[1:],
> "some test data")
> {code}
> Not only is this confusing, but it doesn't work when the test is run with a 
> non-empty FILESYSTEM_PREFIX.
> We should fix the filesystem_client classes to handle absolute and fully 
> qualified paths so that each test doesn't have to worry about this.
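The fix described above amounts to a normalization step that accepts all three path shapes. A minimal sketch (the helper name is hypothetical, not the actual fix in the linked commit):

```python
from urllib.parse import urlparse

def normalize_warehouse_path(path):
    """Accept an absolute path ('/test-warehouse/db/t'), a fully qualified
    URI ('hdfs://namenode:8020/test-warehouse/db/t'), or the legacy form
    without the leading slash, and return a plain absolute path."""
    parsed = urlparse(path)
    # For a URI, drop the scheme and authority; otherwise keep the path as-is.
    p = parsed.path if parsed.scheme else path
    return "/" + p.lstrip("/")
```

With such a helper inside the BaseFilesystem subclasses, tests could pass whichever form they have, and the `[1:]` slicing shown above would become unnecessary.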






[jira] [Assigned] (IMPALA-1503) Impala's date_sub and date_add functions are not automatically casting strings (in the correct format) to TIMESTAMP

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1503:
-

Assignee: Tim Armstrong

> Impala's date_sub and date_add functions are not automatically casting 
> strings (in the correct format) to TIMESTAMP
> ---
>
> Key: IMPALA-1503
> URL: https://issues.apache.org/jira/browse/IMPALA-1503
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.0
>Reporter: Alex Spencer
>Assignee: Tim Armstrong
>Priority: Minor
>  Labels: planner, timestamp
>
> To reproduce:
> -- This query works:
> select date_sub(now(), interval 5 minutes);
> -- Get the string version of now and copy and place it below:
> select now() as now;
> -- Paste in string and run this, Impala will error:
> select date_sub('2014-11-21 15:49:05.31261', interval 5 minutes);
> -- It should not error, but rather cast the string, as per the documentation 
> (http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_timestamp.html?scroll=timestamp):
> "The first argument can be a string, which is automatically cast to 
> TIMESTAMP if it uses the recognized format, as described in TIMESTAMP Data 
> Type."
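The expected behaviour can be modelled in Python: try the recognized timestamp formats in turn before subtracting the interval. This is a sketch of the documented semantics, not Impala's implementation, and the format list is an illustrative subset:

```python
from datetime import datetime, timedelta

def date_sub(ts, delta):
    """Accept either a datetime or a timestamp string in a recognized
    format, and subtract the interval."""
    if isinstance(ts, str):
        for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
            try:
                ts = datetime.strptime(ts, fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError("unrecognized timestamp format: %r" % ts)
    return ts - delta
```

Under this model, `date_sub('2014-11-21 15:49:05.31261', timedelta(minutes=5))` succeeds just as `date_sub(now(), interval 5 minutes)` does.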






[jira] [Commented] (IMPALA-1409) brew install impala

2020-12-23 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254301#comment-17254301
 ] 

Tim Armstrong commented on IMPALA-1409:
---

I think the Docker quickstart is a more realistic path to a foolproof 
setup at this point. Will close this JIRA.

> brew install impala
> ---
>
> Key: IMPALA-1409
> URL: https://issues.apache.org/jira/browse/IMPALA-1409
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Product Backlog
>Reporter: Jeff Hammerbacher
>Priority: Minor
>  Labels: osx
>
> It would be wonderful if I could install Impala on my Mac laptop. It would be 
> even better if Impala could just use my local filesystem to store data. It 
> would be beyond great if this local version of Impala could be installed 
> without pulling in the JVM (a long shot, given that it would require a 
> rewrite of the fe in something other than Java).






[jira] [Resolved] (IMPALA-1409) brew install impala

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1409.
---
Resolution: Won't Fix

> brew install impala
> ---
>
> Key: IMPALA-1409
> URL: https://issues.apache.org/jira/browse/IMPALA-1409
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Product Backlog
>Reporter: Jeff Hammerbacher
>Priority: Minor
>  Labels: osx
>
> It would be wonderful if I could install Impala on my Mac laptop. It would be 
> even better if Impala could just use my local filesystem to store data. It 
> would be beyond great if this local version of Impala could be installed 
> without pulling in the JVM (a long shot, given that it would require a 
> rewrite of the fe in something other than Java).






[jira] [Resolved] (IMPALA-1226) need better cancellation tests

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1226.
---
Resolution: Later

> need better cancellation tests
> --
>
> Key: IMPALA-1226
> URL: https://issues.apache.org/jira/browse/IMPALA-1226
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.0
>Reporter: Daniel Hecht
>Assignee: Daniel Hecht
>Priority: Major
>  Labels: query-lifecycle, test, test-infra
> Attachments: impala-1178-test.patch
>
>
> See IMPALA-1178 comments for details for how to manually reproduce the bug.
> I tried writing a pytest test case to reproduce the problem but it appears 
> that we don't get enough parallelism that way.  Even adding a long delay loop 
> to ImpalaServer::ExecuteStatement inside the race window shows that the 
> attached test will not execute the RPCs concurrently, and so doesn't 
> reproduce IMPALA-1178.
> I'll attach a patch of the test case attempt.  The patch also contains some 
> instrumentation I was using to see why I wasn't hitting the race window.






[jira] [Assigned] (IMPALA-1306) Avoid passing (empty) tuples of non-materialized slots, if consumer does not need them

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1306:
-

Assignee: (was: Marcel Kinard)

> Avoid passing (empty) tuples of non-materialized slots, if consumer does not 
> need them
> --
>
> Key: IMPALA-1306
> URL: https://issues.apache.org/jira/browse/IMPALA-1306
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4.1
>Reporter: Ippokratis Pandis
>Priority: Minor
>  Labels: planner
>
> In the case of non-materialized slots, we should not produce tuples
> for those slots if the slot is the only one in the tuple and the consumer
> node(s) do not need it. For example, in the query below, nodes 03:ANALYTIC 
> and 06:EXCHANGE should not have tuple_id=1.
> This has some performance impact on the backend, as many codepaths iterate 
> over all the tuples in the row.
> {code}
> [localhost:21000] > explain select AVG(t1.int_col) OVER ()  FROM alltypestiny 
> t1 WHERE EXISTS (SELECT t1.month FROM alltypestiny t1);
> Query: explain select AVG(t1.int_col) OVER ()  FROM alltypestiny t1 WHERE 
> EXISTS (SELECT t1.month FROM alltypestiny t1)
> +--+
> | Explain String   |
> +--+
> | Estimated Per-Host Requirements: Memory=64.00MB VCores=2 |
> |  |
> | 03:ANALYTIC  |
> | |  functions: avg(t1.int_col)|
> | |  hosts=3 per-host-mem=unavailable  |
> | |  tuple-ids=0,1,6 row-size=12B cardinality=8|
> | ||
> | 06:EXCHANGE [UNPARTITIONED]  |
> | |  hosts=3 per-host-mem=unavailable  |
> | |  tuple-ids=0,1 row-size=4B cardinality=8   |
> | ||
> | 02:CROSS JOIN [BROADCAST]|
> | |  hosts=3 per-host-mem=0B   |
> | |  tuple-ids=0,1 row-size=4B cardinality=8   |
> | ||
> | |--05:EXCHANGE [BROADCAST]   |
> | |  |  hosts=3 per-host-mem=0B|
> | |  |  tuple-ids=1 row-size=0B cardinality=1  |
> | |  | |
> | |  04:EXCHANGE [UNPARTITIONED]   |
> | |  |  limit: 1   |
> | |  |  hosts=3 per-host-mem=unavailable   |
> | |  |  tuple-ids=1 row-size=0B cardinality=1  |
> | |  | |
> | |  01:SCAN HDFS [functional.alltypestiny t1, RANDOM] |
> | | partitions=4/4 size=460B   |
> | | table stats: 8 rows total  |
> | | column stats: all  |
> | | limit: 1   |
> | | hosts=3 per-host-mem=32.00MB   |
> | | tuple-ids=1 row-size=0B cardinality=1  |
> | ||
> | 00:SCAN HDFS [functional.alltypestiny t1, RANDOM]|
> |partitions=4/4 size=460B  |
> |table stats: 8 rows total |
> |column stats: all |
> |hosts=3 per-host-mem=32.00MB  |
> |tuple-ids=0 row-size=4B cardinality=8 |
> +--+
> {code}






[jira] [Resolved] (IMPALA-1193) Restructure error handling within the impala-shell

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1193.
---
Resolution: Won't Fix

> Restructure error handling within the impala-shell
> --
>
> Key: IMPALA-1193
> URL: https://issues.apache.org/jira/browse/IMPALA-1193
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.4
>Reporter: Abdullah Yousufi
>Priority: Minor
>  Labels: shell
>
> Take a look at comments (in patch set 1 and 6) 
> http://gerrit.sjc.cloudera.com:8080/#/c/4100/. 
> Essentially, there are two main points. First, move the main control loop of 
> the shell (the while shell_is_alive loop) into the shell class. Second, the 
> exception handling that occurs in the _execute_stmt() method should be moved 
> to the top level, so errors are caught by the main control loop in 
> interactive mode and by the loop within 
> execute_queries_non_interactive_mode() in non-interactive mode.
> To prevent redundant error handling, the two respective loops should be 
> wrapped/decorated or should call the same method that then handles all the 
> errors.
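The wrap/decorate idea can be sketched as a decorator that becomes the single place where statement errors are handled; the exception type and function names below are hypothetical stand-ins, not impala-shell's actual API:

```python
import functools

class ShellError(Exception):
    """Stand-in for the errors _execute_stmt() can raise."""

def handle_errors(func):
    """Catch statement errors in one place, so both the interactive control
    loop and execute_queries_non_interactive_mode() share the same handling
    instead of duplicating try/except blocks."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ShellError as e:
            print("ERROR: %s" % e)
            return False   # statement failed; keep the shell alive
    return wrapper

@handle_errors
def execute_stmt(stmt):
    # Toy stand-in for the shell's statement execution.
    if stmt.startswith("bad"):
        raise ShellError("could not execute: %s" % stmt)
    return True
```

Both loops would then call the decorated entry point, removing the redundant per-loop error handling.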






[jira] [Resolved] (IMPALA-7405) Save jenkins artifacts when aborted by a timeout

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7405.
---
Fix Version/s: Impala 3.2.0
   Resolution: Fixed

We added code to time out tests before the Jenkins job timeout and to dump 
debugging information. The initial change is here:

[https://github.com/apache/impala/commit/11c90c3d7d18ec81ae66d7cf1e6d71ac2be4eec2]

There were subsequent changes that expanded what got dumped. There may be more 
to do, but it can be tracked in a separate JIRA.

> Save jenkins artifacts when aborted by a timeout
> 
>
> Key: IMPALA-7405
> URL: https://issues.apache.org/jira/browse/IMPALA-7405
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Priority: Major
>  Labels: flaky-build
> Fix For: Impala 3.2.0
>
>
> Occasionally, builds abort due to timeouts. See: 
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/2882] for an 
> example. At the end of the run, there's a large time gap:
> {noformat}
> ...
> 20:08:21 [gw12] PASSED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none] 
> 20:08:21 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 20:09:51 [gw12] PASSED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 20:09:51 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none] 
> 20:09:51 [gw12] SKIPPED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none] 
> 20:09:51 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 20:09:51 [gw12] SKIPPED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] Set 
> build name. 
> 03:38:00 New build name is '#2882 master' 
> 03:38:00 Build was aborted 
> 03:38:00 Calling Pipeline was cancelled 
> 03:38:00 Archiving artifacts 
> 03:38:00 Recording test results 
> 03:38:01 None of the test reports contained any result 
> 03:38:01 Finished: ABORTED
> {noformat}
> However, no artifacts were saved so we can't see what was going on during 
> that 7 hour gap. Similar issues have hindered debugging other cases that 
> timeout once in a while: https://issues.apache.org/jira/browse/IMPALA-6352






[jira] [Resolved] (IMPALA-1112) Remove unnecessary boost/std/misc functions from the cross compiled module.

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1112.
---
Fix Version/s: Impala 2.7.0
   Resolution: Fixed

> Remove unnecessary boost/std/misc functions from the cross compiled module.
> ---
>
> Key: IMPALA-1112
> URL: https://issues.apache.org/jira/browse/IMPALA-1112
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.4
>Reporter: Nong Li
>Assignee: Michael Ho
>Priority: Major
>  Labels: codegen
> Fix For: Impala 2.7.0
>
>
> The module has an init section for initializing static vars. If we include 
> something that contains a static boost/std object, this could pull in a lot 
> of code that we don't need. This code cannot be removed by the internalize 
> optimization pass and results in high compile times.
> As an example, removing the timestamp functions changed compile time from 
> ~700ms to ~70ms. Even with that, we still have random boost functions in there 
> that we are compiling for every query. It's a little tricky to figure out how 
> to organize the includes though.






[jira] [Updated] (IMPALA-1002) Regular expression functions

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1002:
--
Issue Type: New Feature  (was: Task)

> Regular expression functions
> 
>
> Key: IMPALA-1002
> URL: https://issues.apache.org/jira/browse/IMPALA-1002
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.3.1
>Reporter: Justin Erickson
>Priority: Minor
>  Labels: ramp-up
> Attachments: impala-1002.1.patch
>
>
> A community member reported that the following REGEXP_ expressions would be a 
> nice addition to Impala:
> * REGEXP_SUBSTR
> * REGEXP_REPLACE (note the difference in arguments from the current 
> implementation in Impala)
> * REGEXP_SIMILAR
> * REGEXP_INSTR
> * REGEXP_SPLIT_TO_TABLE
> Doc: 
> http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Regular_Expr_Functions.085.01.html
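The proposed functions map fairly directly onto Python's re module. Below is a rough sketch of their likely semantics, for illustration only: the function names come from the Teradata doc linked above, but the argument orders and return conventions here are assumptions, not Impala's or Teradata's actual API.

```python
import re

def regexp_substr(s, pattern):
    """Return the first substring of s matching pattern, or None (REGEXP_SUBSTR)."""
    m = re.search(pattern, s)
    return m.group(0) if m else None

def regexp_instr(s, pattern):
    """Return the 1-based position of the first match, or 0 if none (REGEXP_INSTR)."""
    m = re.search(pattern, s)
    return m.start() + 1 if m else 0

def regexp_similar(s, pattern):
    """Return 1 if the whole string matches pattern, else 0 (REGEXP_SIMILAR)."""
    return 1 if re.fullmatch(pattern, s) else 0

def regexp_split_to_table(s, pattern):
    """Split s on pattern, yielding one 'row' per piece (REGEXP_SPLIT_TO_TABLE)."""
    return re.split(pattern, s)

print(regexp_substr("impala-1002", r"\d+"))  # -> 1002
```

The SQL versions would also take optional position/occurrence/flags arguments, omitted here for brevity.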






[jira] [Updated] (IMPALA-1002) Regular expression functions

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1002:
--
Labels: ramp-up  (was: )

> Regular expression functions
> 
>
> Key: IMPALA-1002
> URL: https://issues.apache.org/jira/browse/IMPALA-1002
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 1.3.1
>Reporter: Justin Erickson
>Priority: Minor
>  Labels: ramp-up
> Attachments: impala-1002.1.patch
>
>
> A community member reported that the following REGEXP_ expressions would be a 
> nice addition to Impala:
> * REGEXP_SUBSTR
> * REGEXP_REPLACE (note the difference in arguments from the current 
> implementation in Impala)
> * REGEXP_SIMILAR
> * REGEXP_INSTR
> * REGEXP_SPLIT_TO_TABLE
> Doc: 
> http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Regular_Expr_Functions.085.01.html






[jira] [Commented] (IMPALA-7489) Run tests with -Xcheck:jni enabled for debug mode

2020-12-23 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254298#comment-17254298
 ] 

Joe McDonnell commented on IMPALA-7489:
---

We have done some runs with this flag in the past, but this is not enabled by 
default.

> Run tests with -Xcheck:jni enabled for debug mode
> -
>
> Key: IMPALA-7489
> URL: https://issues.apache.org/jira/browse/IMPALA-7489
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Todd Lipcon
>Priority: Major
>
> -Xcheck:jni is a JVM flag which enables some extra checking of usage of JNI 
> calls. This can find misusage of the JNI API which might otherwise result in 
> hard-to-debug or hard-to-reproduce crashes/bugs. We should enable it when 
> running end-to-end tests on DEBUG mode builds, and probably enable it all the 
> time for Java test runs.






[jira] [Updated] (IMPALA-955) Implement the BYTES built-in

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-955:
-
Labels: built-in-function newbie ramp-up  (was: built-in-function ramp-up)

> Implement the BYTES built-in
> 
>
> Key: IMPALA-955
> URL: https://issues.apache.org/jira/browse/IMPALA-955
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.3
>Reporter: David Z. Chen
>Priority: Minor
>  Labels: built-in-function, newbie, ramp-up
>
> Implement the BYTES built-in: 
> http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Attribute_Functions.089.02.html






[jira] [Updated] (IMPALA-955) Implement the BYTES built-in

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-955:
-
Labels: built-in-function ramp-up  (was: built-in-function)

> Implement the BYTES built-in
> 
>
> Key: IMPALA-955
> URL: https://issues.apache.org/jira/browse/IMPALA-955
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.3
>Reporter: David Z. Chen
>Priority: Minor
>  Labels: built-in-function, ramp-up
>
> Implement the BYTES built-in: 
> http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/Attribute_Functions.089.02.html






[jira] [Updated] (IMPALA-886) Always display HBase cols in same order as CREATE TABLE statement

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-886:
-
Labels: catalog-server hbase usability  (was: catalog-server usability)

> Always display HBase cols in same order as CREATE TABLE statement
> -
>
> Key: IMPALA-886
> URL: https://issues.apache.org/jira/browse/IMPALA-886
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 1.3
>Reporter: John Russell
>Priority: Minor
>  Labels: catalog-server, hbase, usability
>
> I noticed a discrepancy with Hive, in how Impala handles column order for 
> HBase tables.
> I think it would be preferable to use the same behavior as Hive, otherwise 
> life becomes
> more complicated for anyone doing INSERT or SELECT * with an HBase table 
> through Impala.
> (And I have to add caveats and usage notes in the docs.)
> Repro:
> In HBase shell, create a table with a single column family. I think most 
> Impala tests use 1 column family per column, where you won't notice this 
> behavior.
> hbase(main):008:0> create 'sample_data_fast','cols'
> 0 row(s) in 71.8750 seconds
> In Hive shell, create a mapping table. Notice how DESCRIBE repeats back the 
> columns in the same order as in CREATE TABLE.
> hive> create external table sample_data_fast (id string, val int, zfill 
> string, name string, assertion boolean)
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > WITH SERDEPROPERTIES (
> > "hbase.columns.mapping" =
> > ":key,cols:val,cols:zfill,cols:name,cols:assertion")
> > TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
> > ;
> OK
> Time taken: 1.7 seconds
> hive> desc sample_data_fast;
> OK
> id  string  from deserializer
> val int from deserializer
> zfill string  from deserializer
> name  string  from deserializer
> assertion boolean from deserializer
> Time taken: 0.302 seconds
> Now try the same DESCRIBE in impala-shell. The key column (id) is listed 
> first. Then all the other columns, part of the same column family, are listed 
> in alphabetical order rather than the order from CREATE TABLE:
> [localhost:21000] > desc sample_data_fast;
> Query: describe sample_data_fast
> +---+-+-+
> | name  | type| comment |
> +---+-+-+
> | id| string  | |
> | assertion | boolean | |
> | name  | string  | |
> | val   | int | |
> | zfill | string  | |
> +---+-+-+
> Returned 5 row(s) in 0.02s
> Thus if you already had Hive code that was doing SELECT * from an HBase table 
> like this, you would get a different result set (different column order) in 
> Impala.
> If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT * 
> FROM hdfs_table', you would get an error because the columns don't match. If 
> you made a separate column family for each column, the discrepancy is masked 
> because you need more than one column per column family to experience the 
> alphabetical ordering.
> Since Hive is preserving the column order, the relevant info must be there in 
> the metastore.






[jira] [Updated] (IMPALA-871) Check HBase config at startup time

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-871:
-
Labels: supportability  (was: )

> Check HBase config at startup time
> --
>
> Key: IMPALA-871
> URL: https://issues.apache.org/jira/browse/IMPALA-871
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.4
>Reporter: Alan Choi
>Priority: Minor
>  Labels: supportability
>
> We check the HDFS config by trying to open an HDFS table. We can do the same 
> for HBase, by trying to open or read a row from an HBase table. If all of 
> the HBase tables fail to load, we can warn the user of an improper config.






[jira] [Resolved] (IMPALA-871) Check HBase config at startup time

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-871.
--
Resolution: Won't Fix

I think there are some obstacles here, e.g., how do we know which HBase table to 
access?

> Check HBase config at startup time
> --
>
> Key: IMPALA-871
> URL: https://issues.apache.org/jira/browse/IMPALA-871
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.4
>Reporter: Alan Choi
>Priority: Minor
>  Labels: supportability
>
> We check the HDFS config by trying to open an HDFS table. We can do the same 
> for HBase, by trying to open or read a row from an HBase table. If all of 
> the HBase tables fail to load, we can warn the user of an improper config.






[jira] [Resolved] (IMPALA-7319) Investigate Clang Tidy Diff

2020-12-23 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7319.
---
Fix Version/s: Not Applicable
   Resolution: Won't Fix

The clang tidy job in upstream GVO already provides pretty fast feedback. I'm 
going to close this.

> Investigate Clang Tidy Diff
> ---
>
> Key: IMPALA-7319
> URL: https://issues.apache.org/jira/browse/IMPALA-7319
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Joe McDonnell
>Priority: Minor
> Fix For: Not Applicable
>
>
> Clang has a script clang-tidy-diff.py that can run clang tidy on a diff. This 
> is substantially faster than the normal run-clang-tidy.py, because it 
> compiles and analyzes only the changed files. This might also allow a more 
> graceful way to incorporate new clang tidy checks. Kudu has implemented this 
> functionality in their project. See 
> [build-support/clang_tidy_gerrit.py|https://github.com/apache/kudu/blob/master/build-support/clang_tidy_gerrit.py]
>  
> While this is faster, it is possible to have a code change that introduces a 
> clang tidy issue in code that didn't change, so clang tidy on a diff might 
> miss some issues.
> We should evaluate whether this is something worth incorporating into Impala. 
> It could be a good way for a developer to do a quick check before upload to 
> Gerrit.






[jira] [Updated] (IMPALA-874) Planner enhancement for HBase rowkey lookup and range scan

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-874:
-
Labels: hbase  (was: )

> Planner enhancement for HBase rowkey lookup and range scan
> --
>
> Key: IMPALA-874
> URL: https://issues.apache.org/jira/browse/IMPALA-874
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.0.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: hbase
>
> The enhancement should tackle the following issues:
> 1. Turn compound predicates into row key lookups - OR and IN predicates 
> don't work today.
> 2. Apply equivalence class analysis to HBase scans.
> 3. When multiple range predicates are present, Impala should pick the 
> tightest one. Currently, Impala picks the first set.
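Point 3 is essentially interval intersection. A minimal sketch of picking the tightest scan range from several key-range predicates (illustrative only; this is not Impala's actual planner code, and the representation of predicates as (low, high) pairs is an assumption):

```python
def tightest_range(predicates):
    """Intersect [low, high) key ranges; a None bound means unbounded.

    Each predicate is a (low, high) pair, e.g. key > 'a' is ('a', None)
    and key < 'x' is (None, 'x'). The result is the tightest range that
    satisfies all predicates simultaneously.
    """
    low, high = None, None
    for lo, hi in predicates:
        if lo is not None and (low is None or lo > low):
            low = lo  # keep the largest lower bound
        if hi is not None and (high is None or hi < high):
            high = hi  # keep the smallest upper bound
    return low, high

# key > 'a' AND key > 'c' AND key < 'x'  ->  scan range ('c', 'x')
print(tightest_range([("a", None), ("c", None), (None, "x")]))
```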






[jira] [Assigned] (IMPALA-794) Json metrics (/jsonmetrics) output should escape strings

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-794:


Assignee: (was: Nitesh Thali)

> Json metrics (/jsonmetrics) output should escape strings
> 
>
> Key: IMPALA-794
> URL: https://issues.apache.org/jira/browse/IMPALA-794
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.3
>Reporter: Matthew Jacobs
>Priority: Minor
>  Labels: newbie, usability
>
> If a metric name or value (for Metric) contains quotes, the 
> /jsonmetrics output will not be valid JSON because we do not escape strings.
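A small illustration of this bug class (the metric name below is made up): building JSON by string concatenation leaves embedded quotes unescaped and produces an invalid document, while a real serializer escapes them.

```python
import json

name = 'queries "in flight"'  # hypothetical metric name containing quotes

# Naive emission, analogous to formatting strings straight into the output:
naive = '{"name": "' + name + '"}'   # embedded quotes break the document
proper = json.dumps({"name": name})  # escapes them as \"

try:
    json.loads(naive)
    valid = True
except json.JSONDecodeError:
    valid = False

print(valid)               # False: the naive output is not valid JSON
print(json.loads(proper))  # the escaped output round-trips cleanly
```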






[jira] [Resolved] (IMPALA-794) Json metrics (/jsonmetrics) output should escape strings

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-794.
--
Resolution: Cannot Reproduce

> Json metrics (/jsonmetrics) output should escape strings
> 
>
> Key: IMPALA-794
> URL: https://issues.apache.org/jira/browse/IMPALA-794
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.3
>Reporter: Matthew Jacobs
>Priority: Minor
>  Labels: newbie, usability
>
> If a metric name or value (for Metric) contains quotes, the 
> /jsonmetrics output will not be valid JSON because we do not escape strings.






[jira] [Resolved] (IMPALA-756) Improve error message / fallback behavior for impala-shell queries involving tabs

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-756.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

IMPALA-10074 changed the default protocol to hs2, which doesn't have this 
problem.

> Improve error message / fallback behavior for impala-shell queries involving 
> tabs
> -
>
> Key: IMPALA-756
> URL: https://issues.apache.org/jira/browse/IMPALA-756
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.2.3
>Reporter: John Russell
>Priority: Minor
> Fix For: Impala 4.0
>
>
> When a query in impala-shell refers to a string literal, the shell stops 
> using pretty-printed output, with a message like so:
> [localhost:21000] > select 'a\tb';
> Prettytable cannot resolve string columns values that have  embedded tabs. 
> Reverting to tab delimited text output
> a b
> I suggest taking out the reference to 'prettytable' as that's an 
> implementation detail. The second part of the message should also use a 
> hyphen 'tab-delimited' and end with a period.
> The same applies if the query itself doesn't contain any tabs, but the result 
> set does:
> [localhost:21000] > create table embedded_tabs (s string);
> [localhost:21000] > insert into table embedded_tabs values 
> ('abc\t123'),('xyz\t456');
> [localhost:21000] > select * from embedded_tabs;
> Prettytable cannot resolve string columns values that have  embedded tabs. 
> Reverting to tab delimited text output
> abc   123
> xyz   456
> Is tab-delimited text really the right fallback behavior though? Then there's 
> no way to distinguish between a tab that really was in the data and one added 
> by impala-shell as a separator. For example, here each row of the result set 
> has 2 fields, but in 2 out of 3 rows there's a tab in the output too. The 
> actual fields don't line up underneath each other, and the final row with no 
> tab looks like a field is missing.
> [localhost:21000] > alter table embedded_tabs replace columns (s1 string, s2 
> string);
> [localhost:21000] > insert overwrite table embedded_tabs values ('hello 
> world','abc\t123'),('xyz\t456','foo bar'),('bletch','baz');
> [localhost:21000] > select * from embedded_tabs;
> Prettytable cannot resolve string columns values that have  embedded tabs. 
> Reverting to tab delimited text output
> hello world   abc 123
> xyz   456 foo bar
> bletchbaz
> When output includes tabs, should the shell fall back to some different 
> separator like pipe, so that the fields with embedded tabs could be seen and 
> parsed correctly? (I realize that choosing a different separator opens up a 
> new can of worms if the output includes the new separator too.)
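The ambiguity described above, and one possible fix (quoting fields rather than switching to a different separator), can be sketched with Python's csv module; this is an illustration of the idea, not impala-shell's implementation:

```python
import csv
import io

# Each row has exactly 2 fields, but some fields contain embedded tabs.
rows = [["hello world", "abc\t123"], ["xyz\t456", "foo bar"], ["bletch", "baz"]]

# Naive fallback: join fields with tabs -> the field count is lost.
naive = ["\t".join(r) for r in rows]
print(naive[0].split("\t"))  # 3 apparent "fields", though the row has only 2

# Quoted tab-separated output: embedded tabs survive inside quotes,
# so a reader recovers the original 2 fields per row.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_ALL).writerows(rows)
parsed = list(csv.reader(io.StringIO(buf.getvalue()), delimiter="\t"))
print(parsed[0])  # ['hello world', 'abc\t123']
```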






[jira] [Updated] (IMPALA-733) Improve Parquet error handling for low disk space

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-733:
-
Labels: supportability  (was: )

> Improve Parquet error handling for low disk space
> -
>
> Key: IMPALA-733
> URL: https://issues.apache.org/jira/browse/IMPALA-733
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.2.3
> Environment: Less than 1GB free on the filesystem where HDFS resides.
>Reporter: John Russell
>Priority: Minor
>  Labels: supportability
>
> If HDFS has less than 1 GB free (or I presume whatever value is set in the 
> PARQUET_FILE_SIZE query option), INSERT into a Parquet table fails even for 
> tiny amounts of data. That might be unavoidable, but the error should be 
> communicated more clearly to the user.
> INSERT ... VALUES reports that N rows were inserted (no error at all), but 
> the expected data is missing when the table is queried.
> INSERT ... SELECT gives a cryptic error message but still reports that the 
> rows were inserted, although they aren't.
> Repro:
> About 400MB free. (This is a VM that keeps getting filled up by 
> Impala-related logs.)
> $ df -k .
> Filesystem   1K-blocks  Used Available Use% Mounted on
> /dev/vda1 24607156  23961976395184  99% /
> I was going to answer a question on the mailing list by showing an INSERT 
> going from an unpartitioned to a partitioned table.
> [localhost:21000] > create table unpart (year int, s string) stored as 
> parquet;
> Query: create table unpart (year int, s string) stored as parquet
> Returned 0 row(s) in 0.12s
> INSERT ... VALUES looks like it succeeds, but the data isn't really there.
> [localhost:21000] > insert into unpart values (2013,'Happy'),(2014,'New 
> Year');
> Query: insert into unpart values (2013,'Happy'),(2014,'New Year')
> Inserted 2 rows in 0.22s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
> Copying the data out of a text table, the error is reported but it doesn't 
> say specifically "out of space". And the "Inserted 2 rows" message raises the 
> hope the data made it in, but it didn't.
> [localhost:21000] > insert into unpart select * from t1;
> Query: insert into unpart select * from t1
> ERRORS ENCOUNTERED DURING EXECUTION: Backend 0:Failed to close HDFS file: 
> hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/.impala_insert_staging/284cf98f761aec95_5712ef093b357195//.2903970254304242837-6274340053807624598_1840160694_dir/2903970254304242837-6274340053807624598_1083629803_data.0
> Error(255): Unknown error 255
> Inserted 2 rows in 0.34s
> [localhost:21000] > select * from unpart;
> Query: select * from unpart
> Returned 0 row(s) in 0.22s
> After all this, the data directory contains a leftover staging subdirectory 
> (empty) and a zero-byte data file:
> $ hdfs dfs -ls 
> hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart
> Found 2 items
> drwxrwxrwx   - impala supergroup  0 2014-01-08 11:39 
> hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/.impala_insert_staging
> -rw-r--r--   1 impala supergroup  0 2014-01-08 11:39 
> hdfs://127.0.0.1:8020/user/hive/warehouse/partitioning.db/unpart/3188829493227009611-3605612775229973420_1967882694_data.0
> Suggestions:
> - Make INSERT ... VALUES detect/report the HDFS error trying to write the 
> block. Don't report number of rows inserted.
> - Make INSERT ... SELECT error clearer, either suggest it could be 
> out-of-space or do some followup check for $(PARQUET_FILE_SIZE) space free. 
> Don't report number of rows inserted.
> - Be cleaner about leftover staging directories and empty data files. 
> (Shouldn't the data file stay in the staging directory until it's 
> successfully closed?)
> - Add whatever distributed checking is needed so the error is also handled 
> when a remote node runs out of space, rather than only the coordinator node 
> as in this single-VM case.






[jira] [Resolved] (IMPALA-539) Impala should gather final runtime profile from fragments for aborted/cancelled query

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-539.
--
Resolution: Won't Fix

See previous comment

> Impala should gather final runtime profile from fragments for 
> aborted/cancelled query
> -
>
> Key: IMPALA-539
> URL: https://issues.apache.org/jira/browse/IMPALA-539
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 1.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: query-lifecycle
>
> For cancelled/aborted queries, the final runtime profile from plan fragments 
> are not recorded and reported by the coordinator. For short running queries, 
> we won't see any profile at all because no intermediate runtime profile has 
> been sent.






[jira] [Commented] (IMPALA-462) ALTER DATABASE support to rename or move database

2020-12-23 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254289#comment-17254289
 ] 

Tim Armstrong commented on IMPALA-462:
--

RENAME was closed as Won't Fix on Hive because it would disrupt the ecosystem (I 
think I agree with this), so this would now only entail setting the location.

> ALTER DATABASE support to rename or move database
> -
>
> Key: IMPALA-462
> URL: https://issues.apache.org/jira/browse/IMPALA-462
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 1.1, Impala 2.3.0
>Reporter: John Russell
>Priority: Minor
>  Labels: ramp-up
> Fix For: Product Backlog
>
>
> I suggest adding an ALTER DATABASE statement, for completeness and future 
> expansion.
> Currently, Hive has ALTER DATABASE that AFAICT only allows a SET clause to 
> change properties.
> One logical syntax / use case for an Impala ALTER DATABASE would be:
> ALTER DATABASE old_name RENAME TO new_name;
> (OK to disallow for the DEFAULT database or the currently USEd database.)





