[jira] [Created] (IMPALA-10212) Support ofs scheme

2020-10-05 Thread Attila Doroszlai (Jira)
Attila Doroszlai created IMPALA-10212:
-

 Summary: Support ofs scheme
 Key: IMPALA-10212
 URL: https://issues.apache.org/jira/browse/IMPALA-10212
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Attila Doroszlai


Ozone 1.0 introduced a new Hadoop-compatible filesystem, OFS, in addition to the 
existing O3FS implementation. The goal of this task is to add support for 
{{ofs://}} URLs in Impala.

https://hadoop.apache.org/ozone/docs/1.0.0/interface/ofs.html
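
For illustration only, here is a minimal sketch (not Impala code) of reading an 
{{ofs://}} path through the Hadoop FileSystem API; the Ozone Manager host, volume, 
bucket and key below are made-up placeholders.

{code:java}
// Minimal sketch (not Impala code): reading an ofs:// path via the
// Hadoop-compatible FileSystem API. Host, volume, bucket and key are
// hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsReadExample {
  public static void main(String[] args) throws Exception {
    // o3fs:// is rooted at a single bucket: o3fs://bucket.volume.om-host/key
    // ofs:// is rooted at the Ozone Manager: ofs://om-host/volume/bucket/key
    Path path = new Path("ofs://om-host/vol1/bucket1/warehouse/t1/data.parq");
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println("read " + n + " bytes from " + path);
    }
  }
}
{code}

The key difference from {{o3fs://}} is that an ofs path is rooted at the Ozone 
Manager rather than at a single bucket, so one filesystem instance can address 
every volume and bucket.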





[jira] [Created] (IMPALA-10213) Handle block location for Ozone

2020-10-05 Thread Attila Doroszlai (Jira)
Attila Doroszlai created IMPALA-10213:
-

 Summary: Handle block location for Ozone
 Key: IMPALA-10213
 URL: https://issues.apache.org/jira/browse/IMPALA-10213
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Attila Doroszlai


Currently Impala treats Ozone as a remote filesystem, similar to S3A, ADLS, etc. 
However, Ozone provides block location info in its Hadoop-compatible FS 
implementations, and Ozone can be colocated with Impala daemons. Impala should be 
improved to use Ozone's location info so that scans can be scheduled for locality 
of execution.
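
For reference, block and host information is exposed through the standard Hadoop 
API; the following is a hedged sketch (not Impala code, path hypothetical) of how 
a scheduler could inspect it.

{code:java}
// Sketch: listing block locations for a file via the Hadoop FileSystem API.
// A scheduler could match the reported hosts against impalad hosts to get
// local reads. Path and scheme below are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
  public static void main(String[] args) throws Exception {
    Path path = new Path("ofs://om-host/vol1/bucket1/warehouse/t1/data.parq");
    FileSystem fs = path.getFileSystem(new Configuration());
    FileStatus status = fs.getFileStatus(path);
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
  }
}
{code}

Whether the returned hosts are meaningful depends on Ozone's FS implementation 
actually populating them, as noted above.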





[jira] [Created] (IMPALA-10214) Ozone support for file handle cache

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10214:
-

 Summary: Ozone support for file handle cache
 Key: IMPALA-10214
 URL: https://issues.apache.org/jira/browse/IMPALA-10214
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} 
interface first (last I checked, the input streams don't implement the 
interface).
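
For context, here is a minimal sketch (not Impala code, path hypothetical) of how 
a client can probe a Hadoop input stream for unbuffer support before caching the 
handle; the file handle cache relies on {{unbuffer()}} so that cached handles 
release buffers and sockets instead of pinning them.

{code:java}
// Sketch: probing an input stream for unbuffer support before caching the
// file handle. Path is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class UnbufferProbe {
  public static void main(String[] args) throws Exception {
    Path path = new Path("ofs://om-host/vol1/bucket1/warehouse/t1/data.parq");
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      if (in.hasCapability(StreamCapabilities.UNBUFFER)) {
        // Safe to keep the handle cached: unbuffer() releases buffers and
        // sockets while keeping the stream usable for later reads.
        in.unbuffer();
      } else {
        // The wrapped stream doesn't implement CanUnbuffer; caching the
        // handle would hold on to resources, so it should be closed instead.
        System.out.println("unbuffer not supported for " + path);
      }
    }
  }
}
{code}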





[jira] [Created] (IMPALA-10215) Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

2020-10-05 Thread Zoltán Borók-Nagy (Jira)
Zoltán Borók-Nagy created IMPALA-10215:
--

 Summary: Implement INSERT INTO for non-partitioned Iceberg tables 
(Parquet)
 Key: IMPALA-10215
 URL: https://issues.apache.org/jira/browse/IMPALA-10215
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Zoltán Borók-Nagy


Impala should be able to insert into non-partitioned Iceberg tables when the 
underlying data file format is Parquet.

INSERT OVERWRITE and CTAS are out of scope for this sub-task.
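
Independent of how the Impala writers get wired up, committing already-written 
Parquet files to a non-partitioned table goes through the Iceberg Java API roughly 
as in this hedged sketch (table location, file path and stats are hypothetical, 
not Impala's actual implementation).

{code:java}
// Sketch (not Impala's implementation): appending a Parquet data file to a
// non-partitioned Iceberg table via the Iceberg Java API.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class IcebergAppendExample {
  public static void main(String[] args) {
    Table table = new HadoopTables(new Configuration())
        .load("hdfs://localhost:20500/test-warehouse/iceberg_tbl");
    DataFile dataFile = DataFiles.builder(PartitionSpec.unpartitioned())
        .withPath("hdfs://localhost:20500/test-warehouse/iceberg_tbl/data/f1.parq")
        .withFormat(FileFormat.PARQUET)
        .withFileSizeInBytes(1024L)
        .withRecordCount(10L)
        .build();
    // A single atomic commit adds the file to the table's current snapshot.
    table.newAppend().appendFile(dataFile).commit();
  }
}
{code}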





[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10216:
-

 Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on 
UBSAN builds
 Key: IMPALA-10216
 URL: https://issues.apache.org/jira/browse/IMPALA-10216
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace

Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}





[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10217:
-

 Summary: 
test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
 Key: IMPALA-10217
 URL: https://issues.apache.org/jira/browse/IMPALA-10217
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Seen this a few times in exhaustive builds:
{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from 
pytest)

query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
% (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   ACTUAL VALUE:
E   38
E   
{code}







[jira] [Resolved] (IMPALA-9107) Reduce time spent downloading maven artifacts for precommit tests

2020-10-05 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-9107.
---
Fix Version/s: Impala 4.0
 Assignee: Joe McDonnell
   Resolution: Fixed

This has been fixed by the m2 archive tarball infrastructure.

> Reduce time spent downloading maven artifacts for precommit tests
> -
>
> Key: IMPALA-9107
> URL: https://issues.apache.org/jira/browse/IMPALA-9107
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.0
>
>
> When building Impala from scratch with an empty .m2 directory, maven needs to 
> download a large number of jars and pom files. This takes a long time and 
> adds about 15-20 minutes to the build. For example, here is some output from 
> a recent ubuntu-16.04-from-scratch run; there is an 18-minute delay between the 
> end of building the backend tests and the end of building the frontend:
> {code:java}
> 00:58:33 [100%] Built target hash-ring-util
> ...
> 01:16:37 [100%] Built target fe{code}
> Almost all of that time is spent downloading maven artifacts. Quite a few 
> come from the maven central repository.
> This is taking way too much time, and we need to reduce it. The total size of 
> artifacts being downloaded is not large. One approach would be to produce a 
> tarball with the jars/poms that don't come from the CDP or CDH GBN repos. We 
> can download that tarball and use it to either populate the .m2 directory or 
> to stash it in IMPALA_TOOLCHAIN and use it as a maven repository with a 
> file:// URI.
> This impacts all our jobs: ubuntu-16.04-from-scratch, 
> ubuntu-16.04-dockerised-tests, all-build-options-ub1604, 
> ubuntu-16.04-build-only, clang-tidy-ub1604
>  





[jira] [Resolved] (IMPALA-9191) Provide a way to build Impala with only one of Sentry / Ranger

2020-10-05 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-9191.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

Sentry support was removed, so this is no longer a problem.

> Provide a way to build Impala with only one of Sentry / Ranger
> --
>
> Key: IMPALA-9191
> URL: https://issues.apache.org/jira/browse/IMPALA-9191
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Fang-Yu Rao
>Priority: Critical
> Fix For: Impala 4.0
>
>
> Deployments of Impala will use either Ranger or Sentry, and deployments would 
> not switch back and forth between the two. It makes sense to provide a way to 
> pick at compile time which one to include. This allows packagers of Impala to 
> avoid a dependency for whichever authorization provider they don't need.
> In particular, compilation of the USE_CDP_HIVE=true side of Impala currently 
> needs only a few things from the CDH_BUILD_NUMBER, and one of them is Sentry. In 
> the other direction, the only thing a USE_CDP_HIVE=false configuration uses 
> from the CDP_BUILD_NUMBER is Ranger.





[jira] [Created] (IMPALA-10218) Remove dependency on the CDH_BUILD_NUMBER and associated maven repository

2020-10-05 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-10218:
--

 Summary: Remove dependency on the CDH_BUILD_NUMBER and associated 
maven repository
 Key: IMPALA-10218
 URL: https://issues.apache.org/jira/browse/IMPALA-10218
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Affects Versions: Impala 4.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


All of the major Hadoop component dependencies have been migrated to the CDP 
versions and come from the CDP_BUILD_NUMBER maven repository. Based on output 
from GVO, nothing comes from the CDH_BUILD_NUMBER maven repository in the main 
build.

There are a couple of things outside the main build that still get artifacts 
from the CDH build repo. Specifically, the Apache Kite dependency in 
testdata/pom.xml and testdata/TableFlattener/pom.xml uses a CDH version.

We should migrate to a public version of Kite and remove the CDH build 
repository (and associated CDH_BUILD_NUMBER code).





[jira] [Created] (IMPALA-10219) Add a query option to simulate catalogd HDFS listing delays

2020-10-05 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10219:


 Summary: Add a query option to simulate catalogd HDFS listing 
delays
 Key: IMPALA-10219
 URL: https://issues.apache.org/jira/browse/IMPALA-10219
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The problem described in the parent issue (IMPALA-6671) caused serious query 
backlogs on large setups where namenode response times are slow for whatever 
reason. While the namenode can be tuned to some extent, it is still problematic 
that Impala HDFS operations which happen while holding the table lock block other 
unrelated queries.

In order to simulate such problems in the product, it would be nice to introduce 
a query option which adds an artificial delay to the namenode RPCs issued while a 
table is being loaded. A query option is preferred over a service-level 
configuration since that way it is easier to model a slow blocking query and an 
unrelated fast query in the test suite.
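
As a rough sketch of the idea only (the option name and plumbing below are 
hypothetical, not the actual Impala implementation), the listing path could 
consult a per-query delay setting before issuing the namenode RPC:

{code:java}
// Sketch of the idea only: injecting a configurable artificial delay in
// front of a namenode listing RPC. Option name and plumbing are
// hypothetical.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DelayedListing {
  // Would come from a query option such as
  // debug_catalogd_hdfs_listing_delay_ms (hypothetical name), defaulting
  // to 0 in production.
  private final long listingDelayMs;
  private final FileSystem fs;

  public DelayedListing(FileSystem fs, long listingDelayMs) {
    this.fs = fs;
    this.listingDelayMs = listingDelayMs;
  }

  public FileStatus[] listStatus(Path dir) throws IOException {
    if (listingDelayMs > 0) {
      try {
        // Simulates a slow namenode so tests can reproduce table-lock
        // contention without an actually overloaded cluster.
        Thread.sleep(listingDelayMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return fs.listStatus(dir);
  }
}
{code}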





[jira] [Created] (IMPALA-10220) Min value of RpcNetworkTime can be negative

2020-10-05 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-10220:
-

 Summary: Min value of RpcNetworkTime can be negative
 Key: IMPALA-10220
 URL: https://issues.apache.org/jira/browse/IMPALA-10220
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.4.0
Reporter: Riza Suminto
Assignee: Riza Suminto


There is a bug in function 
KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in this 
line:

[https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635]

{{network_time_ns}} should be computed using {{eos_rsp_.receiver_latency_ns()}} 
instead of {{resp_.receiver_latency_ns()}}.





[jira] [Created] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'

2020-10-05 Thread WangSheng (Jira)
WangSheng created IMPALA-10221:
--

 Summary: Use 'iceberg.file_format' to replace 'iceberg_file_format'
 Key: IMPALA-10221
 URL: https://issues.apache.org/jira/browse/IMPALA-10221
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


We introduced several new table properties in IMPALA-10164, such as 
'iceberg.catalog'. To keep these property names consistent, we rename 
'iceberg_file_format' to 'iceberg.file_format'.





[jira] [Resolved] (IMPALA-10184) Iceberg PARTITION SPEC missing from SHOW CREATE TABLE

2020-10-05 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-10184.
---
Target Version: Impala 4.0
Resolution: Fixed

> Iceberg PARTITION SPEC missing from SHOW CREATE TABLE
> -
>
> Key: IMPALA-10184
> URL: https://issues.apache.org/jira/browse/IMPALA-10184
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: impala-iceberg
>
> The PARTITION SPEC is missing from the SHOW CREATE TABLE output for Iceberg 
> tables.
> This is how I created a table:
> {code:java}
> create table iceberg_tmp2 (
>   i int, 
>   s string, 
>   p1 string,
>   p2 timestamp
> ) 
> partition by spec (
>   p1 identity, 
>   p2 Day
> ) 
> stored as iceberg;
> {code}
> And this is the output of SHOW CREATE TABLE for the same table:
> {code:java}
> +----------------------------------------------------------------------------------------------------------------+
> | CREATE EXTERNAL TABLE default.iceberg_tmp2 (
> |   i INT,
> |   s STRING,
> |   p1 STRING,
> |   p2 TIMESTAMP
> | )
> | STORED AS ICEBERG
> | LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_tmp2'
> | TBLPROPERTIES ('OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'external.table.purge'='TRUE', 'iceberg_file_format'='parquet')
> +----------------------------------------------------------------------------------------------------------------+
> {code}





[jira] [Resolved] (IMPALA-10175) Extend error message when cast(..format..) fails in parse phase

2020-10-05 Thread Gabor Kaszab (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-10175.
---
Target Version: Impala 4.0
Resolution: Fixed

> Extend error message when cast(..format..) fails in parse phase
> ---
>
> Key: IMPALA-10175
> URL: https://issues.apache.org/jira/browse/IMPALA-10175
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Gabor Kaszab
>Assignee: Gabor Kaszab
>Priority: Major
>  Labels: supportability
>
> {code:java}
> select cast('0;367' as date format 'YY;DDD'); 
> ERROR: UDF ERROR: String to Date parse failed. Invalid string val: "0;367"
> {code}
> Here the output contains the input string, but it would be more helpful for 
> debugging if it also contained the original format string.
> This applies to String to Date conversions only, as String to Timestamp 
> failures currently don't raise an error.


