[jira] [Created] (IMPALA-10212) Support ofs scheme
Attila Doroszlai created IMPALA-10212:
------------------------------------------

             Summary: Support ofs scheme
                 Key: IMPALA-10212
                 URL: https://issues.apache.org/jira/browse/IMPALA-10212
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Attila Doroszlai

Ozone 1.0 introduced a new Hadoop-compatible filesystem called OFS, in addition to the existing O3FS implementation. The goal of this task is to add support for {{ofs://}} URLs in Impala.

https://hadoop.apache.org/ozone/docs/1.0.0/interface/ofs.html

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (IMPALA-10213) Handle block location for Ozone
Attila Doroszlai created IMPALA-10213:
------------------------------------------

             Summary: Handle block location for Ozone
                 Key: IMPALA-10213
                 URL: https://issues.apache.org/jira/browse/IMPALA-10213
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Attila Doroszlai

Currently Impala treats Ozone as a remote filesystem, similar to S3A, ADLS, etc. However, Ozone provides block location info in its Hadoop-compatible FS implementations, and Ozone can be colocated with Impala daemons. Impala could be improved to use Ozone's location info to support locality of execution.
[jira] [Created] (IMPALA-10214) Ozone support for file handle cache
Sahil Takiar created IMPALA-10214:
------------------------------------------

             Summary: Ozone support for file handle cache
                 Key: IMPALA-10214
                 URL: https://issues.apache.org/jira/browse/IMPALA-10214
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Sahil Takiar

This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} interface first (last I checked, the input streams don't implement the interface).
[jira] [Created] (IMPALA-10215) Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
Zoltán Borók-Nagy created IMPALA-10215:
------------------------------------------

             Summary: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)
                 Key: IMPALA-10215
                 URL: https://issues.apache.org/jira/browse/IMPALA-10215
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Zoltán Borók-Nagy

Impala should be able to insert into non-partitioned Iceberg tables when the underlying data file format is Parquet. INSERT OVERWRITE and CTAS are out of scope for this sub-task.
[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
Sahil Takiar created IMPALA-10216:
------------------------------------------

             Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
                 Key: IMPALA-10216
                 URL: https://issues.apache.org/jira/browse/IMPALA-10216
             Project: IMPALA
          Issue Type: Test
            Reporter: Sahil Takiar

Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace
Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}
[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
Sahil Takiar created IMPALA-10217:
------------------------------------------

             Summary: test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
                 Key: IMPALA-10217
                 URL: https://issues.apache.org/jira/browse/IMPALA-10217
             Project: IMPALA
          Issue Type: Test
            Reporter: Sahil Takiar

Seen this a few times in exhaustive builds:

{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from pytest)
query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
    test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
    update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
    % (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
E   EXPECTED VALUE:
E   102
E
E   ACTUAL VALUE:
E   38
{code}
[jira] [Resolved] (IMPALA-9107) Reduce time spent downloading maven artifacts for precommit tests
     [ https://issues.apache.org/jira/browse/IMPALA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-9107.
-----------------------------------
    Fix Version/s: Impala 4.0
         Assignee: Joe McDonnell
       Resolution: Fixed

This has been fixed by the m2 archive tarball infrastructure.

> Reduce time spent downloading maven artifacts for precommit tests
> -----------------------------------------------------------------
>
>                 Key: IMPALA-9107
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9107
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend, Infrastructure
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 4.0
>
> When building Impala from scratch with an empty .m2 directory, maven needs to
> download a large number of jars and pom files. This takes a long time and adds
> about 15-20 minutes to the build. For example, in a recent
> ubuntu-16.04-from-scratch run there is an 18-minute delay between the end of
> building the backend tests and the end of building the frontend:
> {code:java}
> 00:58:33 [100%] Built target hash-ring-util
> ...
> 01:16:37 [100%] Built target fe{code}
> Almost all of that time is spent downloading maven artifacts, quite a few of
> which come from the maven central repository.
> This is taking far too much time, and we need to reduce it. The total size of
> the artifacts being downloaded is not large. One approach would be to produce
> a tarball with the jars/poms that don't come from the CDP or CDH GBN repos. We
> can download that tarball and use it to either populate the .m2 directory or
> to stash it in IMPALA_TOOLCHAIN and use it as a maven repository with a
> file:// URI.
> This impacts all our jobs: ubuntu-16.04-from-scratch,
> ubuntu-16.04-dockerised-tests, all-build-options-ub1604,
> ubuntu-16.04-build-only, clang-tidy-ub1604
[jira] [Resolved] (IMPALA-9191) Provide a way to build Impala with only one of Sentry / Ranger
     [ https://issues.apache.org/jira/browse/IMPALA-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-9191.
-----------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

Sentry support was removed, so this is no longer a problem.

> Provide a way to build Impala with only one of Sentry / Ranger
> --------------------------------------------------------------
>
>                 Key: IMPALA-9191
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9191
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Assignee: Fang-Yu Rao
>            Priority: Critical
>             Fix For: Impala 4.0
>
> Deployments of Impala will use either Ranger or Sentry, and deployments would
> not switch back and forth between the two. It makes sense to provide a way to
> pick at compile time which one to include. This allows packagers of Impala to
> avoid a dependency on whichever authorization provider they don't need.
> In particular, compilation of the USE_CDP_HIVE=true side of Impala currently
> needs only a few things from the CDH_BUILD_NUMBER, and one of them is Sentry.
> In the other direction, the only thing a USE_CDP_HIVE=false configuration uses
> from the CDP_BUILD_NUMBER is Ranger.
[jira] [Created] (IMPALA-10218) Remove dependency on the CDH_BUILD_NUMBER and associated maven repository
Joe McDonnell created IMPALA-10218:
------------------------------------------

             Summary: Remove dependency on the CDH_BUILD_NUMBER and associated maven repository
                 Key: IMPALA-10218
                 URL: https://issues.apache.org/jira/browse/IMPALA-10218
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 4.0
            Reporter: Joe McDonnell
            Assignee: Joe McDonnell

All of the major Hadoop component dependencies have been migrated to the CDP versions and come from the CDP_BUILD_NUMBER maven repository. Based on output from GVO, nothing in the main build comes from the CDH_BUILD_NUMBER maven repository.

There are a couple of things outside the main build that still get artifacts from the CDH build repo. Specifically, the Apache Kite dependency in testdata/pom.xml and testdata/TableFlattener/pom.xml uses a CDH version. We should migrate to a public version of Kite and remove the CDH build repository (and the associated CDH_BUILD_NUMBER code).
[jira] [Created] (IMPALA-10219) Add a query option to simulate catalogd HDFS listing delays
Vihang Karajgaonkar created IMPALA-10219:
------------------------------------------

             Summary: Add a query option to simulate catalogd HDFS listing delays
                 Key: IMPALA-10219
                 URL: https://issues.apache.org/jira/browse/IMPALA-10219
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar

The parent issue (IMPALA-6671) caused serious query backlog on large setups where namenode response times are slow for whatever reason. While you can tune the namenode to some extent, it is still problematic that Impala HDFS operations which happen while holding the table lock block other unrelated queries. In order to simulate such problems in the product, it would be nice to introduce a query option which adds an artificial delay to the namenode RPCs issued while the table is being loaded. A query option is preferred over a service-level configuration since that way it is easier to model a slow blocking query and an unrelated fast query in the test suite.
[jira] [Created] (IMPALA-10220) Min value of RpcNetworkTime can be negative
Riza Suminto created IMPALA-10220:
------------------------------------------

             Summary: Min value of RpcNetworkTime can be negative
                 Key: IMPALA-10220
                 URL: https://issues.apache.org/jira/browse/IMPALA-10220
             Project: IMPALA
          Issue Type: Bug
          Components: Distributed Exec
    Affects Versions: Impala 3.4.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto

There is a bug in function KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in this line:

https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635

network_time_ns should be computed using eos_rsp_.receiver_latency_ns() instead of resp_.receiver_latency_ns().
[jira] [Created] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'
WangSheng created IMPALA-10221:
------------------------------------------

             Summary: Use 'iceberg.file_format' to replace 'iceberg_file_format'
                 Key: IMPALA-10221
                 URL: https://issues.apache.org/jira/browse/IMPALA-10221
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: WangSheng
            Assignee: WangSheng

IMPALA-10164 introduced several new table properties, such as 'iceberg.catalog'. To keep these properties consistent, we rename 'iceberg_file_format' to 'iceberg.file_format'.
[jira] [Resolved] (IMPALA-10184) Iceberg PARTITION SPEC missing from SHOW CREATE TABLE
     [ https://issues.apache.org/jira/browse/IMPALA-10184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab resolved IMPALA-10184.
-----------------------------------
    Target Version: Impala 4.0
        Resolution: Fixed

> Iceberg PARTITION SPEC missing from SHOW CREATE TABLE
> -----------------------------------------------------
>
>                 Key: IMPALA-10184
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10184
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> The PARTITION SPEC is missing from the SHOW CREATE TABLE output for Iceberg
> tables.
> This is how I created a table:
> {code:java}
> create table iceberg_tmp2 (
>   i int,
>   s string,
>   p1 string,
>   p2 timestamp
> )
> partition by spec (
>   p1 identity,
>   p2 Day
> )
> stored as iceberg;
> {code}
> And this is the output of SHOW CREATE TABLE for the same table:
> {code:java}
> +------------------------------------------------------------------+
> | CREATE EXTERNAL TABLE default.iceberg_tmp2 (
> |   i INT,
> |   s STRING,
> |   p1 STRING,
> |   p2 TIMESTAMP
> | )
> | STORED AS ICEBERG
> | LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_tmp2'
> | TBLPROPERTIES ('OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'external.table.purge'='TRUE', 'iceberg_file_format'='parquet')
> +------------------------------------------------------------------+
> {code}
[jira] [Resolved] (IMPALA-10175) Extend error message when cast(..format..) fails in parse phase
     [ https://issues.apache.org/jira/browse/IMPALA-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab resolved IMPALA-10175.
-----------------------------------
    Target Version: Impala 4.0
        Resolution: Fixed

> Extend error message when cast(..format..) fails in parse phase
> ---------------------------------------------------------------
>
>                 Key: IMPALA-10175
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10175
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: supportability
>
> {code:java}
> select cast('0;367' as date format 'YY;DDD');
> ERROR: UDF ERROR: String to Date parse failed. Invalid string val: "0;367"
> {code}
> Here the output contains the input string, but it would be more helpful for
> debugging if it also contained the original format string.
> This applies to String to Date conversions only, as String to Timestamp
> failures currently don't raise an error.