[jira] [Resolved] (IMPALA-11528) hive-exec.pom doesn't include UDAF class

2022-09-26 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11528.
-
Fix Version/s: Impala 4.2.0
   Impala 4.1.1
   Resolution: Fixed

Resolving this. Thanks, [~scarlin]!

> hive-exec.pom doesn't include UDAF class
> 
>
> Key: IMPALA-11528
> URL: https://issues.apache.org/jira/browse/IMPALA-11528
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Gabor Kaszab
>Assignee: Steve Carlin
>Priority: Major
> Fix For: Impala 4.2.0, Impala 4.1.1
>
>
> For hive-exec we only include classes matching "*UDF*", a pattern that 
> excludes the UDAF class:
> https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L102
> As a result, when a UDAF is loaded into the catalog we get a 
> NoClassDefFoundError.
> {code:java}
> I0819 09:20:07.777845 1 HiveUdfLoader.java:63] Loading UDF 
> 'eu.radoop.datahandler.hive.udf.GenericUDAFCorrelationMatrix' from 
> file:/tmp/e5a348f5-753a-485a-b37d-2a1420b09df7.jar
> I0819 09:20:07.780457 1 MetastoreEventsProcessor.java:700] Metastore 
> event processing restarted. Last synced event id was updated from 902310 to 
> 902310
> I0819 09:20:07.780704 1 jni-util.cc:286] java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionUtils.getUDFClassType(FunctionUtils.java:157)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.(HiveUdfLoader.java:68)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.createWithLocalPath(HiveUdfLoader.java:155)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:47)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:67)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.loadJavaFunctions(CatalogServiceCatalog.java:1756)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.invalidateDb(CatalogServiceCatalog.java:1862)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.reset(CatalogServiceCatalog.java:1994)
>   at org.apache.impala.service.JniCatalog.(JniCatalog.java:166)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.ql.exec.UDAF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 9 more
> I0819 09:20:07.780738 1 status.cc:129] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> @   0xd99193
> @  0x162f6d9
> @   0xd5f447
> @   0xd2ba46
> @   0xc85b28
> @   0xbd6fd0
> @ 0x7ff4f9bf7554
> @   0xc7ba86
> E0819 09:20:07.781023 1 catalog.cc:87] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> . Impalad exiting.
> {code}
> In Impala 3.4 we handled this exception gracefully, but apparently in 4.1 
> the Catalog gets terminated by it.
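The shaded-jar filter described above can be sketched as a maven-shade-plugin include list. This is illustrative only, assuming the linked pom.xml uses a filter of this general shape; the explicit `UDAF.class` include is the fix the issue implies, not a quote of the actual file:

```xml
<!-- Hypothetical sketch of the hive-exec shaded-deps filter
     (java/shaded-deps/hive-exec/pom.xml). "UDAF" does not match the
     "*UDF*" glob, so the class must be listed explicitly. -->
<filter>
  <artifact>org.apache.hive:hive-exec</artifact>
  <includes>
    <include>org/apache/hadoop/hive/ql/exec/*UDF*</include>
    <include>org/apache/hadoop/hive/ql/exec/UDAF.class</include>
  </includes>
</filter>
```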



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-11528) hive-exec.pom doesn't include UDAF class

2022-09-26 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-11528:
---

Assignee: Steve Carlin

> hive-exec.pom doesn't include UDAF class
> 
>
> Key: IMPALA-11528
> URL: https://issues.apache.org/jira/browse/IMPALA-11528
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Gabor Kaszab
>Assignee: Steve Carlin
>Priority: Major
>
> For hive-exec we only include classes matching "*UDF*", a pattern that 
> excludes the UDAF class:
> https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L102
> As a result, when a UDAF is loaded into the catalog we get a 
> NoClassDefFoundError.
> {code:java}
> I0819 09:20:07.777845 1 HiveUdfLoader.java:63] Loading UDF 
> 'eu.radoop.datahandler.hive.udf.GenericUDAFCorrelationMatrix' from 
> file:/tmp/e5a348f5-753a-485a-b37d-2a1420b09df7.jar
> I0819 09:20:07.780457 1 MetastoreEventsProcessor.java:700] Metastore 
> event processing restarted. Last synced event id was updated from 902310 to 
> 902310
> I0819 09:20:07.780704 1 jni-util.cc:286] java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionUtils.getUDFClassType(FunctionUtils.java:157)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.(HiveUdfLoader.java:68)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.createWithLocalPath(HiveUdfLoader.java:155)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:47)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:67)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.loadJavaFunctions(CatalogServiceCatalog.java:1756)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.invalidateDb(CatalogServiceCatalog.java:1862)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.reset(CatalogServiceCatalog.java:1994)
>   at org.apache.impala.service.JniCatalog.(JniCatalog.java:166)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.ql.exec.UDAF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 9 more
> I0819 09:20:07.780738 1 status.cc:129] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> @   0xd99193
> @  0x162f6d9
> @   0xd5f447
> @   0xd2ba46
> @   0xc85b28
> @   0xbd6fd0
> @ 0x7ff4f9bf7554
> @   0xc7ba86
> E0819 09:20:07.781023 1 catalog.cc:87] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> . Impalad exiting.
> {code}
> In Impala 3.4 we handled this exception gracefully, but apparently in 4.1 
> the Catalog gets terminated by it.






[jira] [Commented] (IMPALA-11594) TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609795#comment-17609795
 ] 

ASF subversion and git services commented on IMPALA-11594:
--

Commit 0d7232d2f2c2105144a673ee7a413f77f12a995c in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d7232d2f ]

IMPALA-11438, IMPALA-11594: update test for non-HDFS

Updates the new create_table_like_parquet test to work with S3 and Ozone
filesystem schemes in addition to HDFS.

Change-Id: Ibd8d4c6b96c3ed607556793e6b822944a879a1f8
Reviewed-on: http://gerrit.cloudera.org:8080/19037
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build
> ---
>
> Key: IMPALA-11594
> URL: https://issues.apache.org/jira/browse/IMPALA-11594
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Gergely Fürnstáhl
>Priority: Critical
>  Labels: broken-build, impala-iceberg
>
> TestIcebergTable.test_create_table_like_parquet added by IMPALA-11438 failed 
> in S3 builds:
> {code:java}
> query_test/test_iceberg.py:819: in test_create_table_like_parquet
> self._create_table_like_parquet_helper(vector, unique_database, tbl_name, 
> False)
> query_test/test_iceberg.py:806: in _create_table_like_parquet_helper
> assert hdfs_file
> E   assert None {code}
> Standard Error:
> {noformat}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_create_table_like_parquet_6e658131` CASCADE;
> -- 2022-09-18 11:38:30,905 INFO MainThread: Started query 
> 9742962ebcabc35b:5c668e45
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_create_table_like_parquet_6e658131`;
> -- 2022-09-18 11:38:38,160 INFO MainThread: Started query 
> b84f60c7670afd2e:37deca4d
> -- 2022-09-18 11:38:38,765 INFO MainThread: Created database 
> "test_create_table_like_parquet_6e658131" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> 22/09/18 11:38:39 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 22/09/18 11:38:40 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.
> -- executing against localhost:21000
> create table test_create_table_like_parquet_6e658131.alltypes_tiny_pages like 
> parquet 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  stored as parquet;
> -- 2022-09-18 11:38:41,292 INFO MainThread: Started query 
> c94eb08d583843d6:78da5179
> -- executing against localhost:21000
> load data inpath 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  into table test_create_table_like_parquet_6e658131.alltypes_tiny_pages;
> -- 2022-09-18 11:38:45,913 INFO MainThread: Started query 
> 014969c4fd6439be:bdf84d62 {noformat}






[jira] [Commented] (IMPALA-11585) Docker quickstart client fails to build on Ubuntu 20.04

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609796#comment-17609796
 ] 

ASF subversion and git services commented on IMPALA-11585:
--

Commit f6151b0aa18cfcad50f8f63d5621f56db9fde6fe in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f6151b0aa ]

IMPALA-11585: Build quickstart_client with Ubuntu 20

Ubuntu 20.04 only provides the python3-pip package. Update building
quickstart_client to use python3-pip on Ubuntu 20.04.

Change-Id: Ife89b7db88dd58e96ba1b3e3972ca97204332dd4
Reviewed-on: http://gerrit.cloudera.org:8080/18984
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Docker quickstart client fails to build on Ubuntu 20.04
> ---
>
> Key: IMPALA-11585
> URL: https://issues.apache.org/jira/browse/IMPALA-11585
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> The Docker quickstart client fails to build on Ubuntu 20.04 because it tries 
> to install {{python-pip}}, a package that's unavailable on 20+.






[jira] [Commented] (IMPALA-11438) Add tests for CREATE TABLE LIKE PARQUET STORED AS ICEBERG

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609794#comment-17609794
 ] 

ASF subversion and git services commented on IMPALA-11438:
--

Commit 0d7232d2f2c2105144a673ee7a413f77f12a995c in impala's branch 
refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0d7232d2f ]

IMPALA-11438, IMPALA-11594: update test for non-HDFS

Updates the new create_table_like_parquet test to work with S3 and Ozone
filesystem schemes in addition to HDFS.

Change-Id: Ibd8d4c6b96c3ed607556793e6b822944a879a1f8
Reviewed-on: http://gerrit.cloudera.org:8080/19037
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add tests for CREATE TABLE LIKE PARQUET STORED AS ICEBERG
> -
>
> Key: IMPALA-11438
> URL: https://issues.apache.org/jira/browse/IMPALA-11438
> Project: IMPALA
>  Issue Type: Test
>Reporter: Zoltán Borók-Nagy
>Assignee: Gergely Fürnstáhl
>Priority: Major
>  Labels: impala-iceberg
>
> Currently it's possible to use the CREATE TABLE LIKE PARQUET statement to 
> create new Iceberg tables, but we don't have any tests for it.
> The statement is expected to work when the Parquet file has compatible types. 
> When there are incompatible types in the Parquet file, e.g. INT_8, Impala 
> should raise an error.






[jira] [Commented] (IMPALA-11572) TestHdfsScannerSkew.test_mt_dop_skew_lpt is flaky

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609793#comment-17609793
 ] 

ASF subversion and git services commented on IMPALA-11572:
--

Commit 49bffd236ca6caa4d86f988c3f7290b7dca0b2ff in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=49bffd236 ]

IMPALA-11572: deflake test_mt_dop_skew_lpt part 2

I suspect the test is still flaky because of parallel execution
of other tests that have high CPU usage, hence marking the test
for serial execution to make it more stable.

Also printing out the query profile to stdout when we observe a
skew.

Change-Id: I521685d683d51a52e54ad138f8466dd41c844f72
Reviewed-on: http://gerrit.cloudera.org:8080/19035
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> TestHdfsScannerSkew.test_mt_dop_skew_lpt is flaky
> -
>
> Key: IMPALA-11572
> URL: https://issues.apache.org/jira/browse/IMPALA-11572
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build
>
> The test can fail with:
> {noformat}
> query_test/test_scanners.py:428: in test_mt_dop_skew_lpt
> assert cnt_fail < 3
> E   assert 3 < 3
> {noformat}
> We need to fine-tune the test to deflake it.






[jira] [Commented] (IMPALA-11528) hive-exec.pom doesn't include UDAF class

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609792#comment-17609792
 ] 

ASF subversion and git services commented on IMPALA-11528:
--

Commit dddc17c1ef6037f3881a92f9188ffa57acd1bece in impala's branch 
refs/heads/branch-4.1.1 from Steve Carlin
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dddc17c1e ]

IMPALA-11528: Catalogd should start up with a corrupt Hive function.

This commit handles the case for a specific kind of corrupt function
within the Hive Metastore in the following situation:

A valid Hive SQL function gets created in HMS. This UDF is written in
Java and must derive from the "UDF" class. After creating this function
in Impala, we then replace the underlying jar file with a class that
does NOT derive from the "UDF" class.

In this scenario, catalogd should reject the function and still start
up gracefully. Before this commit, catalogd wasn't coming up. The
reason was that the Hive function FunctionUtils.getUDFClassType() has
a dependency on UDAF and was throwing a LinkageError, so we need to
include the UDAF class in the shaded jar.

Merge conflicts in branch-4.1:
- Ignore MapredContext.class in java/shaded-deps/hive-exec/pom.xml
- Replace SkipIfFS.hive in test_permanent_udfs.py with individual skip
  annotations
- Update version in java/test-corrupt-hive-udfs/pom.xml

Change-Id: I54e7a1df6d018ba6cf5ecf32dc9946edf86e2112
Reviewed-on: http://gerrit.cloudera.org:8080/18927
Tested-by: Impala Public Jenkins 
Reviewed-by: Tamas Mate 
Reviewed-on: http://gerrit.cloudera.org:8080/19019

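The graceful-rejection behavior the commit describes can be sketched in plain Java. This is a minimal illustration, not Impala's actual loader code: the class and method names below are invented, and the point is only that catching LinkageError (the superclass of NoClassDefFoundError) alongside ClassNotFoundException lets a loader reject one corrupt function without taking down the whole process.

```java
// Hypothetical sketch of tolerant UDF class inspection. Catching
// LinkageError here is what keeps a single corrupt jar from crashing
// the process during startup.
public class Main {
    static String classify(String className, ClassLoader loader) {
        try {
            // 'false' skips class initialization, mirroring inspection-only loading.
            Class<?> c = Class.forName(className, false, loader);
            return "loaded: " + c.getName();
        } catch (ClassNotFoundException | LinkageError e) {
            // Reject the function but keep the process alive.
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = Main.class.getClassLoader();
        System.out.println(classify("java.util.ArrayList", cl));   // prints "loaded: java.util.ArrayList"
        System.out.println(classify("com.example.MissingUdaf", cl)); // prints "rejected: ClassNotFoundException"
    }
}
```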

> hive-exec.pom doesn't include UDAF class
> 
>
> Key: IMPALA-11528
> URL: https://issues.apache.org/jira/browse/IMPALA-11528
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.1.0
>Reporter: Gabor Kaszab
>Priority: Major
>
> For hive-exec we only include classes matching "*UDF*", a pattern that 
> excludes the UDAF class:
> https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L102
> As a result, when a UDAF is loaded into the catalog we get a 
> NoClassDefFoundError.
> {code:java}
> I0819 09:20:07.777845 1 HiveUdfLoader.java:63] Loading UDF 
> 'eu.radoop.datahandler.hive.udf.GenericUDAFCorrelationMatrix' from 
> file:/tmp/e5a348f5-753a-485a-b37d-2a1420b09df7.jar
> I0819 09:20:07.780457 1 MetastoreEventsProcessor.java:700] Metastore 
> event processing restarted. Last synced event id was updated from 902310 to 
> 902310
> I0819 09:20:07.780704 1 jni-util.cc:286] java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionUtils.getUDFClassType(FunctionUtils.java:157)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.(HiveUdfLoader.java:68)
>   at 
> org.apache.impala.hive.executor.HiveUdfLoader.createWithLocalPath(HiveUdfLoader.java:155)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:47)
>   at 
> org.apache.impala.hive.executor.HiveJavaFunctionFactoryImpl.create(HiveJavaFunctionFactoryImpl.java:67)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.loadJavaFunctions(CatalogServiceCatalog.java:1756)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.invalidateDb(CatalogServiceCatalog.java:1862)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.reset(CatalogServiceCatalog.java:1994)
>   at org.apache.impala.service.JniCatalog.(JniCatalog.java:166)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.ql.exec.UDAF
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 9 more
> I0819 09:20:07.780738 1 status.cc:129] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> @   0xd99193
> @  0x162f6d9
> @   0xd5f447
> @   0xd2ba46
> @   0xc85b28
> @   0xbd6fd0
> @ 0x7ff4f9bf7554
> @   0xc7ba86
> E0819 09:20:07.781023 1 catalog.cc:87] NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/exec/UDAF
> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.ql.exec.UDAF
> . Impalad exiting.
> {code}
> In Impala 3.4 we handled this exception gracefully, but apparently in 4.1 
> the Catalog gets terminated by it.




[jira] [Resolved] (IMPALA-11585) Docker quickstart client fails to build on Ubuntu 20.04

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-11585.

Resolution: Fixed

> Docker quickstart client fails to build on Ubuntu 20.04
> ---
>
> Key: IMPALA-11585
> URL: https://issues.apache.org/jira/browse/IMPALA-11585
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> The Docker quickstart client fails to build on Ubuntu 20.04 because it tries 
> to install {{python-pip}}, a package that's unavailable on 20+.






[jira] [Updated] (IMPALA-11585) Docker quickstart client fails to build on Ubuntu 20.04

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11585:
---
Fix Version/s: Impala 4.2.0

> Docker quickstart client fails to build on Ubuntu 20.04
> ---
>
> Key: IMPALA-11585
> URL: https://issues.apache.org/jira/browse/IMPALA-11585
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> The Docker quickstart client fails to build on Ubuntu 20.04 because it tries 
> to install {{python-pip}}, a package that's unavailable on 20+.






[jira] [Commented] (IMPALA-10356) Analyzed query in explain plan is not quite right for insert with values clause

2022-09-26 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609689#comment-17609689
 ] 

Csaba Ringhofer commented on IMPALA-10356:
--

[~daniel.becker] I think all the others are better than the current one, as at 
least they do not look like a bug. In the long term it would make sense to 
implement special handling for the VALUES clause to make it more efficient than 
a union. I would vote for the first one as it seems the simplest to implement.

Btw, besides the single-clause case, isn't it an issue that we write UNION 
instead of UNION ALL?

> Analyzed query in explain plan is not quite right for insert with values 
> clause
> ---
>
> Key: IMPALA-10356
> URL: https://issues.apache.org/jira/browse/IMPALA-10356
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: newbie, ramp-up
>
> In impala-shell:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> set explain_level=2;
> explain insert into double_tbl values (-0.43149576573887316);
> {noformat}
> {noformat}
> +--+
> | Explain String  
>  |
> +--+
> | Max Per-Host Resource Reservation: Memory=0B Threads=1  
>  |
> | Per-Host Resource Estimates: Memory=10MB
>  |
> | Codegen disabled by planner 
>  |
> | Analyzed query: SELECT CAST(-0.43149576573887316 AS DECIMAL(17,17)) UNION 
> SELECT |
> | CAST(-0.43149576573887316 AS DECIMAL(17,17))
>  |
> | 
>  |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
>  |
> | |  Per-Host Resources: mem-estimate=8B mem-reservation=0B 
> thread-reservation=1   |
> | WRITE TO HDFS [default.double_tbl, OVERWRITE=false] 
>  |
> | |  partitions=1 
>  |
> | |  output exprs: CAST(-0.43149576573887316 AS DOUBLE)   
>  |
> | |  mem-estimate=8B mem-reservation=0B thread-reservation=0  
>  |
> | |   
>  |
> | 00:UNION
>  |
> |constant-operands=1  
>  |
> |mem-estimate=0B mem-reservation=0B thread-reservation=0  
>  |
> |tuple-ids=0 row-size=8B cardinality=1
>  |
> |in pipelines:  
>  |
> +--+
> {noformat}
> The analyzed query does not make sense. We should investigate and fix it.






[jira] [Work started] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11614 started by Michael Smith.
--
> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Commented] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609651#comment-17609651
 ] 

Joe McDonnell commented on IMPALA-11614:


I think it would be better to avoid updating this metric for Ozone.

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Updated] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11614:
---
Priority: Critical  (was: Major)

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Updated] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11614:
---
Labels: broken-build  (was: )

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Critical
>  Labels: broken-build
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Commented] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609621#comment-17609621
 ] 

Michael Smith commented on IMPALA-11614:


[~joemcdonnell] have any concerns with skipping this assertion for Ozone 
testing? The other option would be to skip updating this metric for Ozone 
accesses.

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Updated] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11614:
---
Affects Version/s: Impala 4.2.0

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.2.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Updated] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-11614:
---
Description: 
TestValidateMetrics.test_metrics_are_zero fails for Ozone with
{code}
/data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
 in __metric_timeout_assert
assert 0, assert_string
E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id did 
not reach value 0 in 60s.
{code}

This passed at one point: 
https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
 with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102. I suspect it 
started failing as a result of 79e474d310 (IMPALA-10213), because now most tests 
see Ozone datanodes as local rather than remote.

  was:
TestValidateMetrics.test_metrics_are_zero fails for Ozone with
{code}
/data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
 in __metric_timeout_assert
assert 0, assert_string
E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id did 
not reach value 0 in 60s.
{code}

This passed at one point: 
https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
 with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
started failing as a result of 79e474d310 (IMPALA-10213) which changed some 
characteristics about how scheduling works for Ozone tests.


> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) because now most 
> tests see Ozone datanodes as local rather than remote.






[jira] [Commented] (IMPALA-11593) Disk I/O error with NullPointerException from libhdfs in S3 builds

2022-09-26 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609619#comment-17609619
 ] 

Joe McDonnell commented on IMPALA-11593:


This seems to impact various tests, so making this a more generic JIRA.

> Disk I/O error with NullPointerException from libhdfs in S3 builds
> --
>
> Key: IMPALA-11593
> URL: https://issues.apache.org/jira/browse/IMPALA-11593
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
>
> Saw this failure on an S3 build:
> {noformat}
> custom_cluster/test_mem_reservations.py:102: in 
> test_per_backend_min_reservation
> assert t.error is None
> E   assert 'ImpalaBeeswaxException:\n Query aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.clouderawarehouse/alltypes/year=2009/month=9/090901.txt\nError(255):
>  Unknown error 255\nRoot cause: NullPointerException: \n\n' is None
> E+  where 'ImpalaBeeswaxException:\n Query aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.clouderawarehouse/alltypes/year=2009/month=9/090901.txt\nError(255):
>  Unknown error 255\nRoot cause: NullPointerException: \n\n' = 
> .error
> {noformat}
> Impalad logs for the query:
> {noformat}
> I0915 03:12:33.839942 21677 impala-server.cc:1333] 
> 09439d05a2468038:3816f0f2] Registered query 
> query_id=09439d05a2468038:3816f0f2 
> session_id=874c5100c59607af:a86e04c8f62bb9a9
> I0915 03:12:33.889168 21677 Frontend.java:1628] 
> 09439d05a2468038:3816f0f2] Analyzing query: select max(t.c1), 
> avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6)
> from (select
> max(tinyint_col) over (order by int_col) c1,
> avg(tinyint_col) over (order by smallint_col) c2,
> min(tinyint_col) over (order by smallint_col desc) c3,
> rank() over (order by int_col desc) c4,
> dense_rank() over (order by bigint_col) c5,
> first_value(tinyint_col) over (order by bigint_col desc) c6
> from functional.alltypes) t; db: default
> I0915 03:12:33.981251 21677 FeSupport.java:315] 
> 09439d05a2468038:3816f0f2] Requesting prioritized load of table(s): 
> functional.alltypes
> I0915 03:12:33.986737 21677 thrift-util.cc:99] 
> 09439d05a2468038:3816f0f2] TSocket::open() connect()  Port: 26000>: Connection refused
> I0915 03:12:34.582643 21677 BaseAuthorizationChecker.java:113] 
> 09439d05a2468038:3816f0f2] Authorization check took 693 ms
> I0915 03:12:34.582674 21677 Frontend.java:1671] 
> 09439d05a2468038:3816f0f2] Analysis and authorization finished.
> I0915 03:12:34.723712 21208 control-service.cc:148] 
> 4a4ebd3b7575254c:eb71cd80] ExecQueryFInstances(): 
> query_id=4a4ebd3b7575254c:eb71cd80 
> coord=impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.cloudera.com:27000 
> #instances=1
> I0915 03:12:34.738032 21758 query-state.cc:942] 
> 4a4ebd3b7575254c:eb71cd82] Executing instance. 
> instance_id=4a4ebd3b7575254c:eb71cd82 fragment_idx=1 
> per_fragment_instance_idx=1 coord_state_idx=1 #in-flight=1
> I0915 03:12:34.850791 21820 admission-controller.cc:1819] 
> 09439d05a2468038:3816f0f2] Trying to admit 
> id=09439d05a2468038:3816f0f2 in pool_name=default-pool 
> executor_group_name=default per_host_mem_estimate=1.34 GB 
> dedicated_coord_mem_estimate=1.10 GB max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0915 03:12:34.850811 21820 admission-controller.cc:1827] 
> 09439d05a2468038:3816f0f2] Stats: agg_num_running=1, 
> agg_num_queued=0, agg_mem_reserved=1.56 GB,  local_host(local_mem_admitted=0, 
> num_admitted_running=0, num_queued=0, backend_mem_reserved=192.46 MB, 
> topN_query_stats: queries=[4a4ebd3b7575254c:eb71cd80], 
> total_mem_consumed=192.46 MB, fraction_of_pool_total_mem=1; pool_level_stats: 
> num_running=1, min=192.46 MB, max=192.46 MB, pool_total_mem=192.46 MB, 
> average_per_query=192.46 MB)
> I0915 03:12:34.850852 21820 admission-controller.cc:1218] 
> 09439d05a2468038:3816f0f2] Admitting query 
> id=09439d05a2468038:3816f0f2
> I0915 03:12:34.850939 21820 impala-server.cc:2159] 
> 09439d05a2468038:3816f0f2] Registering query locations
> I0915 03:12:34.850998 21820 coordinator.cc:150] 
> 09439d05a2468038:3816f0f2] Exec() 
> query_id=09439d05a2468038:3816f0f2 stmt=select max(t.c1), avg(t.c2), 
> min(t.c3), avg(c4), avg(c5), avg(c6)
> from (select
> max(tinyint_col) over (order by int_col) c1,
> avg(tinyint_col) over (order by smallint_col) c2,
> min(tinyint_col) over (order by smallint_col desc) c3,
> rank() over (order by int_col desc) c4,
> 

[jira] [Updated] (IMPALA-11593) Disk I/O error with NullPointerException from libhdfs in S3 builds

2022-09-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11593:
---
Summary: Disk I/O error with NullPointerException from libhdfs in S3 builds 
 (was: TestMemReservations.test_per_backend_min_reservation failed by 
NullPointerException from libhdfs in S3 builds)

> Disk I/O error with NullPointerException from libhdfs in S3 builds
> --
>
> Key: IMPALA-11593
> URL: https://issues.apache.org/jira/browse/IMPALA-11593
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
>
> Saw this failure on an S3 build:
> {noformat}
> custom_cluster/test_mem_reservations.py:102: in 
> test_per_backend_min_reservation
> assert t.error is None
> E   assert 'ImpalaBeeswaxException:\n Query aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.clouderawarehouse/alltypes/year=2009/month=9/090901.txt\nError(255):
>  Unknown error 255\nRoot cause: NullPointerException: \n\n' is None
> E+  where 'ImpalaBeeswaxException:\n Query aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.clouderawarehouse/alltypes/year=2009/month=9/090901.txt\nError(255):
>  Unknown error 255\nRoot cause: NullPointerException: \n\n' = 
> .error
> {noformat}
> Impalad logs for the query:
> {noformat}
> I0915 03:12:33.839942 21677 impala-server.cc:1333] 
> 09439d05a2468038:3816f0f2] Registered query 
> query_id=09439d05a2468038:3816f0f2 
> session_id=874c5100c59607af:a86e04c8f62bb9a9
> I0915 03:12:33.889168 21677 Frontend.java:1628] 
> 09439d05a2468038:3816f0f2] Analyzing query: select max(t.c1), 
> avg(t.c2), min(t.c3), avg(c4), avg(c5), avg(c6)
> from (select
> max(tinyint_col) over (order by int_col) c1,
> avg(tinyint_col) over (order by smallint_col) c2,
> min(tinyint_col) over (order by smallint_col desc) c3,
> rank() over (order by int_col desc) c4,
> dense_rank() over (order by bigint_col) c5,
> first_value(tinyint_col) over (order by bigint_col desc) c6
> from functional.alltypes) t; db: default
> I0915 03:12:33.981251 21677 FeSupport.java:315] 
> 09439d05a2468038:3816f0f2] Requesting prioritized load of table(s): 
> functional.alltypes
> I0915 03:12:33.986737 21677 thrift-util.cc:99] 
> 09439d05a2468038:3816f0f2] TSocket::open() connect()  Port: 26000>: Connection refused
> I0915 03:12:34.582643 21677 BaseAuthorizationChecker.java:113] 
> 09439d05a2468038:3816f0f2] Authorization check took 693 ms
> I0915 03:12:34.582674 21677 Frontend.java:1671] 
> 09439d05a2468038:3816f0f2] Analysis and authorization finished.
> I0915 03:12:34.723712 21208 control-service.cc:148] 
> 4a4ebd3b7575254c:eb71cd80] ExecQueryFInstances(): 
> query_id=4a4ebd3b7575254c:eb71cd80 
> coord=impala-ec2-centos79-m6i-4xlarge-ondemand-1db1.vpc.cloudera.com:27000 
> #instances=1
> I0915 03:12:34.738032 21758 query-state.cc:942] 
> 4a4ebd3b7575254c:eb71cd82] Executing instance. 
> instance_id=4a4ebd3b7575254c:eb71cd82 fragment_idx=1 
> per_fragment_instance_idx=1 coord_state_idx=1 #in-flight=1
> I0915 03:12:34.850791 21820 admission-controller.cc:1819] 
> 09439d05a2468038:3816f0f2] Trying to admit 
> id=09439d05a2468038:3816f0f2 in pool_name=default-pool 
> executor_group_name=default per_host_mem_estimate=1.34 GB 
> dedicated_coord_mem_estimate=1.10 GB max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0915 03:12:34.850811 21820 admission-controller.cc:1827] 
> 09439d05a2468038:3816f0f2] Stats: agg_num_running=1, 
> agg_num_queued=0, agg_mem_reserved=1.56 GB,  local_host(local_mem_admitted=0, 
> num_admitted_running=0, num_queued=0, backend_mem_reserved=192.46 MB, 
> topN_query_stats: queries=[4a4ebd3b7575254c:eb71cd80], 
> total_mem_consumed=192.46 MB, fraction_of_pool_total_mem=1; pool_level_stats: 
> num_running=1, min=192.46 MB, max=192.46 MB, pool_total_mem=192.46 MB, 
> average_per_query=192.46 MB)
> I0915 03:12:34.850852 21820 admission-controller.cc:1218] 
> 09439d05a2468038:3816f0f2] Admitting query 
> id=09439d05a2468038:3816f0f2
> I0915 03:12:34.850939 21820 impala-server.cc:2159] 
> 09439d05a2468038:3816f0f2] Registering query locations
> I0915 03:12:34.850998 21820 coordinator.cc:150] 
> 09439d05a2468038:3816f0f2] Exec() 
> query_id=09439d05a2468038:3816f0f2 stmt=select max(t.c1), avg(t.c2), 
> min(t.c3), avg(c4), avg(c5), avg(c6)
> from (select
> max(tinyint_col) over (order by int_col) c1,
> avg(tinyint_col) over (order by smallint_col) c2,
> min(tinyint_col) over (order by 

[jira] [Commented] (IMPALA-11592) NullPointerException from s3a.Listing.createObjectListingIterator in various S3 tests

2022-09-26 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609618#comment-17609618
 ] 

Joe McDonnell commented on IMPALA-11592:


I'm turning this into a more generic JIRA, because we've seen this symptom in 
various S3 tests.

> NullPointerException from s3a.Listing.createObjectListingIterator in various 
> S3 tests
> -
>
> Key: IMPALA-11592
> URL: https://issues.apache.org/jira/browse/IMPALA-11592
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: catalogd.INFO.gz, impalad.INFO.gz, 
> impalad_node1.INFO.gz, impalad_node2.INFO.gz
>
>
> custom_cluster.test_local_catalog.TestLocalCatalogRetries.test_fetch_metadata_retry
> fails in an S3 build:
> {noformat}
> custom_cluster/test_local_catalog.py:317: in test_fetch_metadata_retry
> seen = self._check_metadata_retries(queries)
> custom_cluster/test_local_catalog.py:293: in _check_metadata_retries
> assert failed_queries.empty(),\
> E   AssertionError: Failed query count non zero: [('refresh 
> functional.alltypes', 'ImpalaBeeswaxException:\n Query 
> aborted:TableLoadingException: Refreshing file and block metadata for 24 
> paths for table functional.alltypes: failed to load 1 paths. Check the 
> catalog server log for more details.\n\n')]
> E   assert  0x7fe3fb710128>>()
> E+  where  0x7fe3fb710128>> = .empty
> {noformat}
> Looking into the catalog server log, there is a NullPointerException:
> {noformat}
> E0916 21:15:27.469425 25508 ParallelFileMetadataLoader.java:171] Refreshing 
> file and block metadata for 24 paths for table functional.alltypes 
> encountered an error loading data for path 
> s3a://impala-test-uswest2-2/test-warehouse/alltypes/year=2010/month=9
> Java exception follows:
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
> at 
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:781)
> at org.apache.impala.catalog.HdfsTable.access$100(HdfsTable.java:153)
> at 
> org.apache.impala.catalog.HdfsTable$PartitionDeltaUpdater.apply(HdfsTable.java:1534)
> at 
> org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1411)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1254)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1179)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2551)
> at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6158)
> at 
> org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:287)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.(Listing.java:621)
> at 
> org.apache.hadoop.fs.s3a.Listing.createObjectListingIterator(Listing.java:163)
> at 
> org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:144)
> at 
> org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:212)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4790)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$37(S3AFileSystem.java:4732)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2382)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4731)
> at 
> org.apache.impala.common.FileSystemUtil.listFiles(FileSystemUtil.java:754)
> at 
> org.apache.impala.common.FileSystemUtil.listStatus(FileSystemUtil.java:729)
> at 
> org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:190)
> at 
> 
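The stack trace shows the per-path load happening on worker threads, with the failed task's exception only surfacing when FutureTask.get() is called. A rough Python analogue of that fan-out/collect pattern (function and variable names are mine, not Impala's):

```python
from concurrent.futures import ThreadPoolExecutor

def load_paths_in_parallel(paths, load_fn, max_workers=8):
    """Submit one metadata-load task per path and collect per-path failures.

    Sketch of the pattern suggested by the ParallelFileMetadataLoader frames
    above: a worker's exception (here any Exception, there an NPE from s3a
    listing) only surfaces when the future's result is fetched.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(load_fn, path) for path in paths}
        for path, fut in futures.items():
            try:
                results[path] = fut.result()
            except Exception as e:
                # Analogous to "failed to load 1 paths" in the error above.
                failures[path] = e
    return results, failures
```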

[jira] [Updated] (IMPALA-11592) NullPointerException from s3a.Listing.createObjectListingIterator in various S3 tests

2022-09-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11592:
---
Summary: NullPointerException from s3a.Listing.createObjectListingIterator 
in various S3 tests  (was: TestLocalCatalogRetries.test_fetch_metadata_retry 
fails in S3 build)

> NullPointerException from s3a.Listing.createObjectListingIterator in various 
> S3 tests
> -
>
> Key: IMPALA-11592
> URL: https://issues.apache.org/jira/browse/IMPALA-11592
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: catalogd.INFO.gz, impalad.INFO.gz, 
> impalad_node1.INFO.gz, impalad_node2.INFO.gz
>
>
> custom_cluster.test_local_catalog.TestLocalCatalogRetries.test_fetch_metadata_retry
> fails in an S3 build:
> {noformat}
> custom_cluster/test_local_catalog.py:317: in test_fetch_metadata_retry
> seen = self._check_metadata_retries(queries)
> custom_cluster/test_local_catalog.py:293: in _check_metadata_retries
> assert failed_queries.empty(),\
> E   AssertionError: Failed query count non zero: [('refresh 
> functional.alltypes', 'ImpalaBeeswaxException:\n Query 
> aborted:TableLoadingException: Refreshing file and block metadata for 24 
> paths for table functional.alltypes: failed to load 1 paths. Check the 
> catalog server log for more details.\n\n')]
> E   assert  0x7fe3fb710128>>()
> E+  where  0x7fe3fb710128>> = .empty
> {noformat}
> Looking into the catalog server log, there is a NullPointerException:
> {noformat}
> E0916 21:15:27.469425 25508 ParallelFileMetadataLoader.java:171] Refreshing 
> file and block metadata for 24 paths for table functional.alltypes 
> encountered an error loading data for path 
> s3a://impala-test-uswest2-2/test-warehouse/alltypes/year=2010/month=9
> Java exception follows:
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:168)
> at 
> org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:120)
> at 
> org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:781)
> at org.apache.impala.catalog.HdfsTable.access$100(HdfsTable.java:153)
> at 
> org.apache.impala.catalog.HdfsTable$PartitionDeltaUpdater.apply(HdfsTable.java:1534)
> at 
> org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(HdfsTable.java:1411)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1254)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1179)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.reloadTable(CatalogServiceCatalog.java:2551)
> at 
> org.apache.impala.service.CatalogOpExecutor.execResetMetadata(CatalogOpExecutor.java:6158)
> at 
> org.apache.impala.service.JniCatalog.resetMetadata(JniCatalog.java:287)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.(Listing.java:621)
> at 
> org.apache.hadoop.fs.s3a.Listing.createObjectListingIterator(Listing.java:163)
> at 
> org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:144)
> at 
> org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:212)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4790)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$37(S3AFileSystem.java:4732)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2382)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4731)
> at 
> org.apache.impala.common.FileSystemUtil.listFiles(FileSystemUtil.java:754)
> at 
> org.apache.impala.common.FileSystemUtil.listStatus(FileSystemUtil.java:729)
> at 
> org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:190)

[jira] [Created] (IMPALA-11616) TestFrontendConnectionLimit.test_server_busy is flaky

2022-09-26 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11616:
--

 Summary: TestFrontendConnectionLimit.test_server_busy is flaky
 Key: IMPALA-11616
 URL: https://issues.apache.org/jira/browse/IMPALA-11616
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.2.0
Reporter: Joe McDonnell


TestFrontendConnectionLimit.test_server_busy failed with the following symptom:
{noformat}
custom_cluster/test_frontend_connection_limit.py:76: in test_server_busy
client.execute_async("select sleep(7000)")
common/impala_connection.py:220: in execute_async
beeswax_handle = self.__beeswax_client.execute_query_async(sql_stmt, 
user=user)
beeswax/impala_beeswax.py:359: in execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:525: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(e), e)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EINNER EXCEPTION: 
EMESSAGE: TSocket read 0 bytes{noformat}
This has been seen once.
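The traceback shows __do_rpc catching a transport error and re-raising it wrapped in a domain exception. A minimal sketch of that wrap-and-reraise pattern (class and function names simplified from the beeswax client frames above):

```python
class ImpalaBeeswaxException(Exception):
    """Domain-level wrapper for lower-level RPC/transport failures."""

def do_rpc(rpc, build_error_message=repr):
    """Run an RPC callable; wrap any failure in ImpalaBeeswaxException.

    Illustrative only: the real client builds a richer message (INNER
    EXCEPTION / MESSAGE sections) than repr() does here.
    """
    try:
        return rpc()
    except Exception as e:
        raise ImpalaBeeswaxException(build_error_message(e)) from e
```

A "TSocket read 0 bytes" transport error would surface through this path as the ImpalaBeeswaxException seen in the failure.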






[jira] [Commented] (IMPALA-11511) Provide an option to build with compressed debug info

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609617#comment-17609617
 ] 

ASF subversion and git services commented on IMPALA-11511:
--

Commit 10c19b1a5730a898e17cc653be6bd19f0dc3340e in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=10c19b1a5 ]

IMPALA-11511: Add build options for reducing binary sizes

Impala's build produces dozens of C++ binaries
that link in all Impala libraries. Each binary is
hundreds of megabytes, leading to 10s of gigabytes
of disk space. A large proportion of this (~80%) is debug
information. The debug information increases in newer
versions of GCC such as GCC 10.

This introduces two options for reducing the size
of debug information:
 - IMPALA_MINIMAL_DEBUG_INFO=true builds Impala with
   minimal debug information (-g1). This contains line tables
   and can resolve backtraces, but it does not contain
   variable information and restricts further debugging.
 - IMPALA_COMPRESSED_DEBUG_INFO=true builds Impala with
   compressed debug information (-gz). This does not change
   the debug information included, but the compression saves
   significant disk space. gdb is known to work with
   compressed debug information, but other tools may not
   support it. The dump_breakpad_symbols.py script has been
   adjusted to handle these binaries.
These are disabled by default.

Release impalad binary sizes:
Configuration  | Size (bytes) | % reduction over base
Base   | 707834808| N/A
Stripped   |  83351664| 88%
Minimal debuginfo  | 215924096| 69%
Compressed debuginfo   | 301619286| 57%
Minimal + compressed debuginfo | 120886705| 83%

Testing:
 - Generated minidumps and resolved them
 - Verified this is disabled by default

Change-Id: I04a20258a86053d8f3972b9c7c81cd5bec1bbb66
Reviewed-on: http://gerrit.cloudera.org:8080/18962
Reviewed-by: Michael Smith 
Reviewed-by: Wenzhe Zhou 
Tested-by: Impala Public Jenkins 


> Provide an option to build with compressed debug info
> -
>
> Key: IMPALA-11511
> URL: https://issues.apache.org/jira/browse/IMPALA-11511
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Priority: Major
>
> For builds with debug information, the debug information is often a large 
> portion of the binary size. There is a feature that compresses the debug info 
> using ZLIB via the "-gz" compilation flag. It makes a very large difference 
> to the size of our binaries:
> {noformat}
> GCC 7.5:
> debug: 726767520
> debug with -gz: 325970776
> release: 707911496
> release with -gz: 301671026
> GCC 10.4:
> debug: 870378832
> debug with -gz: 351442253
> release: 974600536
> release with -gz: 367938487{noformat}
> The size reduction would be useful for developers, but support in other tools 
> is mixed. gdb has support and seems to work fine. breakpad does not have 
> support. However, it is easy to convert a binary with compressed debug 
> symbols to one with normal debug symbols using objcopy:
> {noformat}
> objcopy --decompress-debug-sections [in_binary] [out_binary]{noformat}
> Given a minidump, it is possible to run objcopy to decompress the debug 
> symbols for the original binary, dump the breakpad symbols, and then process 
> the minidump successfully. So, it should be possible to modify 
> bin/dump_breakpad_symbols.py to do this automatically.






[jira] [Updated] (IMPALA-11615) TestAdmissionController.test_impala_server_startup_delay fails on S3

2022-09-26 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11615:
---
Labels: broken-build flaky  (was: )

> TestAdmissionController.test_impala_server_startup_delay fails on S3
> 
>
> Key: IMPALA-11615
> URL: https://issues.apache.org/jira/browse/IMPALA-11615
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.2.0
>Reporter: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> A recent run of s3 tests saw this failure in 
> TestAdmissionController.test_impala_server_startup_delay():
> {noformat}
> custom_cluster/test_admission_controller.py:1270: in 
> test_impala_server_startup_delay
> client = self.create_impala_client()
> common/impala_test_suite.py:307: in create_impala_client
> client.connect()
> common/impala_connection.py:197: in connect
> self.__beeswax_client.connect()
> beeswax/impala_beeswax.py:164: in connect
> raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION:  'thrift.transport.TTransport.TTransportException'>
> EMESSAGE: Could not connect to any of [('::1', 21000, 0, 0), 
> ('127.0.0.1', 21000)]{noformat}
> The startup script is not seeing running Impalads:
> {noformat}
> 11:24:10 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:24:11 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:24:11 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
> 11:24:11 MainThread: 'backends'
> 11:24:11 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:24:12 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:24:12 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
> 11:24:12 MainThread: 'backends'
> 11:24:12 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 11:24:13 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 11:24:13 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
> 11:24:13 MainThread: 'backends'
> 11:24:13 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-11615) TestAdmissionController.test_impala_server_startup_delay fails on S3

2022-09-26 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11615:
--

 Summary: TestAdmissionController.test_impala_server_startup_delay 
fails on S3
 Key: IMPALA-11615
 URL: https://issues.apache.org/jira/browse/IMPALA-11615
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.2.0
Reporter: Joe McDonnell


A recent run of s3 tests saw this failure in 
TestAdmissionController.test_impala_server_startup_delay():
{noformat}
custom_cluster/test_admission_controller.py:1270: in 
test_impala_server_startup_delay
client = self.create_impala_client()
common/impala_test_suite.py:307: in create_impala_client
client.connect()
common/impala_connection.py:197: in connect
self.__beeswax_client.connect()
beeswax/impala_beeswax.py:164: in connect
raise ImpalaBeeswaxException(self.__build_error_message(e), e)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E   INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
E   MESSAGE: Could not connect to any of [('::1', 21000, 0, 0), ('127.0.0.1', 
21000)]{noformat}
The startup script is not seeing running Impalads:
{noformat}
11:24:10 MainThread: Waiting for num_known_live_backends=3. Current value: None
11:24:11 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:24:11 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
11:24:11 MainThread: 'backends'
11:24:11 MainThread: Waiting for num_known_live_backends=3. Current value: None
11:24:12 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:24:12 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
11:24:12 MainThread: 'backends'
11:24:12 MainThread: Waiting for num_known_live_backends=3. Current value: None
11:24:13 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
11:24:13 MainThread: Getting num_known_live_backends from 
impala-ec2-centos79-m6i-4xlarge-ondemand-11f9.vpc.cloudera.com:25000
11:24:13 MainThread: 'backends'
11:24:13 MainThread: Waiting for num_known_live_backends=3. Current value: 
None{noformat}






[jira] [Assigned] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-11614:
--

Assignee: Michael Smith

> TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id 
> for Ozone
> 
>
> Key: IMPALA-11614
> URL: https://issues.apache.org/jira/browse/IMPALA-11614
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> TestValidateMetrics.test_metrics_are_zero fails for Ozone with
> {code}
> /data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
>  in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id 
> did not reach value 0 in 60s.
> {code}
> This passed at one point: 
> https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
>  with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
> started failing as a result of 79e474d310 (IMPALA-10213) which changed some 
> characteristics about how scheduling works for Ozone tests.






[jira] [Created] (IMPALA-11614) TestValidateMetrics.test_metrics_are_zero fails with num-missing-volume-id for Ozone

2022-09-26 Thread Michael Smith (Jira)
Michael Smith created IMPALA-11614:
--

 Summary: TestValidateMetrics.test_metrics_are_zero fails with 
num-missing-volume-id for Ozone
 Key: IMPALA-11614
 URL: https://issues.apache.org/jira/browse/IMPALA-11614
 Project: IMPALA
  Issue Type: Bug
Reporter: Michael Smith


TestValidateMetrics.test_metrics_are_zero fails for Ozone with
{code}
/data/jenkins/workspace/impala-private-ozone-parameterized/repos/Impala/tests/common/impala_service.py:210:
 in __metric_timeout_assert
assert 0, assert_string
E   AssertionError: Metric impala-server.scan-ranges.num-missing-volume-id did 
not reach value 0 in 60s.
{code}

This passed at one point: 
https://master-03.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-ozone/7/
 with commit 17ec3a85c7e3733dacb08a9fcca83fff5ec75102 succeeded. I suspect it 
started failing as a result of 79e474d310 (IMPALA-10213) which changed some 
characteristics about how scheduling works for Ozone tests.
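The failure above is a timed metric assertion: the test framework polls a daemon's metrics until the value reaches the expected one or a deadline passes. A minimal sketch of that polling loop (the name and signature are assumptions for illustration, not the actual `impala_service.py` helper):

```python
import time

def wait_for_metric_value(get_metric, expected, timeout_s=60, interval_s=1):
    # Poll until the metric getter returns the expected value, or fail with
    # the same kind of AssertionError seen in the log above.
    deadline = time.time() + timeout_s
    while time.time() <= deadline:
        if get_metric() == expected:
            return True
        time.sleep(interval_s)
    raise AssertionError(
        "Metric did not reach value %s in %ss." % (expected, timeout_s))
```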






[jira] [Commented] (IMPALA-11508) Iceberg test test_expire_snapshots is flaky

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609564#comment-17609564
 ] 

ASF subversion and git services commented on IMPALA-11508:
--

Commit aaf6fdc645b2f67e67f009c8e6dcf3ce25b22c98 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aaf6fdc64 ]

IMPALA-11508: Deflake test_expire_snapshots

Before this patch test_expire_snapshots failed frequently. The patch is
only a few lines of code, but there are some subtleties here:

IcebergCatalogOpExecutor.alterTableExecute() didn't use Iceberg
transactions to carry out the operation. This means that
expireSnapshots() resulted in an ALTER TABLE operation on its own
which we also got during event processing.

Because this ALTER TABLE event didn't have the catalog version set, we
didn't recognize it as a self-event. This caused unnecessary table
reloads during the tests, which manifested in
InconsistentMetadataFetchException: "... table ... changed version
between accesses" errors.

With this patch IcebergCatalogOpExecutor.alterTableExecute() takes
an Iceberg transaction object and invokes expireSnapshots() in the
context of this Iceberg transaction. This Iceberg transaction also
sets table properties "impala.events.catalogServiceId" and
"impala.events.catalogVersion". And because everything happens in
a single Iceberg transaction we only create a single ALTER TABLE
which we can recognize during event processing (based on the
table properties), avoiding unnecessary table reloads.
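The self-event check described above can be modeled in a few lines; this is an illustrative Python sketch of the idea (the property names come from the message, the function shape is assumed), not Impala's actual Java code:

```python
def is_self_event(event_props, service_id, versions_sent_by_us):
    # An ALTER TABLE event is a self-event (our own write echoed back by
    # metastore event processing) iff it carries our catalog service id
    # and a catalog version we stamped on the transaction.
    return (event_props.get("impala.events.catalogServiceId") == service_id
            and event_props.get("impala.events.catalogVersion")
                in versions_sent_by_us)
```

Before the fix, the ALTER TABLE produced by expireSnapshots() carried neither property, so a check like this failed and the table was needlessly reloaded.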

Testing:
 * executed test_expire_snapshots in a loop

Change-Id: I6d82c8b52466a24af096fe5fe4dbd034a1ee6a15
Reviewed-on: http://gerrit.cloudera.org:8080/19036
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Iceberg test test_expire_snapshots is flaky
> ---
>
> Key: IMPALA-11508
> URL: https://issues.apache.org/jira/browse/IMPALA-11508
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, impala-iceberg
>
> h2. Stacktrace
> {noformat}
> query_test/test_iceberg.py:104: in test_expire_snapshots
> impalad_client.execute(insert_q)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: 
> E   MESSAGE: AnalysisException: 
> org.apache.impala.catalog.TableLoadingException: Error opening Iceberg table 
> 'test_expire_snapshots_e488dbc3.expire_snapshots'
> E   CAUSED BY: TableLoadingException: Error opening Iceberg table 
> 'test_expire_snapshots_e488dbc3.expire_snapshots'
> E   CAUSED BY: InconsistentMetadataFetchException: Catalog object 
> TCatalogObject(type:TABLE, catalog_version:7309, 
> table:TTable(db_name:test_expire_snapshots_e488dbc3, 
> tbl_name:expire_snapshots)) changed version between accesses.
> {noformat}
> The error might be due to not detecting all self-events correctly, so the 
> IcebergTable gets updated on the CatalogD side.






[jira] [Commented] (IMPALA-11582) Implement table sampling for Iceberg tables

2022-09-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609565#comment-17609565
 ] 

ASF subversion and git services commented on IMPALA-11582:
--

Commit b91aa065377a1d154cd9eb5f5cd9ffb6da919b65 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b91aa0653 ]

IMPALA-11582: Implement table sampling for Iceberg tables

This patch adds table sampling functionality for Iceberg tables.
From now on it's possible to execute SELECT and COMPUTE STATS statements
with table sampling.

Predicates in the WHERE clause affect the results of table sampling
similarly to how legacy tables work (sampling is applied after static
partition and file pruning).

Sampling is repeatable via the REPEATABLE clause.

Testing
 * planner tests
 * e2e tests for V1 and V2 tables

Change-Id: I5de151747c0e9d9379a4051252175fccf42efd7d
Reviewed-on: http://gerrit.cloudera.org:8080/18989
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Implement table sampling for Iceberg tables
> ---
>
> Key: IMPALA-11582
> URL: https://issues.apache.org/jira/browse/IMPALA-11582
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: impala-iceberg
>
> Currently Iceberg tables cannot be sampled.
> We should allow table sampling for Iceberg tables in SELECT and COMPUTE STATS 
> statements.
> Predicates in the WHERE clause should affect the results of table sampling 
> similarly to how legacy tables work (sampling is applied after static 
> partition pruning).
> Make the operation repeatable via the REPEATABLE clause.
> See details at 
> https://impala.apache.org/docs/build/html/topics/impala_tablesample.html






[jira] [Commented] (IMPALA-10889) TestExecutorGroups.test_admission_control_with_multiple_coords is flaky

2022-09-26 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609559#comment-17609559
 ] 

Michael Smith commented on IMPALA-10889:


This looks like a duplicate of IMPALA-10877 as well.

> TestExecutorGroups.test_admission_control_with_multiple_coords is flaky
> ---
>
> Key: IMPALA-10889
> URL: https://issues.apache.org/jira/browse/IMPALA-10889
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
>
> Failed test: 
> custom_cluster.test_executor_groups.TestExecutorGroups.test_admission_control_with_multiple_coords
> Error Message
> {code}
> AssertionError: Metric admission-controller.agg-num-running.default-pool did 
> not reach value 1 in 30s. Dumping debug webpages in JSON format... Dumped 
> memz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/memz.json 
> Dumped metrics JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/metrics.json 
> Dumped queries JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/queries.json 
> Dumped sessions JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/sessions.json 
> Dumped threadz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/threadz.json 
> Dumped rpcz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/rpcz.json 
> Dumping minidumps for impalads/catalogds... Dumped minidump for Impalad PID 
> 23807 Dumped minidump for Impalad PID 23810 Dumped minidump for Impalad PID 
> 24897 Dumped minidump for Impalad PID 24900 Dumped minidump for Catalogd PID 
> 23746{code}
> Stacktrace
> {code}
> custom_cluster/test_executor_groups.py:579: in 
> test_admission_control_with_multiple_coords
> "admission-controller.agg-num-running.default-pool", 1, timeout=30)
> common/impala_service.py:143: in wait_for_metric_value
> self.__metric_timeout_assert(metric_name, expected_value, timeout)
> common/impala_service.py:210: in __metric_timeout_assert
> assert 0, assert_string
> E   AssertionError: Metric admission-controller.agg-num-running.default-pool 
> did not reach value 1 in 30s.
> E   Dumping debug webpages in JSON format...
> E   Dumped memz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/memz.json
> E   Dumped metrics JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/metrics.json
> E   Dumped queries JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/queries.json
> E   Dumped sessions JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/sessions.json
> E   Dumped threadz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/threadz.json
> E   Dumped rpcz JSON to 
> $IMPALA_HOME/logs/metric_timeout_diags_20210826_05:47:13/json/rpcz.json
> E   Dumping minidumps for impalads/catalogds...
> E   Dumped minidump for Impalad PID 23807
> E   Dumped minidump for Impalad PID 23810
> E   Dumped minidump for Impalad PID 24897
> E   Dumped minidump for Impalad PID 24900
> E   Dumped minidump for Catalogd PID 23746
> {code}
> Standard Error
> {code}
> -- 2021-08-26 05:46:20,996 INFO MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50' --cluster_size=1 --num_coordinators=1 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 --use_exclusive_coordinators '--impalad_args= 
> -executor_groups=coordinator ' --impalad_args=--default_query_options=
> 05:46:21 MainThread: Starting impala cluster without executors
> 05:46:21 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 05:46:21 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 05:46:21 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 05:46:21 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 05:46:24 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> 05:46:24 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> 05:46:24 MainThread: Getting num_known_live_backends from 
> 

[jira] [Updated] (IMPALA-11612) ORDER BY expression not produced by aggregation output

2022-09-26 Thread jhkcool (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jhkcool updated IMPALA-11612:
-
Description: 
Create execute plan failed : org.apache.impala.common.AnalysisException: ORDER 
BY expression not produced by aggregation output (missing from GROUP BY 
clause?): (CASE WHEN (to_date(CAST(t1.dt AS TIMESTAMP)) = TIMESTAMP '2022-05-12 
00:00:00') THEN '2022-05-12' WHEN (to_date(CAST(t1.dt AS TIMESTAMP)) = 
TIMESTAMP '2022-05-13 00:00:00') THEN '2022-05-13' END) at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.verifyAggregation(SelectStmt.java:1090)
 at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:312)
 at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:276)
 at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:269) at 
org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:611) 
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:585) 
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468)
 at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2084) 
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1967) at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1791) at 
org.apache.impala.service.Frontend$4.run(Frontend.java:2867) at 
org.apache.impala.service.Frontend$4.run(Frontend.java:2863) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:360) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1864)
 at 
org.apache.impala.service.Frontend.createExecRequestWithProxy(Frontend.java:2863)
 at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:208)

query_option:
ENABLE_EXPR_REWRITES : true
while execute query sql:

{noformat}
SELECT (1 = 1 AND CASE WHEN (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) = 
CAST('2022-05-12' AS TIMESTAMP)) THEN '2022-05-12' WHEN (TO_DATE(CAST(`t1`.`dt` 
AS TIMESTAMP)) = CAST('2022-05-13' AS TIMESTAMP)) THEN '2022-05-13' END) d0
FROM `jhk_test`.`p1` `t1`
WHERE (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) >= CAST('2022-05-12' AS 
TIMESTAMP)) AND (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) < CAST('2022-05-19' AS 
TIMESTAMP))
GROUP BY (1 = 1 AND CASE WHEN (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) = 
CAST('2022-05-12' AS TIMESTAMP)) THEN '2022-05-12' WHEN (TO_DATE(CAST(`t1`.`dt` 
AS TIMESTAMP)) = CAST('2022-05-13' AS TIMESTAMP)) THEN '2022-05-13' END)
ORDER BY (1 = 1 AND CASE WHEN (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) = 
CAST('2022-05-12' AS TIMESTAMP)) THEN '2022-05-12' WHEN (TO_DATE(CAST(`t1`.`dt` 
AS TIMESTAMP)) = CAST('2022-05-13' AS TIMESTAMP)) THEN '2022-05-13' END)
limit 10
{noformat}


  was:
Create execute plan failed : org.apache.impala.common.AnalysisException: ORDER 
BY expression not produced by aggregation output (missing from GROUP BY 
clause?): (CASE WHEN (to_date(CAST(t1.dt AS TIMESTAMP)) = TIMESTAMP '2022-05-12 
00:00:00') THEN '2022-05-12' WHEN (to_date(CAST(t1.dt AS TIMESTAMP)) = 
TIMESTAMP '2022-05-13 00:00:00') THEN '2022-05-13' END) at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.verifyAggregation(SelectStmt.java:1090)
 at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:312)
 at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:276)
 at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:269) at 
org.apache.impala.analysis.AnalysisContext.reAnalyze(AnalysisContext.java:611) 
at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:585) 
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:468)
 at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2084) 
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1967) at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1791) at 
org.apache.impala.service.Frontend$4.run(Frontend.java:2867) at 
org.apache.impala.service.Frontend$4.run(Frontend.java:2863) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:360) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1864)
 at 
org.apache.impala.service.Frontend.createExecRequestWithProxy(Frontend.java:2863)
 at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:208)

query_option:
ENABLE_EXPR_REWRITES : true
while execute query sql:

{noformat}
SELECT (CASE WHEN (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) = CAST('2022-05-12' AS 
TIMESTAMP)) THEN '2022-05-12' WHEN (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) = 
CAST('2022-05-13' AS TIMESTAMP)) THEN '2022-05-13' END) d0
FROM `jhk_test`.`p1` `t1`
WHERE (TO_DATE(CAST(`t1`.`dt` AS TIMESTAMP)) >= CAST('2022-05-12' AS 
TIMESTAMP)) AND 

[jira] [Commented] (IMPALA-11508) Iceberg test test_expire_snapshots is flaky

2022-09-26 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609474#comment-17609474
 ] 

Quanlong Huang commented on IMPALA-11508:
-

Saw this again in 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/6370

> Iceberg test test_expire_snapshots is flaky
> ---
>
> Key: IMPALA-11508
> URL: https://issues.apache.org/jira/browse/IMPALA-11508
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, impala-iceberg
>
> h2. Stacktrace
> {noformat}
> query_test/test_iceberg.py:104: in test_expire_snapshots
> impalad_client.execute(insert_q)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:359: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:522: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: 
> E   MESSAGE: AnalysisException: 
> org.apache.impala.catalog.TableLoadingException: Error opening Iceberg table 
> 'test_expire_snapshots_e488dbc3.expire_snapshots'
> E   CAUSED BY: TableLoadingException: Error opening Iceberg table 
> 'test_expire_snapshots_e488dbc3.expire_snapshots'
> E   CAUSED BY: InconsistentMetadataFetchException: Catalog object 
> TCatalogObject(type:TABLE, catalog_version:7309, 
> table:TTable(db_name:test_expire_snapshots_e488dbc3, 
> tbl_name:expire_snapshots)) changed version between accesses.
> {noformat}
> The error might be due to not detecting all self-events correctly, so the 
> IcebergTable gets updated on the CatalogD side.






[jira] [Commented] (IMPALA-10356) Analyzed query in explain plan is not quite right for insert with values clause

2022-09-26 Thread Daniel Becker (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609449#comment-17609449
 ] 

Daniel Becker commented on IMPALA-10356:


The problem seems to be with how we print the analysed statement, not the 
analysed statement itself. {{SetOperationStmt.toSql()}} implicitly assumes that 
there are at least two operands: it prints the first operand separately, then 
prints the operands from the second through the second-to-last in a loop, then 
prints the last one separately as well. When there is only one operand, the 
first and the last operands are the same, but no check is performed. See 
[https://github.com/apache/impala/blob/296e94411d3344e2969d4b083036ff238e80ad19/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java#L540]

A {{SetOperationStmt}} with only one operand is only possible in a 
{{ValuesStmt}}, which is a specialised {{UnionStmt}}; otherwise it is 
syntactically impossible. The question is how we should print {{ValuesStmt}}s 
with a single operand:
 * print only the operand
 ** {*}pro{*}: reflects that there is no set operation in the original SQL 
statement
 ** {*}con{*}: doesn't reflect how we actually represent the analysed query in 
Impala
 * print a union with 2 identical operands (this is what we do now)
 ** {*}pro{*}: reflects the representation of the analysed query in that there 
is a UNION statement
 ** {*}con{*}: adds a second operand that is not present in either the original 
SQL or the analysed query
 * invent some syntax to print a union with a single operand
 ** {*}pro{*}: reflects how we represent the analysed query
 ** {*}con{*}: prints invalid SQL
 * don't convert single-operand VALUES clauses to a UnionStmt
 ** {*}pro{*}: we can correctly represent the analysed query; simplify the 
statement tree - one less level
 ** {*}con{*}: different handling of single-operand VALUES clauses than other 
VALUES clauses
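If the first option were chosen, the guard is only a few lines. This is an illustrative Python model of the printer logic under that assumption, not the actual Java toSql() code:

```python
def set_operation_to_sql(operand_sqls, operator="UNION"):
    # Guard the single-operand case: emit the operand alone instead of
    # duplicating it around the set operator.
    if len(operand_sqls) == 1:
        return operand_sqls[0]
    return (" %s " % operator).join(operand_sqls)
```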

> Analyzed query in explain plan is not quite right for insert with values 
> clause
> ---
>
> Key: IMPALA-10356
> URL: https://issues.apache.org/jira/browse/IMPALA-10356
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0
>Reporter: Tim Armstrong
>Assignee: Daniel Becker
>Priority: Major
>  Labels: newbie, ramp-up
>
> In impala-shell:
> {noformat}
> create table double_tbl (d double) stored as textfile;
> set explain_level=2;
> explain insert into double_tbl values (-0.43149576573887316);
> {noformat}
> {noformat}
> +--+
> | Explain String  
>  |
> +--+
> | Max Per-Host Resource Reservation: Memory=0B Threads=1  
>  |
> | Per-Host Resource Estimates: Memory=10MB
>  |
> | Codegen disabled by planner 
>  |
> | Analyzed query: SELECT CAST(-0.43149576573887316 AS DECIMAL(17,17)) UNION 
> SELECT |
> | CAST(-0.43149576573887316 AS DECIMAL(17,17))
>  |
> | 
>  |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
>  |
> | |  Per-Host Resources: mem-estimate=8B mem-reservation=0B 
> thread-reservation=1   |
> | WRITE TO HDFS [default.double_tbl, OVERWRITE=false] 
>  |
> | |  partitions=1 
>  |
> | |  output exprs: CAST(-0.43149576573887316 AS DOUBLE)   
>  |
> | |  mem-estimate=8B mem-reservation=0B thread-reservation=0  
>  |
> | |   
>  |
> | 00:UNION
>  |
> |constant-operands=1  
>  |
> |mem-estimate=0B mem-reservation=0B thread-reservation=0  
>  |
> |tuple-ids=0 row-size=8B cardinality=1
>  |
> |in pipelines:  
>  |
> +--+
> {noformat}
> The analyzed query does not make sense. We should investigate and fix it.




[jira] [Updated] (IMPALA-11558) Looks like memory leak when select from kudu table concurrently

2022-09-26 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-11558:
-
Summary: Looks like memory leak when select from kudu table concurrently  
(was: Memory leak when select from kudu table concurrently)

> Looks like memory leak when select from kudu table concurrently
> ---
>
> Key: IMPALA-11558
> URL: https://issues.apache.org/jira/browse/IMPALA-11558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Attachments: image-2022-09-08-13-45-38-446.png
>
>
> Reproduce:
>  # restart impala
>  # query kudu table concurrently, eg. select count(*) from tpch_kudu.lineitem
> The untracked memory will increase to a large value.
> !image-2022-09-08-13-45-38-446.png!






[jira] [Updated] (IMPALA-11558) Memory leak when select from kudu table concurrently

2022-09-26 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-11558:
-
Labels:   (was: memory-leak)

> Memory leak when select from kudu table concurrently
> 
>
> Key: IMPALA-11558
> URL: https://issues.apache.org/jira/browse/IMPALA-11558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 4.1.0
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Attachments: image-2022-09-08-13-45-38-446.png
>
>
> Reproduce:
>  # restart impala
>  # query kudu table concurrently, eg. select count(*) from tpch_kudu.lineitem
> The untracked memory will increase to a large value.
> !image-2022-09-08-13-45-38-446.png!






[jira] [Updated] (IMPALA-11576) query_test.test_iceberg.test_multiple_storage_locations fails on S3

2022-09-26 Thread Gergely Fürnstáhl (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Fürnstáhl updated IMPALA-11576:
---
Labels: broken-build impala-iceberg  (was: broken-build)

> query_test.test_iceberg.test_multiple_storage_locations fails on S3
> ---
>
> Key: IMPALA-11576
> URL: https://issues.apache.org/jira/browse/IMPALA-11576
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Gergely Fürnstáhl
>Priority: Blocker
>  Labels: broken-build, impala-iceberg
>
> The test seems to fail on a badly constructed file name.
> Stack trace:{code}
> query_test.test_iceberg.TestIcebergTable.test_multiple_storage_locations[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> query_test/test_iceberg.py:785: in test_multiple_storage_locations
> vector, unique_database)
> common/impala_test_suite.py:706: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:644: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:980: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:367: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:388: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1313.vpc.cloudera.com:27001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet
> E   Error(2): No such file or directory
> E   Root cause: FileNotFoundException: No such file or directory: 
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a:/impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet{code}
> Here the file name (this is a single, contiguous string despite the apparent 
> line breaks!)
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet
> contains the s3a: protocol specifier in the middle of the string, which 
> appears to result from an incorrectly constructed path concatenation.
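The doubled scheme above is characteristic of blindly joining a child location onto a base path when the child is already a fully qualified URI. A minimal Python sketch of that failure mode (illustrative only; Impala's actual code is Java, and the function and variable names here are invented):

```python
from urllib.parse import urlparse

BASE = "s3a://bucket/warehouse/ice/tbl"
CHILD = "s3a://bucket/warehouse/ice/tbl_data02/col_int=2/f.parquet"

def join_naive(base, child):
    # Buggy: always appends, so a fully qualified child URI ends up
    # embedded in the middle of the resulting path.
    return base.rstrip("/") + "/" + child

def join_safe(base, child):
    # If the child already carries a scheme (s3a:, hdfs:, ...), it is an
    # absolute location and must be used as-is.
    if urlparse(child).scheme:
        return child
    return base.rstrip("/") + "/" + child

broken = join_naive(BASE, CHILD)  # scheme appears mid-string, as in the log
fixed = join_safe(BASE, CHILD)    # uses the absolute child URI unchanged
```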






[jira] [Work started] (IMPALA-11576) query_test.test_iceberg.test_multiple_storage_locations fails on S3

2022-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11576 started by Gergely Fürnstáhl.
--
> query_test.test_iceberg.test_multiple_storage_locations fails on S3
> ---
>
> Key: IMPALA-11576
> URL: https://issues.apache.org/jira/browse/IMPALA-11576
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Gergely Fürnstáhl
>Priority: Blocker
>  Labels: broken-build
>
> The test seems to fail on a badly constructed file name.
> Stack trace:{code}
> query_test.test_iceberg.TestIcebergTable.test_multiple_storage_locations[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> query_test/test_iceberg.py:785: in test_multiple_storage_locations
> vector, unique_database)
> common/impala_test_suite.py:706: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:644: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:980: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:212: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:367: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:388: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos79-m6i-4xlarge-ondemand-1313.vpc.cloudera.com:27001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet
> E   Error(2): No such file or directory
> E   Root cause: FileNotFoundException: No such file or directory: 
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a:/impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet{code}
> Here the file name (this is a single, contiguous string despite the apparent 
> line breaks!)
> s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations/s3a://impala-test-uswest2-2/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_multiple_storage_locations_data02/col_int=2/1-1-26bc91ef-b403-4b65-a6b0-566396b8d097-1.parquet
> contains the s3a: protocol specifier in the middle of the string, which 
> appears to result from an incorrectly constructed path concatenation.






[jira] [Updated] (IMPALA-11613) Optimize result spooling for the statement that returns at most one row

2022-09-26 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-11613:
-
Description: 
If result spooling is enabled and the statement returns at most one row, we can 
set the minimum memory reservation based on the row size.
 
The strategy is as follows: 
 # If the row contains string or complex data types, we set the reservation to 
max_row_size.
 # Otherwise, we compute the row size.

  was:
If result spooling is enabled and the statement returns at most one row, we can 
set the minimum memory reservation based on the row size.
 
The strategy is as follows: 
 # If it contains complex data types, we set the reservation as before.
 # If it contains string type, we set the reservation to max_row_size.
 # Otherwise, we compute the row size.


> Optimize result spooling for the statement that returns at most one row
> ---
>
> Key: IMPALA-11613
> URL: https://issues.apache.org/jira/browse/IMPALA-11613
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> If result spooling is enabled and the statement returns at most one row, we 
> can set the minimum memory reservation based on the row size.
>  
> The strategy is as follows: 
>  # If the row contains string or complex data types, we set the reservation 
> to max_row_size.
>  # Otherwise, we compute the row size.
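As a rough illustration of this strategy, here is a Python sketch (hypothetical: the real logic lives in Impala's Java planner, and the type names and byte sizes below are assumptions, not Impala's actual values):

```python
# Assumed fixed-width sizes for illustration only.
FIXED_SIZES = {"BOOLEAN": 1, "INT": 4, "BIGINT": 8, "DOUBLE": 8, "TIMESTAMP": 16}
# Variable-length or complex types whose width is unknown at plan time.
VARLEN_OR_COMPLEX = {"STRING", "VARCHAR", "ARRAY", "MAP", "STRUCT"}

def min_reservation(column_types, max_row_size):
    """Minimum memory reservation for a statement returning at most one row."""
    if any(t in VARLEN_OR_COMPLEX for t in column_types):
        # Row width cannot be computed at plan time: reserve max_row_size.
        return max_row_size
    # All columns are fixed-width: reserve exactly the computed row size.
    return sum(FIXED_SIZES[t] for t in column_types)

small = min_reservation(["INT", "BIGINT"], 512 * 1024)   # computed row size
capped = min_reservation(["STRING"], 512 * 1024)         # falls back to cap
```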






[jira] [Updated] (IMPALA-11613) Optimize result spooling for the statement that returns at most one row

2022-09-26 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-11613:
-
Description: 
If result spooling is enabled and the statement returns at most one row, we can 
set the minimum memory reservation based on the row size.
 
The strategy is as follows: 
 # If it contains complex data types, we set the reservation as before.
 # If it contains string type, we set the reservation to max_row_size.
 # Otherwise, we compute the row size.

  was:
If result spooling is enabled and the statement returns at most one row, we can 
set the minimum memory reservation based on the row size.
 
The strategy is as follows: 
 # If it contains complex data types, we set the reservation as before.
 # If it contains string type, we set the reservation to max_row_size.
 # Otherwise, we compute the row size.


> Optimize result spooling for the statement that returns at most one row
> ---
>
> Key: IMPALA-11613
> URL: https://issues.apache.org/jira/browse/IMPALA-11613
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> If result spooling is enabled and the statement returns at most one row, we 
> can set the minimum memory reservation based on the row size.
>  
> The strategy is as follows: 
>  # If it contains complex data types, we set the reservation as before.
>  # If it contains string type, we set the reservation to max_row_size.
>  # Otherwise, we compute the row size.






[jira] [Updated] (IMPALA-11613) Optimize result spooling for the statement that returns at most one row

2022-09-26 Thread Xianqing He (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianqing He updated IMPALA-11613:
-
Fix Version/s: Impala 4.2.0

> Optimize result spooling for the statement that returns at most one row
> ---
>
> Key: IMPALA-11613
> URL: https://issues.apache.org/jira/browse/IMPALA-11613
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Xianqing He
>Assignee: Xianqing He
>Priority: Major
> Fix For: Impala 4.2.0
>
>
> If result spooling is enabled and the statement returns at most one row, we 
> can set the minimum memory reservation based on the row size.
>  
> The strategy is as follows: 
>  # If it contains complex data types, we set the reservation as before.
>  # If it contains string type, we set the reservation to max_row_size.
>  # Otherwise, we compute the row size.






[jira] [Created] (IMPALA-11613) Optimize result spooling for the statement that returns at most one row

2022-09-26 Thread Xianqing He (Jira)
Xianqing He created IMPALA-11613:


 Summary: Optimize result spooling for the statement that returns 
at most one row
 Key: IMPALA-11613
 URL: https://issues.apache.org/jira/browse/IMPALA-11613
 Project: IMPALA
  Issue Type: Improvement
Reporter: Xianqing He
Assignee: Xianqing He


If result spooling is enabled and the statement returns at most one row, we can 
set the minimum memory reservation based on the row size.
 
The strategy is as follows: 
 # If it contains complex data types, we set the reservation as before.
 # If it contains string type, we set the reservation to max_row_size.
 # Otherwise, we compute the row size.






[jira] [Work started] (IMPALA-11594) TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build

2022-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-11594 started by Gergely Fürnstáhl.
--
> TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build
> ---
>
> Key: IMPALA-11594
> URL: https://issues.apache.org/jira/browse/IMPALA-11594
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Gergely Fürnstáhl
>Priority: Critical
>  Labels: broken-build, impala-iceberg
>
> TestIcebergTable.test_create_table_like_parquet added by IMPALA-11438 failed 
> in S3 builds:
> {code:java}
> query_test/test_iceberg.py:819: in test_create_table_like_parquet
> self._create_table_like_parquet_helper(vector, unique_database, tbl_name, 
> False)
> query_test/test_iceberg.py:806: in _create_table_like_parquet_helper
> assert hdfs_file
> E   assert None {code}
> Standard Error:
> {noformat}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_create_table_like_parquet_6e658131` CASCADE;
> -- 2022-09-18 11:38:30,905 INFO MainThread: Started query 
> 9742962ebcabc35b:5c668e45
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_create_table_like_parquet_6e658131`;
> -- 2022-09-18 11:38:38,160 INFO MainThread: Started query 
> b84f60c7670afd2e:37deca4d
> -- 2022-09-18 11:38:38,765 INFO MainThread: Created database 
> "test_create_table_like_parquet_6e658131" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> 22/09/18 11:38:39 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 22/09/18 11:38:40 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.
> -- executing against localhost:21000
> create table test_create_table_like_parquet_6e658131.alltypes_tiny_pages like 
> parquet 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  stored as parquet;
> -- 2022-09-18 11:38:41,292 INFO MainThread: Started query 
> c94eb08d583843d6:78da5179
> -- executing against localhost:21000
> load data inpath 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  into table test_create_table_like_parquet_6e658131.alltypes_tiny_pages;
> -- 2022-09-18 11:38:45,913 INFO MainThread: Started query 
> 014969c4fd6439be:bdf84d62 {noformat}






[jira] [Updated] (IMPALA-11594) TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build

2022-09-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-11594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Fürnstáhl updated IMPALA-11594:
---
Labels: broken-build impala-iceberg  (was: broken-build)

> TestIcebergTable.test_create_table_like_parquet fails in non-HDFS build
> ---
>
> Key: IMPALA-11594
> URL: https://issues.apache.org/jira/browse/IMPALA-11594
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Gergely Fürnstáhl
>Priority: Critical
>  Labels: broken-build, impala-iceberg
>
> TestIcebergTable.test_create_table_like_parquet added by IMPALA-11438 failed 
> in S3 builds:
> {code:java}
> query_test/test_iceberg.py:819: in test_create_table_like_parquet
> self._create_table_like_parquet_helper(vector, unique_database, tbl_name, 
> False)
> query_test/test_iceberg.py:806: in _create_table_like_parquet_helper
> assert hdfs_file
> E   assert None {code}
> Standard Error:
> {noformat}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_create_table_like_parquet_6e658131` CASCADE;
> -- 2022-09-18 11:38:30,905 INFO MainThread: Started query 
> 9742962ebcabc35b:5c668e45
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_row;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_create_table_like_parquet_6e658131`;
> -- 2022-09-18 11:38:38,160 INFO MainThread: Started query 
> b84f60c7670afd2e:37deca4d
> -- 2022-09-18 11:38:38,765 INFO MainThread: Created database 
> "test_create_table_like_parquet_6e658131" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_create_table_like_parquet[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> 22/09/18 11:38:39 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 22/09/18 11:38:39 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 22/09/18 11:38:40 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 22/09/18 11:38:41 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.
> -- executing against localhost:21000
> create table test_create_table_like_parquet_6e658131.alltypes_tiny_pages like 
> parquet 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  stored as parquet;
> -- 2022-09-18 11:38:41,292 INFO MainThread: Started query 
> c94eb08d583843d6:78da5179
> -- executing against localhost:21000
> load data inpath 
> "/test-warehouse/test_create_table_like_parquet_6e658131.db/alltypes_tiny_pages.parquet"
>  into table test_create_table_like_parquet_6e658131.alltypes_tiny_pages;
> -- 2022-09-18 11:38:45,913 INFO MainThread: Started query 
> 014969c4fd6439be:bdf84d62 {noformat}






[jira] [Commented] (IMPALA-10771) Support Tencent COS File System

2022-09-26 Thread LiPenglin (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609357#comment-17609357
 ] 

LiPenglin commented on IMPALA-10771:


A ClassNotFoundException was thrown when using 
"cosn://test-warehouse-xxx" (https://cloud.tencent.com/product/cos). I suspect 
that 'CDP_HADOOP_VERSION=3.1.1.7.2.16.0-164' and 
'IMPALA_COS_VERSION=3.1.0-5.9.3' do not match. I worked around the problem by 
temporarily replacing them locally with cos_api-bundle-5.6.69.jar and 
hadoop-cos-3.1.0-8.0.8.jar 
(https://github.com/tencentyun/hadoop-cos/releases/tag/v8.0.8).

FYI:
https://github.com/tencentyun/hadoop-cos/releases
The latest release also does not match 'CDP_HADOOP_VERSION=3.1.1.7.2.16.0-164'.

> Support Tencent COS File System
> ---
>
> Key: IMPALA-10771
> URL: https://issues.apache.org/jira/browse/IMPALA-10771
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Fucun Chu
>Assignee: Fucun Chu
>Priority: Major
>  Labels: connector
>
> [Tencent COS|https://intl.cloud.tencent.com/product/cos] is a widely used 
> object storage system provided by Tencent. Hadoop-COS is a client that 
> enables HDFS-based computing systems to use COS as their underlying storage 
> system. The big-data processing systems identified for support include 
> Hadoop MR, Spark, and Alluxio. In addition, Druid can use COS as its deep 
> storage by configuring the HDFS-Load-Plugin to integrate with Hadoop-COS.
> More information:
> https://hadoop.apache.org/docs/current/hadoop-cos/cloud-storage/index.html
> Hadoop 3.3 officially supports COS storage; see: 
> [HADOOP-15616|https://issues.apache.org/jira/browse/HADOOP-15616]


