[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217255#comment-17217255 ]
Tim Armstrong commented on IMPALA-9884:
---
On that executor we see that it finished in a few seconds.
{noformat}
I1017 01:11:24.338460 24753 control-service.cc:142] 1b4e1ee5d51fc461:12219325] ExecQueryFInstances(): query_id=1b4e1ee5d51fc461:12219325 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27000 #instances=1
I1017 01:11:24.349339 25196 query-state.cc:897] 1b4e1ee5d51fc461:122193250002] Executing instance. instance_id=1b4e1ee5d51fc461:122193250002 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=2 #in-flight=3
...
I1017 01:11:29.631078 25196 query-state.cc:906] 1b4e1ee5d51fc461:122193250002] Instance completed. instance_id=1b4e1ee5d51fc461:122193250002 #in-flight=2 status=OK
I1017 01:11:29.631088 25178 query-state.cc:464] 1b4e1ee5d51fc461:12219325] UpdateBackendExecState(): last report for 1b4e1ee5d51fc461:12219325
{noformat}
So I think this is a repeat of the scenario in IMPALA-8565. I guess even though I made the query bigger it isn't enough. I'll probably just increase it further.

> TestAdmissionControllerStress.test_mem_limit failing occasionally
> -
>
> Key: IMPALA-9884
> URL: https://issues.apache.org/jira/browse/IMPALA-9884
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.0
> Reporter: Vihang Karajgaonkar
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: broken-build, flaky
> Attachments: impalad-executors.tar.gz, impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz
>
> Recently, I saw this test failing with the exception trace below.
> {noformat}
> custom_cluster/test_admission_controller.py:1782: in test_mem_limit
>     {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1638: in run_admission_test
>     assert metric_deltas['dequeued'] == 0,\
> E   AssertionError: Queued queries should not run until others are made to finish
> E   assert 1 == 0
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
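The failed assertion compares metric snapshots taken before and after the first wave of submissions. As a toy illustration of the check (the helper below is a stand-in, not Impala's actual test code; only the metric names and the failing delta come from the traceback and logs above):

```python
# Hypothetical stand-in for the test suite's metric-delta computation.
def compute_metric_deltas(initial_metrics, current_metrics):
    """Per-metric difference between two snapshots of admission metrics."""
    return {name: current_metrics[name] - initial_metrics[name]
            for name in initial_metrics}

# Snapshot values patterned on the logs in this report: 34 rejections,
# 5 queries admitted immediately plus 1 admitted from the queue.
initial = {'admitted': 0, 'queued': 0, 'rejected': 0, 'dequeued': 0}
current = {'admitted': 6, 'queued': 9, 'rejected': 34, 'dequeued': 1}

metric_deltas = compute_metric_deltas(initial, current)
# The test requires that no query was dequeued during the first wave; a
# single early dequeue is exactly the reported "assert 1 == 0" failure.
assert metric_deltas['dequeued'] == 1
```

The test's invariant is that the first wave saturates the pool before anything finishes, so a nonzero `dequeued` delta means either a real admission bug or a race in how the test detects the end of the first wave.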
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217253#comment-17217253 ]
Tim Armstrong commented on IMPALA-9884:
---
{noformat}
I1017 01:11:24.339452 25165 admission-controller.cc:1638] 3144178b629c699c:dde994b7] Stats: agg_num_running=5, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=12.00 GB, num_admitted_running=5, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[], total_mem_consumed=0; pool_level_stats: num_running=0, min=0, max=0, pool_total_mem=0)
...
I1017 01:11:24.339519 25165 admission-controller.cc:1195] 3144178b629c699c:dde994b7] Queuing, query id=3144178b629c699c:dde994b7 reason: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available.
...
I1017 01:11:29.640173 24428 admission-controller.cc:1630] Trying to admit id=3144178b629c699c:dde994b7 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=81.29 MB dedicated_coord_mem_estimate=101.29 MB max_requests=150 max_queued=10 max_mem=12.00 GB
I1017 01:11:29.640328 24428 admission-controller.cc:1652] Cannot admit query 3144178b629c699c:dde994b7 to group default: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available. Details:
I1017 01:11:29.640334 24428 admission-controller.cc:1851] Could not dequeue query id=3144178b629c699c:dde994b7 reason: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available.
I1017 01:11:29.677559 24428 admission-controller.cc:1630] Trying to admit id=3144178b629c699c:dde994b7 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=81.29 MB dedicated_coord_mem_estimate=101.29 MB max_requests=150 max_queued=10 max_mem=12.00 GB
I1017 01:11:29.677701 24428 admission-controller.cc:1786] Admitting from queue: query=3144178b629c699c:dde994b7
I1017 01:11:29.677712 24428 admission-controller.cc:1878] For Query 3144178b629c699c:dde994b7 per_backend_mem_limit set to: 819.20 MB per_backend_mem_to_admit set to: 819.20 MB coord_backend_mem_limit set to: 819.20 MB coord_backend_mem_to_admit set to: 819.20 MB
I1017 01:11:29.677990 25165 admission-controller.cc:1273] 3144178b629c699c:dde994b7] Admitted queued query id=3144178b629c699c:dde994b7
I1017 01:11:29.678004 25165 admission-controller.cc:1274] 3144178b629c699c:dde994b7] Final: agg_num_running=6, agg_num_queued=9, agg_mem_reserved=9.60 GB, local_host(local_mem_admitted=12.00 GB, num_admitted_running=6, num_queued=9, backend_mem_reserved=4.00 GB, topN_query_stats: queries=[8f462fa2ce60d289:e0631471, d5466702e1e5c14e:43f31d30, 1b4e1ee5d51fc461:12219325, cf498fd1ece032b6:b4f673d1, 4a4d18e5caa85310:022e0229], total_mem_consumed=59.95 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=5, min=5.03 MB, max=13.76 MB, pool_total_mem=59.95 MB, average_per_query=11.99 MB)
{noformat}
It looks like this was able to be dequeued because a query finished running on a backend:
{noformat}
I1017 01:11:29.639609 24226 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 remaining=3 query_id= [^impalad-executors.tar.gz] 00
I1017 01:11:29.639629 24226 coordinator-backend-state.cc:362] query_id=1b4e1ee5d51fc461:12219325: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27000
I1017 01:11:29.639644 24226 admission-controller.cc:759] Update admitted mem reserved for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=2.40 GB new=1.60 GB
I1017 01:11:29.639657 24226 admission-controller.cc:764] Update admitted queries for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=3 new=2
I1017 01:11:29.639659 24226 admission-controller.cc:769] Update slots in use for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=3 new=2
I1017 01:11:29.639701 24226 admission-controller.cc:1337] Released query backend(s) impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 for query id=1b4e1ee5d51fc461:12219325 agg_num_running=5, agg_num_queued=10, agg_mem_reserved=12.00 GB, local_host(local_mem_admitted=9.60 GB, num_admitted_running=5, num_queued=10, backend_mem_reserved=4.00 GB, topN_query_stats: queries=[cf498fd1ece032b6:b4f673d1, d5466702e1e5c14e:43f31d30, 1b4e1ee5d51fc461:12219325, 8f462fa2ce60d289:e0631471, 4a4d18e5caa85310:022e0229], total_mem_consumed=37.12 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=5,
[jira] [Updated] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-9884:
--
Attachment: impalad-executors.tar.gz
[jira] [Resolved] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
[ https://issues.apache.org/jira/browse/IMPALA-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-10261.
Fix Version/s: Impala 4.0
Assignee: Joe McDonnell
Resolution: Fixed

> impala-minimal-hive-exec should include org/apache/hive/com/google/**
> -
>
> Key: IMPALA-10261
> URL: https://issues.apache.org/jira/browse/IMPALA-10261
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
> Fix For: Impala 4.0
>
> Hive started shading guava (com/google) with HIVE-22126, so impala-minimal-hive-exec should add org/apache/hive/com/google to its inclusions. This will allow Impala to build/work with newer versions of Hive that have this change. Leaving the existing com/google inclusion should let it work with both:
> [https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L116]
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217250#comment-17217250 ]
ASF subversion and git services commented on IMPALA-9918:
-
Commit 1e2176c84909a26e6405df7ae6d34d724e5a5217 in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e2176c ]
IMPALA-9918: ORC scanner hits DCHECK when GLOG_v=3

PrintPath assumed that all elements in the path are complex, and hit a DCHECK if it contained a scalar element. This didn't seem to cause problems in Parquet, but the ORC scanner called this function with paths where the last element was scalar. This problem was probably not discovered because no one tested ORC scanning with v=3 logging + DEBUG builds.

Also added logging for the events when log levels are changed through the webpage. In the case of ResetJavaLogLevelCallback there was already a log line from GlogAppender.java.

Note that the cause of the original issue is still unknown, as it occurred during custom cluster tests where no other tests should change the log levels in parallel.

Testing:
- tested the log changes manually

Change-Id: I94e12d2a62ccab5eb5d21675d5f0138f04e622ac
Reviewed-on: http://gerrit.cloudera.org:8080/16611
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins

> HdfsOrcScanner crash on resolving columns
> -
>
> Key: IMPALA-9918
> URL: https://issues.apache.org/jira/browse/IMPALA-9918
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Environment: BUILD_TAG jenkins-impala-cdpd-master-core-ubsan-111
> Reporter: Wenzhe Zhou
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: broken-build
> Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt
>
> Core file generated in impala-cdpd-master-core-ubsan build
> Back traces:
> CORE: ./tests/core.1594000709.13971.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'.
> Program terminated with signal SIGABRT, Aborted.
> #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6
> To enable execution of this file add
>     add-auto-load-safe-path /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
>     set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the "Auto-loading safe path" section in the GDB manual.
> E.g., run from the shell:
>     info "(gdb)Auto-loading safe path"
> #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6
> #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6
> #2 0x083401c4 in google::DumpStackTraceAndExit() ()
> #3 0x08336b5d in google::LogMessage::Fail() ()
> #4 0x08338402 in google::LogMessage::SendToLog() ()
> #5 0x08336537 in google::LogMessage::Flush() ()
> #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() ()
> #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259
> #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, pos_slots=0x7f7972273058) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436
> #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns (this=0x14555c00, tuple_desc=...) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456
> #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, context=0x7f7972274700) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221
> #11 0x035e0a48 in impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882
> #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480
> #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread (this=0x1b1c7100, first_thread=true,
[jira] [Commented] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
[ https://issues.apache.org/jira/browse/IMPALA-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217251#comment-17217251 ]
ASF subversion and git services commented on IMPALA-10261:
--
Commit ca4d6912be7c89acd518bbbe44e7c2407f1bb217 in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ca4d691 ]
IMPALA-10261: Include org/apache/hive/com/google in impala-minimal-hive-exec

Newer versions of Hive shade guava, which means that they require the presence of artifacts in org/apache/hive/com/google. To support these newer versions, this adds that path to the inclusions for impala-minimal-hive-exec.

Testing:
- Tested with a newer version of Hive that has the shading and verified that Impala starts up and functions.

Change-Id: I87ac089fdacc6fc5089ed68be92dedce514050b9
Reviewed-on: http://gerrit.cloudera.org:8080/16614
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217245#comment-17217245 ]
Tim Armstrong commented on IMPALA-9884:
---
This assertion is meant to check that, when the first wave of queries is submitted, none of them will be queued and then dequeued (because none of the admitted queries will finish running before the initial wave of admission decisions is made). This could be because a query was dequeued when it shouldn't have been. But it could also be explained if the test is incorrect in detecting when the admission decisions have been made for the first wave.

Piecing together what happened: the test was stuck for a long time waiting for the initial admission decisions. I see that the last query rejected (out of 34 rejected) in the initial wave was at 01:11:31:
{noformat}
$ grep 'Rejected' impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933
...
I1017 01:11:31.068819 25916 admission-controller.cc:1169] 8448348c952b1d85:ac9a6bb1] Rejected query from pool default-pool: queue full, limit=10, num_queued=10.
{noformat}
There are 5 queries admitted right away, then a 6th admitted from the queue, which is likely the problematic one:
{noformat}
$ grep 'Admitting.*que' impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933
I1017 01:11:24.310699 25150 admission-controller.cc:1185] cf498fd1ece032b6:b4f673d1] Admitting query id=cf498fd1ece032b6:b4f673d1
I1017 01:11:24.317937 25152 admission-controller.cc:1185] 1b4e1ee5d51fc461:12219325] Admitting query id=1b4e1ee5d51fc461:12219325
I1017 01:11:24.321123 25154 admission-controller.cc:1185] 4a4d18e5caa85310:022e0229] Admitting query id=4a4d18e5caa85310:022e0229
I1017 01:11:24.327414 25157 admission-controller.cc:1185] 8f462fa2ce60d289:e0631471] Admitting query id=8f462fa2ce60d289:e0631471
I1017 01:11:24.334887 25161 admission-controller.cc:1185] d5466702e1e5c14e:43f31d30] Admitting query id=d5466702e1e5c14e:43f31d30
I1017 01:11:29.677701 24428 admission-controller.cc:1786] Admitting from queue: query=3144178b629c699c:dde994b7
I1017 01:21:24.471750 24428 admission-controller.cc:1786] Admitting from queue: query=bd408b203d1cc15e:141cd954
I1017 01:21:24.471935 24428 admission-controller.cc:1786] Admitting from queue: query=7e47cf24a70c6e29:fe48a517
I1017 01:21:24.472086 24428 admission-controller.cc:1786] Admitting from queue: query=9041a95ec5b2de44:536b85bc
I1017 01:21:24.472236 24428 admission-controller.cc:1786] Admitting from queue: query=9b4b9f3a12265bad:3c8fbdab
I1017 01:21:24.472388 24428 admission-controller.cc:1786] Admitting from queue: query=254d8911a574bbda:4ddbf95c
I1017 01:21:29.131278 24428 admission-controller.cc:1786] Admitting from queue: query=5744d3d253edafeb:f0bbc048
{noformat}
I think the 10 minute gap might be from the test getting stuck at wait_for_admitted_threads here, but I'm not sure:
{noformat}
LOG.info("Wait for initial admission decisions")
(metric_deltas, curr_metrics) = self.wait_for_metric_changes(
    ['admitted', 'queued', 'rejected'], initial_metrics, num_queries)
# Also wait for the test threads that submitted the queries to start executing.
self.wait_for_admitted_threads(metric_deltas['admitted'])
{noformat}
Next is to figure out what happened with 3144178b629c699c:dde994b7
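The quoted snippet blocks until the admission metrics change. The shape of such a wait is roughly the following (a sketch with stand-in names; the real wait_for_metric_changes lives in Impala's test suite and differs in detail):

```python
import time

# Sketch of a polling wait: repeatedly fetch metrics until the summed deltas
# over the named metrics reach an expected count, or a timeout expires.
# get_metrics is a stand-in for the real per-impalad metric fetch.
def wait_for_metric_changes(get_metrics, metric_names, initial_metrics,
                            expected_delta, timeout_s=60.0, poll_interval_s=0.1):
    deadline = time.time() + timeout_s
    while True:
        current = get_metrics()
        deltas = {m: current[m] - initial_metrics[m] for m in metric_names}
        if sum(deltas.values()) >= expected_delta:
            return deltas, current
        if time.time() >= deadline:
            raise AssertionError("Timed out waiting for metric changes: %s" % deltas)
        time.sleep(poll_interval_s)
```

If the expected delta is never reached (for example because a query is admitted much later than anticipated), a loop like this sits idle until its timeout, which would be consistent with the 10 minute gap in the log above.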
[jira] [Commented] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217215#comment-17217215 ]
Quanlong Huang commented on IMPALA-10256:
-
S3 doesn't have HDFS cache pools. We should skip this test when running on non-HDFS filesystems.

> TestDisableFeatures.test_disable_incremental_metadata_updates fails
> ---
>
> Key: IMPALA-10256
> URL: https://issues.apache.org/jira/browse/IMPALA-10256
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Blocker
> Labels: broken-build
>
> Saw test failures in internal CORE S3 builds:
> custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
> {code:java}
> custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
>     use_db=unique_database, multiple_impalad=True)
> common/impala_test_suite.py:662: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:909: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
>     handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
>     handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION:
> E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
> {code}
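The skip Quanlong suggests could be guarded roughly as follows (a minimal sketch using stdlib unittest; the real suite uses pytest and its own SkipIf helpers, and the DEFAULT_FS environment variable and helper name here are assumptions, not Impala's actual API):

```python
import os
import unittest

# Hypothetical check: treat the target filesystem as HDFS only when the
# default filesystem URI uses the hdfs:// scheme. DEFAULT_FS is an assumed
# environment variable, not necessarily what Impala's test harness reads.
def default_fs_is_hdfs():
    return os.environ.get("DEFAULT_FS", "hdfs://localhost:20500").startswith("hdfs://")

class TestDisableFeatures(unittest.TestCase):
    @unittest.skipUnless(default_fs_is_hdfs(),
                         "HDFS cache pools are unavailable on non-HDFS filesystems such as S3")
    def test_disable_incremental_metadata_updates(self):
        pass  # the existing test body would run here unchanged
```

With a guard like this, an S3 run reports the test as skipped instead of failing on the missing cache pool.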
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-10256:

Description:
Saw test failures in internal CORE S3 builds:
custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
{code:java}
custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
    use_db=unique_database, multiple_impalad=True)
common/impala_test_suite.py:662: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:600: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:909: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION:
E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
{code}

was:
Saw test failures in internal CORE builds:
custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
{code:java}
custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
    use_db=unique_database, multiple_impalad=True)
common/impala_test_suite.py:662: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:600: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:909: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION:
E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
{code}
[jira] [Created] (IMPALA-10265) Doc about enable_incremental_metadata_updates flag
Quanlong Huang created IMPALA-10265:
---
Summary: Doc about enable_incremental_metadata_updates flag
Key: IMPALA-10265
URL: https://issues.apache.org/jira/browse/IMPALA-10265
Project: IMPALA
Issue Type: Documentation
Reporter: Quanlong Huang

IMPALA-10113 adds a feature flag to turn off the incremental metadata update feature, which is on by default. This flag decides how catalogd propagates metadata updates to the catalog topic.

If enable_incremental_metadata_updates is true, catalogd sends metadata updates at partition granularity, so a table that has only one changed partition produces an update for just that partition. This reduces the size of the metadata that needs to be sent from the catalogd.

If enable_incremental_metadata_updates is false, catalogd uses the legacy behavior and sends metadata updates at table granularity: a table that has only one changed partition still produces an update for the whole table object, i.e. catalogd sends the whole table thrift object to the catalog topic.

Note that this is a catalogd-only flag. It doesn't need to be set on impalads, since impalad can process both incremental and non-incremental catalog updates.
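The difference the flag controls can be illustrated with a toy model (this is not Impala code; the dictionary layout and function name are made up purely to show the granularity difference described above):

```python
# Toy illustration of catalog-update granularity: with incremental updates
# only the changed partitions' metadata is published; with the legacy
# behavior the whole table object is republished.
def build_catalog_update(table, changed_partitions, enable_incremental_metadata_updates):
    if enable_incremental_metadata_updates:
        # Partition granularity: just the changed partitions.
        return {p: table['partitions'][p] for p in changed_partitions}
    # Table granularity: the entire table object, all partitions included.
    return table

table = {'name': 'sales', 'partitions': {'p1': 'meta1', 'p2': 'meta2', 'p3': 'meta3'}}

incremental = build_catalog_update(table, ['p2'], True)
legacy = build_catalog_update(table, ['p2'], False)

assert list(incremental) == ['p2']        # one partition's metadata
assert len(legacy['partitions']) == 3     # whole table republished
```

For wide tables with many partitions, the incremental form keeps the catalog-topic payload proportional to what actually changed rather than to the table's total size.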
[jira] [Resolved] (IMPALA-10113) Add feature flag for incremental metadata update
[ https://issues.apache.org/jira/browse/IMPALA-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-10113.
-
Fix Version/s: Impala 4.0
Resolution: Fixed

> Add feature flag for incremental metadata update
> 
>
> Key: IMPALA-10113
> URL: https://issues.apache.org/jira/browse/IMPALA-10113
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.0
>
> Now catalogd sends metadata updates in partition level. A feature flag to switch back to the original behavior (i.e. sending metadata updates in table level) will ease performance tests like IMPALA-10079.
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217170#comment-17217170 ]
Tim Armstrong commented on IMPALA-9884:
---
Grabbed the coordinator log [^impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz]. Here's the JUnit XML output too:
{noformat}
custom_cluster/test_admission_controller.py:1856: in test_mem_limit
    {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
custom_cluster/test_admission_controller.py:1712: in run_admission_test
    assert metric_deltas['dequeued'] == 0,\
E   AssertionError: Queued queries should not run until others are made to finish
E   assert 1 == 0

01:11:15 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
01:11:15 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO
01:11:16 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
01:11:19 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:19 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:19 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:19 MainThread: Waiting for num_known_live_backends=3. Current value: 0
01:11:20 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:20 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:20 MainThread: Waiting for num_known_live_backends=3. Current value: 0
01:11:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:21 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:21 MainThread: num_known_live_backends has reached value: 3
01:11:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:21 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25001
01:11:21 MainThread: num_known_live_backends has reached value: 3
01:11:22 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:22 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25002
01:11:22 MainThread: num_known_live_backends has reached value: 3
01:11:22 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
DEBUG:impala_cluster:Found 3 impalad/1 statestored/1 catalogd process(es)
INFO:impala_service:Getting metric: statestore.live-backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25010
INFO:impala_service:Metric statestore.live-backends has reached desired value: 4
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
INFO:impala_service:num_known_live_backends has reached value: 3
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25001
INFO:impala_service:num_known_live_backends has reached value: 3
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25002
INFO:impala_service:num_known_live_backends has reached value: 3
{noformat}
[jira] [Updated] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9884: -- Attachment: impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > Attachments: > impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz > > > Recently, I saw this test failing with the exception trace below. > {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10264) Add ability to build docker images for a different Linux distribution
Joe McDonnell created IMPALA-10264: -- Summary: Add ability to build docker images for a different Linux distribution Key: IMPALA-10264 URL: https://issues.apache.org/jira/browse/IMPALA-10264 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.0 Reporter: Joe McDonnell Currently, the build for Impala's docker images builds on the local host OS and then makes the binaries available in the docker build context. The docker image thus needs to run the same Linux distribution and version as the host. Ubuntu 16 docker images should be built on an Ubuntu 16 host. Centos 7 docker images should be built on a Centos 7 host. It would be useful to be able to build docker containers for a different Linux distribution. Developers often develop on Ubuntu 16, but it would be useful to be able to build Centos 7 docker images to use in other contexts. To do this, we could build the binaries inside a docker container of a matching version as the docker image we want to produce. This would construct the docker build context, and the binaries would always match. An Ubuntu 16 machine could produce Centos 7 docker containers. Hypothetically, this could also use QEMU to build ARM docker containers on an x86 host.
[jira] [Created] (IMPALA-10263) Native toolchain support for cross compiling to produce ARM binaries
Joe McDonnell created IMPALA-10263: -- Summary: Native toolchain support for cross compiling to produce ARM binaries Key: IMPALA-10263 URL: https://issues.apache.org/jira/browse/IMPALA-10263 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.0 Reporter: Joe McDonnell With support for ARM added to upstream Impala, it would be useful to be able to build the ARM native toolchain from an x86 machine. This would allow it to be built and uploaded to s3 using the same infrastructure that currently builds the x86 binaries. Having the ARM binaries in s3 opens up possibilities to incorporate an ARM build into GVO. QEMU has the ability to emulate ARM on an x86 machine, and it is surprisingly simple to get an ARM docker container running on x86. This article provides some depth: [https://ownyourbits.com/2018/06/27/running-and-building-arm-docker-containers-in-x86/] The basic steps are: # Install qemu-user/qemu-user-static (which installs appropriate hooks in the kernel) # Make qemu-aarch64-static available in the context for building the docker container # In the Dockerfile, copy qemu-aarch64-static into /usr/bin For example, here is the start of the ubuntu1804 Dockerfile: {noformat} FROM arm64v8/ubuntu:18.04 COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static # The rest of the dockerfile{noformat}
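The three steps above can be sketched as a small shell script that prepares an ARM docker build context on an x86 host. This is a minimal illustration following the linked article, not anything from Impala's build scripts; the emulator path and image tag are assumptions.

```shell
#!/bin/sh
# Sketch of preparing a docker build context for an ARM image on an x86 host.
# Assumes step 1 (installing qemu-user-static, which registers the binfmt
# hooks in the kernel) was already done, e.g.:
#   sudo apt-get install qemu-user-static
set -e

BUILD_CTX=$(mktemp -d)

# Step 2: make the static QEMU binary available in the build context.
# /usr/bin/qemu-aarch64-static is where Debian/Ubuntu packages install it;
# adjust the path for other distributions.
if [ -f /usr/bin/qemu-aarch64-static ]; then
  cp /usr/bin/qemu-aarch64-static "$BUILD_CTX/"
fi

# Step 3: the Dockerfile copies the emulator into /usr/bin of the ARM image,
# so every ARM binary in the container runs through QEMU.
cat > "$BUILD_CTX/Dockerfile" <<'EOF'
FROM arm64v8/ubuntu:18.04
COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static
# The rest of the dockerfile
EOF

echo "Build context prepared in $BUILD_CTX"
# To build the image: docker build -t impala-arm-sketch "$BUILD_CTX"
```

The actual `docker build` is left as a comment since it needs a docker daemon; the point is only that the context must contain the static emulator alongside the Dockerfile.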
[jira] [Commented] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217155#comment-17217155 ] Abhishek Rawat commented on IMPALA-10102: - I *believe* the issue in this case is that the query "mem_limit" is set higher than the memory actually available to Impala on the system, so the memory allocation fails. Impala will not cancel the query, since the query hasn't exceeded its mem_limit, and the process mem_limit is not exceeded either. Handling a memory allocation failure in this scenario would be difficult, since it would pretty much require a check on every single memory allocation. So it looked like a bad-configuration problem: if both the query and process mem_limits are configured properly, the query fails cleanly without crashing the impalad. > Impalad crashes when writing a parquet file with large rows > - > > Key: IMPALA-10102 > URL: https://issues.apache.org/jira/browse/IMPALA-10102 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Yida Wu >Priority: Critical > Labels: crash > > Encountered a crash when testing following queries on my local branch: > {code:sql} > create table bigstrs3 stored as parquet as > select *, repeat(uuid(), cast(random() * 20 as int)) as bigstr > from functional.alltypes > limit 1000; > # Length of uuid() is 36. So the max row size is 7,200,000. 
> set MAX_ROW_SIZE=8m; > create table my_str_group stored as parquet as > select group_concat(string_col) as ss, bigstr > from bigstrs3 group by bigstr; > create table my_cnt stored as parquet as > select count(*) as cnt, bigstr > from bigstrs3 group by bigstr; > {code} > The crash stacktrace: > {code} > Crash reason: SIGSEGV > Crash address: 0x0 > Process uptime: not available > Thread 336 (crashed) > 0 libc-2.23.so + 0x14e10b > 1 impalad!snappy::UncheckedByteArraySink::Append(char const*, unsigned > long) [clone .localalias.0] + 0x1a > 2 impalad!snappy::Compress(snappy::Source*, snappy::Sink*) + 0xb1 > 3 impalad!snappy::RawCompress(char const*, unsigned long, char*, unsigned > long*) + 0x51 > 4 impalad!impala::SnappyCompressor::ProcessBlock(bool, long, unsigned char > const*, long*, unsigned char**) [compress.cc : 295 + 0x24] > 5 impalad!impala::Codec::ProcessBlock32(bool, int, unsigned char const*, > int*, unsigned char**) [codec.cc : 211 + 0x41] > 6 impalad!impala::HdfsParquetTableWriter::BaseColumnWriter::Flush(long*, > long*, long*) [hdfs-parquet-table-writer.cc : 775 + 0x56] > 7 impalad!impala::HdfsParquetTableWriter::FlushCurrentRowGroup() > [hdfs-parquet-table-writer.cc : 1330 + 0x60] > 8 impalad!impala::HdfsParquetTableWriter::Finalize() > [hdfs-parquet-table-writer.cc : 1297 + 0x19] > 9 > impalad!impala::HdfsTableSink::FinalizePartitionFile(impala::RuntimeState*, > impala::OutputPartition*) [hdfs-table-sink.cc : 652 + 0x2e] > 10 > impalad!impala::HdfsTableSink::WriteRowsToPartition(impala::RuntimeState*, > impala::RowBatch*, std::pair std::default_delete >, std::vector std::allocator > >*) [hdfs-table-sink.cc : 282 + 0x21] > 11 impalad!impala::HdfsTableSink::Send(impala::RuntimeState*, > impala::RowBatch*) [hdfs-table-sink.cc : 621 + 0x2e] > 12 impalad!impala::FragmentInstanceState::ExecInternal() > [fragment-instance-state.cc : 422 + 0x58] > 13 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc > : 106 + 0x16] > 14 
impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) > [query-state.cc : 836 + 0x19] > 15 impalad!impala::QueryState::StartFInstances()::{lambda()#1}::operator()() > const + 0x26 > 16 > impalad!boost::detail::function::void_function_obj_invoker0, > void>::invoke [function_template.hpp : 159 + 0xc] > 17 impalad!boost::function0::operator()() const [function_template.hpp > : 770 + 0x1d] > 18 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) [thread.cc : 360 + 0xf] > 19 impalad!void > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > std::allocator > >, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > >::operator() std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list0>(boost::_bi::type, void > (*&)(std::__cxx11::basic_string, > std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise
[jira] [Reopened] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reopened IMPALA-10102: [~baggio000] Impala still shouldn't crash if there isn't enough memory, it should return a clean error to the user and not disrupt other running queries. So I'm going to reopen this.
[jira] [Resolved] (IMPALA-10226) Change buildall.sh -notests to invoke a single Make target
[ https://issues.apache.org/jira/browse/IMPALA-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10226. Fix Version/s: Impala 4.0 Target Version: Impala 4.0 Assignee: Joe McDonnell Resolution: Fixed > Change buildall.sh -notests to invoke a single Make target > -- > > Key: IMPALA-10226 > URL: https://issues.apache.org/jira/browse/IMPALA-10226 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.0 > > > Currently, running "buildall.sh -notests" boils down to invoking make with > multiple targets: > > {noformat} > if [[ $BUILD_TESTS -eq 0 ]]; then > # Specify all the non-test targets > MAKE_TARGETS="impalad statestored catalogd fesupport loggingsupport > ImpalaUdf \ > udasample udfsample" > if (( build_independent_targets )); then > MAKE_TARGETS+=" cscope fe tarballs" > fi > fi > ${MAKE_CMD} -j${IMPALA_BUILD_THREADS:-4} ${IMPALA_MAKE_FLAGS} > ${MAKE_TARGETS}{noformat} > Based on the build output, it looks like each make target is invoked > individually (with the commands underneath going parallel). This is > particularly a problem for impalad (which needs to build the backend) and fe. > We want these to run simultaneously, and this limitation prevents that. > We should create a single target that builds all the things needing to be > built for -notests. Then, this will be invoking one target and allowing all > the pieces go parallel. >
[jira] [Commented] (IMPALA-10226) Change buildall.sh -notests to invoke a single Make target
[ https://issues.apache.org/jira/browse/IMPALA-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217108#comment-17217108 ] ASF subversion and git services commented on IMPALA-10226: -- Commit e76010d62889aaa2b04f6cfea9bb74b829877eb9 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e76010d ] IMPALA-10226: Change buildall.sh -notests to invoke a single Make target This is a small cleanup to add specific targets in CMake for buildall.sh -notests to invoke. Previously, it ran multiple targets like: make target1 target2 target3 ... In hand tests, make builds each target separately, so it is unable to overlap the builds of the multiple targets. Pushing it into CMake simplifies the code and allows the targets to build simultaneously. Testing: - Ran buildall.sh -notests Change-Id: Id881d6f481b32ba82501b16bada14b6630ba32d2 Reviewed-on: http://gerrit.cloudera.org:8080/16605 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins
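The effect of the IMPALA-10226 change can be illustrated with a toy Makefile: handing make a single aggregate target lets it schedule all the sub-builds within one parallel run, instead of building several top-level targets one after another. The Makefile and target names below are a hypothetical stand-in, not Impala's real build:

```shell
#!/bin/sh
# Toy illustration of collapsing several make targets into one aggregate
# target, as buildall.sh -notests now does. Instead of
#   make -j4 impalad statestored catalogd ...
# the build invokes a single target, e.g.
#   make -j4 notests
set -e

dir=$(mktemp -d)
# Build a tiny Makefile with two leaf targets and one aggregate target.
# printf '\t' supplies the real tabs that make requires before recipes.
printf 'impalad:\n\t@touch %s/impalad.done\n' "$dir" >  "$dir/Makefile"
printf 'catalogd:\n\t@touch %s/catalogd.done\n' "$dir" >> "$dir/Makefile"
printf 'notests: impalad catalogd\n' >> "$dir/Makefile"

# One invocation, one target: with -j, both prerequisites can build in
# parallel inside the same make run.
make -C "$dir" -j2 notests
ls "$dir"
```

The same idea transfers to the real build: the aggregate target's prerequisite list replaces the `MAKE_TARGETS` variable, and make's own scheduler overlaps the backend and frontend builds.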
[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9954. -- Fix Version/s: Impala 4.0 Resolution: Fixed > RpcRecvrTime can be negative > > > Key: IMPALA-9954 > URL: https://issues.apache.org/jira/browse/IMPALA-9954 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.0 > > Attachments: profile_034e7209bd98c96c_9a448dfc.txt > > > Saw this on a recent version of master. Attached the full runtime profile. > {code:java} > KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, > % non-child: 32.30%) > ExecOption: Unpartitioned Sender Codegen Disabled: not needed >- BytesSent (500.000ms): 0, 0 >- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: > 4.34 MB/sec ; Number of samples: 1) >- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; > Number of samples: 2) >- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: > -71077.000ns ; Number of samples: 2) >- EosSent: 1 (1) >- PeakMemoryUsage: 416.00 B (416) >- RowsSent: 100 (100) >- RpcFailure: 0 (0) >- RpcRetry: 0 (0) >- SerializeBatchTime: 2.880ms >- TotalBytesSent: 28.67 KB (29355) >- UncompressedRowBatchSize: 69.29 KB (70950) {code}
[jira] [Assigned] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-9954: Assignee: Riza Suminto
[jira] [Resolved] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yida Wu resolved IMPALA-10102. -- Resolution: Not A Bug Setting a proper impalad mem_limit option should avoid the crash.
[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217070#comment-17217070 ] Riza Suminto commented on IMPALA-9954: -- Hi [~stakiar], I have resolved IMPALA-10220. That Jira also includes a fix to acquire the lock first before computing the elapsed time. I think this Jira can be closed as well.
[jira] [Resolved] (IMPALA-10220) Min value of RpcNetworkTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-10220. --- Fix Version/s: Impala 4.0 Resolution: Fixed Closing this Jira since the patch has been merged. cc: [~stakiar] > Min value of RpcNetworkTime can be negative > --- > > Key: IMPALA-10220 > URL: https://issues.apache.org/jira/browse/IMPALA-10220 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 3.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.0 > > > There is a bug in function > KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in > this line: > [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635] > network_time_ns should be computed using eos_rsp_.receiver_latency_ns() > instead of resp_.receiver_latency_ns().
[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217063#comment-17217063 ] Sahil Takiar commented on IMPALA-9954: -- [~rizaon] so if I understand correctly, the remaining work here is to add proper locking of {{rpc_start_time_ns_}} in {{be/src/runtime/krpc-data-stream-sender.cc}}. > RpcRecvrTime can be negative > > > Key: IMPALA-9954 > URL: https://issues.apache.org/jira/browse/IMPALA-9954 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Priority: Major > Attachments: profile_034e7209bd98c96c_9a448dfc.txt > > > Saw this on a recent version of master. Attached the full runtime profile. > {code:java} > KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, > % non-child: 32.30%) > ExecOption: Unpartitioned Sender Codegen Disabled: not needed >- BytesSent (500.000ms): 0, 0 >- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: > 4.34 MB/sec ; Number of samples: 1) >- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; > Number of samples: 2) >- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: > -71077.000ns ; Number of samples: 2) >- EosSent: 1 (1) >- PeakMemoryUsage: 416.00 B (416) >- RowsSent: 100 (100) >- RpcFailure: 0 (0) >- RpcRetry: 0 (0) >- SerializeBatchTime: 2.880ms >- TotalBytesSent: 28.67 KB (29355) >- UncompressedRowBatchSize: 69.29 KB (70950) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
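A minimal sketch of the remaining locking work described above, assuming the goal is simply to take the channel's lock before reading the shared start timestamp. This is a hypothetical Python analogue for illustration; the real code is C++ in be/src/runtime/krpc-data-stream-sender.cc, and the class and method names here are made up.

```python
import threading
import time

class Channel:
    """Hypothetical analogue of a KRPC sender channel that times its RPCs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rpc_start_ns = 0

    def rpc_started(self):
        # Writer side: record the start time under the lock.
        with self._lock:
            self._rpc_start_ns = time.monotonic_ns()

    def rpc_completed_elapsed_ns(self):
        # Reader side: take the same lock before computing the elapsed time,
        # so a concurrent rpc_started() for the next RPC cannot move the
        # start timestamp underneath the subtraction and make it negative.
        with self._lock:
            return time.monotonic_ns() - self._rpc_start_ns

ch = Channel()
ch.rpc_started()
elapsed = ch.rpc_completed_elapsed_ns()
print(elapsed >= 0)  # True
```

Without the lock in the completion path, the callback can observe a start time written for a later RPC, which is one way an elapsed-time counter goes negative.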
[jira] [Commented] (IMPALA-10220) Min value of RpcNetworkTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217061#comment-17217061 ] Sahil Takiar commented on IMPALA-10220: --- [~rizaon] can this be closed? > Min value of RpcNetworkTime can be negative > --- > > Key: IMPALA-10220 > URL: https://issues.apache.org/jira/browse/IMPALA-10220 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 3.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > > There is a bug in function > KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in > this line: > [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635] > network_time_ns should be computed using eos_rsp_.receiver_latency_ns() > instead of resp_.receiver_latency_ns(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9918 started by Csaba Ringhofer. --- > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". > To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. 
E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > 
/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) > (function_obj_ptr=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #16 0x024d46f0 in boost::function0::operator() > (this=0x7f7972275448) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #17 0x03425ba7 in impala::Thread::SuperviseThread(std::string const&, > std::string const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) (name=..., category=..., > functor=..., parent_thread_info=0x7f797006d068, >
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217040#comment-17217040 ] Csaba Ringhofer commented on IMPALA-9918: - A fix is up for review: https://gerrit.cloudera.org/#/c/16611/ Note that there is still no explanation for why a function guarded by VLOG(3) << was called in a custom cluster test. > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". > To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. 
E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > 
/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) > (function_obj_ptr=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #16 0x024d46f0 in boost::function0::operator() > (this=0x7f7972275448) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #17 0x03425ba7 in
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216998#comment-17216998 ] Csaba Ringhofer commented on IMPALA-9918: - This seems to be a weird logging bug. What is certain is that https://github.com/apache/impala/blob/master/be/src/exec/hdfs-orc-scanner.cc#L455 is wrong - PrintPath always hits the DCHECK when it is called, which should only happen with GLOG_v=3 logging. This is clearly a bug, as nearly all queries on ORC tables will hit it with GLOG_v=3. It is only a problem in DEBUG builds; PrintPath returns a sensible result in RELEASE. The mysterious part is that we shouldn't be using GLOG_v=3 in the tests that broke, and the logs also don't seem to be that verbose. At lower verbosity the functions called during logging should not be invoked. I suspect this to be some kind of GLOG bug. > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". 
> To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) 
at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0
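The VLOG(3) behavior discussed in the IMPALA-9918 comments can be illustrated with a small Python analogue of glog's lazy message evaluation. This is hypothetical illustration code, not glog and not Impala's scanner: `print_path` stands in for impala::PrintPath, and the point is that at verbosity below the threshold the message expression, including any DCHECK inside it, should never run.

```python
VERBOSITY = 0  # analogue of the GLOG_v setting

def print_path(path):
    # Stand-in for impala::PrintPath(): assume it DCHECKs on input it
    # cannot resolve, so merely calling it can abort a DEBUG build.
    assert path, "DCHECK hit: unresolvable path"
    return "/".join(str(p) for p in path)

def vlog(level, make_message):
    # Like VLOG(level) << ...: the message is only built when the
    # configured verbosity is at least `level`.
    if VERBOSITY >= level:
        print(make_message())

# At VERBOSITY=0 the lambda is never invoked, so the assert never fires.
vlog(3, lambda: print_path([]))
```

The crash reports are surprising precisely because this guard should have kept PrintPath from ever being called in a test run at default verbosity.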
[jira] [Created] (IMPALA-10262) Linux Packaging Support
Shant Hovsepian created IMPALA-10262: Summary: Linux Packaging Support Key: IMPALA-10262 URL: https://issues.apache.org/jira/browse/IMPALA-10262 Project: IMPALA Issue Type: New Feature Reporter: Shant Hovsepian It would be nice if we could easily build installation packages, for example RPM or DEB packages, from the Impala source code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
Joe McDonnell created IMPALA-10261: -- Summary: impala-minimal-hive-exec should include org/apache/hive/com/google/** Key: IMPALA-10261 URL: https://issues.apache.org/jira/browse/IMPALA-10261 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.0 Reporter: Joe McDonnell Hive started shading guava (com/google) with HIVE-22126, so impala-minimal-hive-exec should add org/apache/hive/com/google to its inclusions. This will allow Impala to build/work with newer versions of Hive that have this change. Leaving the existing com/google inclusion should let it work with both: [https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L116] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
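A sketch of what the IMPALA-10261 change might look like as a maven-shade-plugin filter. This is a hypothetical fragment for illustration only; the actual include list and surrounding plugin configuration live in java/shaded-deps/hive-exec/pom.xml, and the artifact coordinates shown here are an assumption.

```xml
<filter>
  <artifact>org.apache.hive:hive-exec</artifact>
  <includes>
    <!-- Guava under its original package (pre-HIVE-22126 Hive) -->
    <include>com/google/**</include>
    <!-- Guava relocated by Hive's shading (HIVE-22126 and later) -->
    <include>org/apache/hive/com/google/**</include>
  </includes>
</filter>
```

Keeping both include patterns is what lets the same minimal jar work against Hive versions from before and after the shading change.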
[jira] [Commented] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216964#comment-17216964 ] Tim Armstrong commented on IMPALA-10253: You could try rewriting this query as a join as well, e.g. {noformat} SELECT events.*, d AS `event` FROM rawdata.event_view_p7 events INNER JOIN event_dict d on events.event_id = d.event_id WHERE d.event in ('SignUp', 'ViewProduct'); {noformat} A bloom filter on event_id should be pushed from the join into the scan of events in this case and that probably performs better than the UDF dict lookup. > Improve query performance contains dict function > > > Key: IMPALA-10253 > URL: https://issues.apache.org/jira/browse/IMPALA-10253 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: gaoxiaoqing >Priority: Major > > we have the following parquet table: > {code:java} > CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( > event_id INT, > user_id BIGINT, > time TIMESTAMP, > p_abook_type STRING > ) > PARTITIONED BY ( > day INT, > event_bucket INT > ) > STORED AS PARQUET > LOCATION 'hdfs://localhost:20500/sa/data/1/event' > {code} > the data show as following: > ||event_id||user_id||time||p_abook_type|| > |1|-922235446862664806|2018-07-18 09:01:06.158|小说| > |2|-922235446862664806|2018-07-19 09:01:06.158|小说| > if we want remapping event_id to the real event name, we can realize dict > udf. the dict udf is defined as DICT(BIGINT expression, STRING path). first > parameter is the column, second parameter is hdfs path which store the > remapping rule like this: > {code:java} > 1,SignUp > 2,ViewProduct{code} > then build a view table which add the dict column on original table: > {code:java} > CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, > dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 > events > {code} > If the query group by column has dict, the query is slower then group by > original column. 
when explain the sql, we found that each line data need > remapping in SCAN phase and AGGREGATE phase. > {code:java} > select event, count(*) from event_external_view_p7 where event in ('SignUp', > 'ViewProduct') group by event;{code} > {code:java} > PLAN-ROOT SINK > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: rawdata.DICT(event_id, '/data/1/event.txt') > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', > 'ViewProduct') > | row-size=4B cardinality=unavailable > {code} > the idea is to modify plan, use original column in SCAN phase and AGGREGATE > phase and remapping the original column at last, the new plan like this: > {code:java} > PLAN-ROOT SINK > | > 05:SELECT [FINALIZE] > | output: dict(event_id) > | row-size=20B cardinality=0 > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: event_id IN (1, 2) > | row-size=4B cardinality=unavailable > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216961#comment-17216961 ] Tim Armstrong commented on IMPALA-10253: [~gaoxiaoqing] if you created a UDF that did a reverse dictionary lookup you could rewrite the query something like this: {noformat} select event, count(*) from event_external_view_p7 where event_id in (dict_reverse('SignUp', '/path/to/dict'), dict_reverse('ViewProduct', '/path/to/dict'))) group by event; {noformat} I think to do it automatically you'd need some additional metadata for the UDF that specified the inverse of the function, then to do an expression rewrite rule that detected a pattern that could exploit it. We did try to do a similar rewrite for strings cases using the expression rewriter in the frontend - IMPALA-5929 (although had to revert that change because it had bugs). Anyway I added this as a contributor if you want to assign this JIRA to yourself. > Improve query performance contains dict function > > > Key: IMPALA-10253 > URL: https://issues.apache.org/jira/browse/IMPALA-10253 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: gaoxiaoqing >Priority: Major > > we have the following parquet table: > {code:java} > CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( > event_id INT, > user_id BIGINT, > time TIMESTAMP, > p_abook_type STRING > ) > PARTITIONED BY ( > day INT, > event_bucket INT > ) > STORED AS PARQUET > LOCATION 'hdfs://localhost:20500/sa/data/1/event' > {code} > the data show as following: > ||event_id||user_id||time||p_abook_type|| > |1|-922235446862664806|2018-07-18 09:01:06.158|小说| > |2|-922235446862664806|2018-07-19 09:01:06.158|小说| > if we want remapping event_id to the real event name, we can realize dict > udf. the dict udf is defined as DICT(BIGINT expression, STRING path). 
first > parameter is the column, second parameter is hdfs path which store the > remapping rule like this: > {code:java} > 1,SignUp > 2,ViewProduct{code} > then build a view table which add the dict column on original table: > {code:java} > CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, > dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 > events > {code} > If the query group by column has dict, the query is slower then group by > original column. when explain the sql, we found that each line data need > remapping in SCAN phase and AGGREGATE phase. > {code:java} > select event, count(*) from event_external_view_p7 where event in ('SignUp', > 'ViewProduct') group by event;{code} > {code:java} > PLAN-ROOT SINK > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: rawdata.DICT(event_id, '/data/1/event.txt') > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', > 'ViewProduct') > | row-size=4B cardinality=unavailable > {code} > the idea is to modify plan, use original column in SCAN phase and AGGREGATE > phase and remapping the original column at last, the new plan like this: > {code:java} > PLAN-ROOT SINK > | > 05:SELECT [FINALIZE] > | output: dict(event_id) > | row-size=20B cardinality=0 > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: event_id IN (1, 2) > 
| row-size=4B cardinality=unavailable > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Labels: crash (was: ) > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Blocker > Labels: crash > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) 
= 0, > 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: 
parquet_timestamp_type (i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93: disable_hbase_num_rows_estimate (bool) = false, > 94: fetch_rows_timeout_ms (i64) = 1, > 95: now_string
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Priority: Blocker (was: Critical) > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Blocker > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) = 0, 
> 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: parquet_timestamp_type 
(i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93: disable_hbase_num_rows_estimate (bool) = false, > 94: fetch_rows_timeout_ms (i64) = 1, > 95: now_string (string) = "", >
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Component/s: Backend
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Labels: broken-build crash (was: crash)
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Target Version: Impala 4.0
[jira] [Updated] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9240: -- Fix Version/s: (was: Impala 4.0) Impala 3.4.0 > Impala shell using hs2-http reports all http error codes as EOFError > > > Key: IMPALA-9240 > URL: https://issues.apache.org/jira/browse/IMPALA-9240 > Project: IMPALA > Issue Type: Bug >Reporter: Andrew Sherman >Assignee: Andrew Sherman >Priority: Major > Fix For: Impala 3.4.0 > > > For example if I try to connect to an http endpoint that returns 503 I see > {code} > $ impala-shell -V --protocol='hs2-http' -i "localhost:28000" > Starting Impala Shell without Kerberos authentication > Warning: --connect_timeout_ms is currently ignored with HTTP transport. > Opened TCP connection to localhost:28000 > Error connecting: EOFError, > {code} > At present Impala shell does not properly check http return calls. > When this fix is complete it should be also put into Impyla. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
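The EOFError in the example above arises because the shell's hs2-http transport tries to read a thrift payload that an error response such as 503 never carries. A minimal sketch of the kind of check the fix describes, with illustrative names rather than the actual impala-shell/Impyla code: inspect the HTTP status line first and surface the real status code instead of letting a body read fail.

```python
# Hypothetical sketch (class and function names are illustrative, not
# Impala's actual code): check the HTTP status before reading the thrift
# payload, so a 503 surfaces as "HTTP error 503" rather than EOFError.

class HttpTransportError(Exception):
    """Carries the HTTP status code instead of a bare EOFError."""
    def __init__(self, code, reason):
        super(HttpTransportError, self).__init__(
            "HTTP error {0}: {1}".format(code, reason))
        self.code = code
        self.reason = reason

def check_http_status(code, reason):
    # Only 2xx responses carry a thrift payload worth reading; anything
    # else is reported with its real status code and reason phrase.
    if not 200 <= code < 300:
        raise HttpTransportError(code, reason)
```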
[jira] [Updated] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9240: -- Fix Version/s: Impala 4.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman resolved IMPALA-9240. Resolution: Fixed Thanks [~tarmstrong] for pointing this out -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
[ https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216853#comment-17216853 ] Joe McDonnell commented on IMPALA-10057: The maven test phase does not touch the fe/target/impala-frontend*.jar. It does recompile and touch fe/target/classes. Interestingly enough, our classpath for impalads, catalogds, etc. doesn't use fe/target/impala-frontend*.jar. Instead, they directly reference fe/target/classes: [https://github.com/apache/impala/blob/master/bin/impala-config.sh#L691] [https://github.com/apache/impala/blob/master/bin/set-classpath.sh#L33] So, there could be a race condition between the maven test phase deleting/recompiling the classes and the impalad/catalogd reading the class. One solution is to use fe/target/impala-frontend*.jar on the impalad/catalogd classpath rather than referencing fe/target/classes directly. The impala-frontend*.jar would get rebuilt at appropriate times, but it is untouched by the maven test phase. > TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST > -- > > Key: IMPALA-10057 > URL: https://issues.apache.org/jira/browse/IMPALA-10057 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Priority: Major > > For both the normal tests and the docker-based tests, the Impala logs > generated during the FE_TEST/JDBC_TEST can be huge: > > {noformat} > $ du -c -h fe_test/ee_tests > 4.0K fe_test/ee_tests/minidumps/statestored > 4.0K fe_test/ee_tests/minidumps/impalad > 4.0K fe_test/ee_tests/minidumps/catalogd > 16K fe_test/ee_tests/minidumps > 352K fe_test/ee_tests/profiles > 81G fe_test/ee_tests > 81G total{noformat} > Creating a tarball of these logs takes 10 minutes.
The Impalad/catalogd logs > are filled with this error over and over: > {noformat} > E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected > exception thrown > Java exception follows: > java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > at > org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > ... 2 more > Caused by: java.lang.ClassNotFoundException: > org.apache.impala.common.TransactionKeepalive$HeartbeatContext > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 2 more{noformat} > Two interesting points: > # The frontend/jdbc tests are passing, so all of these errors in the impalad > logs are not impacting tests. > # These errors aren't concurrent with any of the other tests (ee tests, > custom cluster tests, etc). > This is happening on normal core runs (including the GVO job that does > FE_TEST/JDBC_TEST) on both Ubuntu and Centos 7. It is also happening on > docker-based tests. A theory is that FE_TEST/JDBC_TEST have an Impala cluster > running and then invoke maven to run the tests. Maven could manipulate jars > while Impala is running. Maybe there is a race condition or conflict when > manipulating those jars that could cause the NoClassDefFoundError. It makes > no sense for Impala not to be able to find > TransactionKeepalive$HeartbeatContext. > When it happens, it is in a tight loop, printing the message more than once > per millisecond. It fills the ERROR, WARNING, and INFO logs with that > message, sometimes for multiple Impalads and/or catalogd.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
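The race described in the comment above can be illustrated with a small filesystem simulation (a sketch with made-up file names, not Impala code): a consumer resolving classes out of a directory sees maven's delete immediately, while a consumer that opened a jar snapshot beforehand keeps reading the old bytes.

```python
# Sketch of why referencing fe/target/classes directly is racy while a jar
# snapshot is not. All paths and contents here are illustrative.
import os
import shutil
import tempfile
import zipfile

def demo():
    workdir = tempfile.mkdtemp()
    classes_dir = os.path.join(workdir, "classes")
    os.makedirs(classes_dir)
    class_file = os.path.join(classes_dir, "HeartbeatContext.class")
    with open(class_file, "wb") as f:
        f.write(b"bytecode-v1")

    # Package a jar-like snapshot before the "maven test phase" runs.
    jar_path = os.path.join(workdir, "impala-frontend.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.write(class_file, "HeartbeatContext.class")

    # Simulate maven's test phase wiping target/classes for a recompile.
    shutil.rmtree(classes_dir)

    # Directory-based lookup now fails (the NoClassDefFoundError analogue)...
    dir_lookup_ok = os.path.exists(class_file)
    # ...but the jar snapshot still serves the class unchanged.
    with zipfile.ZipFile(jar_path) as jar:
        jar_bytes = jar.read("HeartbeatContext.class")
    shutil.rmtree(workdir)
    return dir_lookup_ok, jar_bytes
```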
[jira] [Commented] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216850#comment-17216850 ] Tim Armstrong commented on IMPALA-9240: --- [~asherman] can we close this? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10062) TestCompressedNonText.test_insensitivity_to_extension can fail due to wrong filename
[ https://issues.apache.org/jira/browse/IMPALA-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10062. Fix Version/s: Impala 4.0 Resolution: Fixed > TestCompressedNonText.test_insensitivity_to_extension can fail due to wrong > filename > > > Key: IMPALA-10062 > URL: https://issues.apache.org/jira/browse/IMPALA-10062 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > The fix for IMPALA-10005 added a new TestCompressedNonText test. It relies on > Hive generating specific file names when writing these compressed tables > (i.e. it expects a file named 00_0). It looks like that is not guaranteed > by dataload, which can lead to failures like this: > {noformat} > query_test/test_compressed_formats.py:142: in test_insensitivity_to_extension > unique_database, 'tinytable', db_suffix, '00_0', src_extension, ext) > query_test/test_compressed_formats.py:86: in _copy_and_query_compressed_file > self.filesystem_client.copy(src_file, dest_file, overwrite=True) > util/hdfs_util.py:79: in copy > self.hdfs_filesystem_client.copy(src, dst, overwrite) > util/hdfs_util.py:241: in copy > '{0} copy failed: '.format(self.filesystem_type) + stderr + "; " + stdout > E AssertionError: HDFS copy failed: cp: > `/test-warehouse/tinytable_avro_snap/00_0': No such file or directory > E ;{noformat} > The file list shows that the filename is actually > "/test-warehouse/tinytable_avro_snap/00_1" > We should update the test to tolerate this. The actual base filename doesn't > matter for this test. > I have seen this exactly once so far. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
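A sketch of the tolerance fix suggested above (the helper name is hypothetical, not the actual test code): rather than hardcoding the Hive output filename, the test could copy whichever data file dataload actually produced in the table directory.

```python
# Hypothetical helper: pick a data file from an HDFS-style directory
# listing instead of assuming a fixed name like "00_0", since Hive may
# emit "00_1" or similar and the base filename doesn't matter to the test.
def pick_source_file(listing):
    """Return the first data file, skipping markers like _SUCCESS."""
    candidates = [f for f in listing if not f.startswith(('.', '_'))]
    assert candidates, "no data files found in table dir"
    return candidates[0]
```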
[jira] [Resolved] (IMPALA-10127) LIRS enforcement of tombstone limit has pathological performance scenarios
[ https://issues.apache.org/jira/browse/IMPALA-10127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10127. Fix Version/s: Impala 4.0 Resolution: Fixed > LIRS enforcement of tombstone limit has pathological performance scenarios > -- > > Key: IMPALA-10127 > URL: https://issues.apache.org/jira/browse/IMPALA-10127 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Blocker > Fix For: Impala 4.0 > > > LIRS maintains metadata for some tombstone entries that have been evicted > from the cache (but might be seen again). To limit the memory used for these > entries, the lirs_tombstone_multiple setting limits the total number of tombstone > entries. The enforcement walks through the recency list from oldest to newest > and removes tombstone entries until it is back underneath the limit. This > requires it to go past a certain number of non-tombstone entries before it > reaches a tombstone entry. There are pathological cases where this > enforcement needs to walk past a very large number of entries before reaching > a tombstone entry. > Suppose that the cache never accesses the same entry more than once. Starting > from empty, the first entries representing 95% of the capacity are > automatically considered protected. The subsequent accesses are all > unprotected. In order to dislodge a protected entry, an entry needs to be > accessed more than once. If every entry is unique, the protected entries are > never touched again and form a contiguous block as the oldest entries on the > recency list. Tombstone entries are above them, and unprotected elements are > the newest entries on the recency list. It looks like this: > Oldest > Protected entries (representing 95% of cache capacity) > ... > Tombstone entries > ...
> Unprotected entries (representing 5% of cache capacity) > Newest > To enforce the tombstone limit, it would need to pass all the protected > entries to reach a single tombstone entry. I modified cache-bench to add a > case with UNIFORM distribution but a 500x ratio of entries to the cache size. > This shows pathological performance compared to the 3x ratio: > {noformat} > I0831 18:22:06.356406 2605 cache-bench.cc:180] Warming up... > I0831 18:22:07.357687 2605 cache-bench.cc:183] Running benchmark... > I0831 18:22:22.358944 2605 cache-bench.cc:191] UNIFORM ratio=3.00x > n_unique=786432: 3.48M lookups/sec < FINE > I0831 18:22:22.358958 2605 cache-bench.cc:192] UNIFORM ratio=3.00x > n_unique=786432: 33.3% hit rate > I0831 18:22:22.961802 2605 cache-bench.cc:180] Warming up... > I0831 18:22:24.010735 2605 cache-bench.cc:183] Running benchmark... > I0831 18:22:39.026588 2605 cache-bench.cc:191] UNIFORM ratio=500.00x > n_unique=131072000: 1.31k lookups/sec <- OUCH > I0831 18:22:39.026614 2605 cache-bench.cc:192] UNIFORM ratio=500.00x > n_unique=131072000: 0.2% hit rate{noformat} > We should rework the enforcement of the tombstone limit to avoid walking past > all those entries. One option is to keep the tombstone entries on their own > list. > Note that without the tombstone limit, this pathological case would use an > unbounded amount of memory (because the tombstone entries would never > reach the bottom of the recency list and get removed). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
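The "own list" option mentioned in the issue could look roughly like the following, an assumed design sketch in Python rather than the actual C++ patch: tombstones live in a dedicated insertion-ordered structure, so trimming to the limit pops the oldest tombstone in O(1) instead of walking the shared recency list past every protected entry.

```python
# Assumed design sketch, not Impala's final fix: a separate FIFO of
# tombstone keys so limit enforcement never scans non-tombstone entries.
from collections import OrderedDict

class TombstoneSet:
    def __init__(self, limit):
        self.limit = limit
        # OrderedDict preserves insertion order: oldest tombstone first.
        self.tombstones = OrderedDict()

    def add(self, key):
        # Re-adding a key refreshes its position, mirroring a recency bump.
        self.tombstones.pop(key, None)
        self.tombstones[key] = True
        # Trim oldest tombstones beyond the limit: O(1) per eviction,
        # with no walk over protected or unprotected cache entries.
        while len(self.tombstones) > self.limit:
            self.tombstones.popitem(last=False)
```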
[jira] [Resolved] (IMPALA-10198) Unify Java components into a single maven project
[ https://issues.apache.org/jira/browse/IMPALA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10198. Fix Version/s: Impala 4.0 Assignee: Joe McDonnell Resolution: Fixed > Unify Java components into a single maven project > - > > Key: IMPALA-10198 > URL: https://issues.apache.org/jira/browse/IMPALA-10198 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.0 > > > Currently, there are multiple maven projects in Impala's source. Each one is > built separately with a separate maven invocation, while sharing a parent pom > (impala-parent/pom.xml). This requires artificial CMake dependencies to avoid > concurrent maven invocations (e.g. > [https://github.com/apache/impala/commit/4c3f701204f92f8753cf65a97fe4804d1f77bc08]). > > We should unify the Java projects into a single project with submodules. This > will allow a single maven invocation. This makes it easier to add new Java > submodules, and it fixes the "mvn versions:set" command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
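The unified layout described above amounts to a single aggregator pom. A minimal sketch of the shape (module names are illustrative, not the actual Impala submodules): one `mvn install` from the root builds everything, and `mvn versions:set` updates every submodule in one pass.

```xml
<!-- Illustrative aggregator pom; module names are assumptions. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.impala</groupId>
  <artifactId>impala-java-root</artifactId>
  <version>4.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <!-- Each submodule keeps its own pom but is built by this one reactor. -->
  <modules>
    <module>fe</module>
    <module>query-event-hook-api</module>
  </modules>
</project>
```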
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10256: --- Labels: broken-build (was: ) > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Labels: broken-build > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: 
ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10256: --- Component/s: Catalog > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER 
EXCEPTION: > EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Resolved] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
[ https://issues.apache.org/jira/browse/IMPALA-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-10260. -- Resolution: Duplicate The JIRA seems to be a duplicate of IMPALA-9767. > heap-use-after-free AddressSanitizer error in aggregating runtime filters > - > > Key: IMPALA-10260 > URL: https://issues.apache.org/jira/browse/IMPALA-10260 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Fang-Yu Rao >Priority: Critical > > Saw the following ASAN failure in an internal CORE build: > {code:java} > ==7121==ERROR: AddressSanitizer: heap-use-after-free on address > 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 > READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) > #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, > unsigned long, unsigned long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 > #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, > long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 > #2 0x1b02eb3 in __interceptor_sendmsg > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 > #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 > #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 > #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 > #6 0x580bd12 in ev_invoke_pending > 
(/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) > #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 > #8 0x580f3bf in ev_run > (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) > #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 > #10 0x35545cb in boost::_bi::bind_t kudu::rpc::ReactorThread>, > boost::_bi::list1 > > >::operator()() > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > #11 0x23417c6 in boost::function0::operator()() const > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 > #12 0x233e039 in kudu::Thread::SuperviseThread(void*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 > #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) > #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) > 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region > [0x7fec0d74d800,0x7fec0d84d801) > freed by thread T112 here: > #0 0x1b6ff50 in operator delete(void*) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 > #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, > unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 > #2 0x7ff5490c35a9 in std::allocator_traits > >::deallocate(std::allocator&, char*, unsigned long) > 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 > #3 0x7ff5490c35a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::_M_destroy(unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 > #4 0x7ff5490c35a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::reserve(unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 > previously allocated by thread T112 here: > #0 0x1b6f1e0 in operator new(unsigned long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 > #1 0x1b73ece in void std::__cxx11::basic_string std::char_traits, std::allocator
[jira] [Assigned] (IMPALA-10132) Implement ds_hll_estimate_bounds()
[ https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fucun Chu reassigned IMPALA-10132: -- Assignee: Fucun Chu > Implement ds_hll_estimate_bounds() > -- > > Key: IMPALA-10132 > URL: https://issues.apache.org/jira/browse/IMPALA-10132 > Project: IMPALA > Issue Type: Sub-task >Reporter: Adam Tamas >Assignee: Fucun Chu >Priority: Major > > In Hive, ds_hll_estimate_bounds() returns an array of doubles. > An example for a sketch created from a table which contains only a single > value: > {code:java} > (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;) > +---+ > | _c0 | > +---+ > | [1.0,1.0,1.998634873453] | > +---+ > {code} > The values of the array are probably a lower bound, an estimate and an upper > bound of the sketch.
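For intuition about where such bounds come from: HLL error bounds are commonly derived from the relative standard error of the estimator, approximately 1.04/sqrt(m) for m = 2^lg_k registers. The sketch below is an illustrative approximation only, not the exact bound computation DataSketches performs (its bounds are state-dependent and tighter), and the function name is made up.

```python
import math

def approx_hll_bounds(estimate, lg_k, num_std_devs=2):
    """Rough HLL confidence bounds (lower, estimate, upper) from the
    relative standard error rse ~= 1.04 / sqrt(m), m = 2**lg_k registers.
    Illustrative only; real libraries compute tighter bounds."""
    rse = 1.04 / math.sqrt(2 ** lg_k)
    delta = num_std_devs * rse
    return (estimate * (1 - delta), estimate, estimate * (1 + delta))

lo, est, hi = approx_hll_bounds(1000.0, lg_k=12)
assert lo < est < hi
```

This matches the shape of the Hive output above: a lower bound, the estimate itself, and an upper bound, with the interval widening as lg_k shrinks or num_std_devs grows.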
[jira] [Commented] (IMPALA-10249) TestImpalaShell.test_queries_closed is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216769#comment-17216769 ] Quanlong Huang commented on IMPALA-10249: - Hi [~tarmstr...@cloudera.com], this looks like an old test. Do you know who would be a suitable assignee? > TestImpalaShell.test_queries_closed is flaky > > > Key: IMPALA-10249 > URL: https://issues.apache.org/jira/browse/IMPALA-10249 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > Saw a failure in a CORE ASAN build: > shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2-http] (from pytest) > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: > in test_queries_closed > assert 0 == impalad_service.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x7ac8510>.get_num_in_flight_queries {code}
[jira] [Updated] (IMPALA-10249) TestImpalaShell.test_queries_closed is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10249: Description: Saw a failure in a CORE ASAN build: shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2-http] (from pytest) {code:java} /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: in test_queries_closed assert 0 == impalad_service.get_num_in_flight_queries() E assert 0 == 1 E+ where 1 = >() E+where > = .get_num_in_flight_queries {code} was: Saw a failure in an internal job: shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2-http] (from pytest) {code:java} /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: in test_queries_closed assert 0 == impalad_service.get_num_in_flight_queries() E assert 0 == 1 E+ where 1 = >() E+where > = .get_num_in_flight_queries {code} > TestImpalaShell.test_queries_closed is flaky > > > Key: IMPALA-10249 > URL: https://issues.apache.org/jira/browse/IMPALA-10249 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > Saw a failure in a CORE ASAN build: > shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2-http] (from pytest) > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: > in test_queries_closed > assert 0 == impalad_service.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x7ac8510>.get_num_in_flight_queries {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: 
issues-all-h...@impala.apache.org
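Flakes like this typically come from asserting on an asynchronously updated metric the instant the shell exits, before the server has finished closing the query. A common remedy (a hypothetical helper sketched here, not the actual test fix) is to poll the metric with a timeout instead of asserting once:

```python
import time

def wait_for(predicate, timeout_s=60.0, interval_s=0.5):
    """Poll `predicate` until it returns True or `timeout_s` elapses.
    Returns the final predicate result, so callers can assert on it."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return predicate()

# Instead of:  assert 0 == impalad_service.get_num_in_flight_queries()
# a test could wait for the count to drain (impalad_service is assumed here):
# assert wait_for(lambda: impalad_service.get_num_in_flight_queries() == 0)
```

The trade-off is that a genuine leak of an in-flight query now fails after the timeout rather than immediately, so the timeout should stay short.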
[jira] [Assigned] (IMPALA-10245) Test fails in TestKuduReadTokenSplit.test_kudu_scanner
[ https://issues.apache.org/jira/browse/IMPALA-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10245: --- Assignee: Thomas Tauber-Marshall Hi [~twmarshall], randomly assigning this to you. Feel free to reassign it. Thanks! > Test fails in TestKuduReadTokenSplit.test_kudu_scanner > -- > > Key: IMPALA-10245 > URL: https://issues.apache.org/jira/browse/IMPALA-10245 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Thomas Tauber-Marshall >Priority: Critical > > Tests with erasure-coding enabled failed in: > query_test.test_kudu.TestKuduReadTokenSplit.test_kudu_scanner[protocol: > beeswax | exec_option: \{'kudu_read_mode': 'READ_AT_SNAPSHOT', 'batch_size': > 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: kudu/none] (from pytest) > {code:java} > query_test/test_kudu.py:1508: in test_kudu_scanner > targeted_kudu_scan_range_length=None, plans=plans) > query_test/test_kudu.py:1542: in __get_num_scanner_instances > assert len(matches.groups()) == 1 > E AttributeError: 'NoneType' object has no attribute 'groups' {code}
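The `AttributeError: 'NoneType' object has no attribute 'groups'` in the trace above means `re.search` returned `None` before `.groups()` was called, i.e. the expected pattern was missing from the plan text. A guard turns that into a readable assertion failure instead (a sketch with an assumed pattern; the real test's regex is not shown here):

```python
import re

def extract_num_instances(plan_text):
    """Return the instance count from a plan/explain string, failing with
    a useful message instead of AttributeError when the pattern is absent."""
    matches = re.search(r'(\d+) instances', plan_text)  # assumed pattern
    assert matches is not None, "instance count not found in: %r" % plan_text
    return int(matches.group(1))

assert extract_num_instances("F00: 3 instances") == 3
```

This way a missing pattern reports the offending plan text directly in the pytest output rather than a bare `NoneType` error.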
[jira] [Assigned] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
[ https://issues.apache.org/jira/browse/IMPALA-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10216: --- Assignee: Tim Armstrong Hi [~tarmstr...@cloudera.com], assigning this to you first since you added the test. Feel free to reassign it if you don't have bandwidth. Thanks! > BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds > -- > > Key: IMPALA-10216 > URL: https://issues.apache.org/jira/browse/IMPALA-10216 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Tim Armstrong >Priority: Critical > > Only seen this once so far: > {code} > BufferPoolTest.WriteErrorBlacklistCompression > Error Message > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > Stacktrace > Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764 > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > {code}
[jira] [Updated] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
[ https://issues.apache.org/jira/browse/IMPALA-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10260: Description: Saw the following ASAN failure in an internal CORE build: {code:java} ==7121==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned long, unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 #2 0x1b02eb3 in __interceptor_sendmsg /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 #6 0x580bd12 in ev_invoke_pending (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 #8 0x580f3bf in ev_run 
(/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 #10 0x35545cb in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 #11 0x23417c6 in boost::function0::operator()() const /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 #12 0x233e039 in kudu::Thread::SuperviseThread(void*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region [0x7fec0d74d800,0x7fec0d84d801) freed by thread T112 here: #0 0x1b6ff50 in operator delete(void*) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 #2 0x7ff5490c35a9 in std::allocator_traits >::deallocate(std::allocator&, char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 #3 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::_M_destroy(unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 #4 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::reserve(unsigned long) 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 previously allocated by thread T112 here: #0 0x1b6f1e0 in operator new(unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x1b73ece in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*, std::forward_iterator_tag) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/basic_string.tcc:219:14 #2 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct_aux(char const*, char const*, std::__false_type) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:236 #3 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*)
[jira] [Updated] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
[ https://issues.apache.org/jira/browse/IMPALA-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10216: Issue Type: Bug (was: Test) > BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds > -- > > Key: IMPALA-10216 > URL: https://issues.apache.org/jira/browse/IMPALA-10216 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Priority: Critical > > Only seen this once so far: > {code} > BufferPoolTest.WriteErrorBlacklistCompression > Error Message > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > Stacktrace > Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764 > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > {code}
[jira] [Created] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
Quanlong Huang created IMPALA-10260: --- Summary: heap-use-after-free AddressSanitizer error in aggregating runtime filters Key: IMPALA-10260 URL: https://issues.apache.org/jira/browse/IMPALA-10260 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Fang-Yu Rao Saw the following ASAN failure in an internal build: {code:java} ==7121==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned long, unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 #2 0x1b02eb3 in __interceptor_sendmsg /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 #6 0x580bd12 in ev_invoke_pending (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 #8 0x580f3bf in 
ev_run (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 #10 0x35545cb in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 #11 0x23417c6 in boost::function0::operator()() const /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 #12 0x233e039 in kudu::Thread::SuperviseThread(void*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region [0x7fec0d74d800,0x7fec0d84d801) freed by thread T112 here: #0 0x1b6ff50 in operator delete(void*) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 #2 0x7ff5490c35a9 in std::allocator_traits >::deallocate(std::allocator&, char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 #3 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::_M_destroy(unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 #4 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::reserve(unsigned long) 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 previously allocated by thread T112 here: #0 0x1b6f1e0 in operator new(unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x1b73ece in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*, std::forward_iterator_tag) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/basic_string.tcc:219:14 #2 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct_aux(char const*, char const*, std::__false_type)
[jira] [Commented] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216725#comment-17216725 ] Quanlong Huang commented on IMPALA-10259: - [~wzhou], assign this to you since you resolved IMPALA-10050. Feel free to mark it as duplicated or reassign it. Thanks! > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Critical > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: 
exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) = 0, > 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: 
num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: parquet_timestamp_type (i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93:
[jira] [Created] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
Quanlong Huang created IMPALA-10259: --- Summary: Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 Key: IMPALA-10259 URL: https://issues.apache.org/jira/browse/IMPALA-10259 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Wenzhe Zhou TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN build: {code:java} F1016 17:08:54.728466 19955 query-state.cc:877] 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 vs. 1) {code} The test is: {code:java} shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2] {code} The query is: {code:java} I1016 17:08:49.026532 19947 Frontend.java:1522] 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} Query options: {code:java} I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = true, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39: runtime_filter_wait_time_ms (i32) = 0, 40: disable_row_runtime_filtering (bool) = 
false, 41: max_num_runtime_filters (i32) = 10, 42: parquet_annotate_strings_utf8 (bool) = false, 43: parquet_fallback_schema_resolution (i32) = 0, 45: s3_skip_insert_staging (bool) = true, 46: runtime_filter_min_size (i32) = 1048576, 47: runtime_filter_max_size (i32) = 16777216, 48: prefetch_mode (i32) = 1, 49: strict_mode (bool) = false, 50: scratch_limit (i64) = -1, 51: enable_expr_rewrites (bool) = true, 52: decimal_v2 (bool) = true, 53: parquet_dictionary_filtering (bool) = true, 54: parquet_array_resolution (i32) = 0, 55: parquet_read_statistics (bool) = true, 56: default_join_distribution_mode (i32) = 0, 57: disable_codegen_rows_threshold (i32) = 5, 58: default_spillable_buffer_size (i64) = 2097152, 59: min_spillable_buffer_size (i64) = 65536, 60: max_row_size (i64) = 524288, 61: idle_session_timeout (i32) = 0, 62: compute_stats_min_sample_size (i64) = 1073741824, 63: exec_time_limit_s (i32) = 0, 64: shuffle_distinct_exprs (bool) = true, 65: max_mem_estimate_for_admission (i64) = 0, 66: thread_reservation_limit (i32) = 3000, 67: thread_reservation_aggregate_limit (i32) = 0, 68: kudu_read_mode (i32) = 0, 69: allow_erasure_coded_files (bool) = false, 70: timezone (string) = "", 71: scan_bytes_limit (i64) = 0, 72: cpu_limit_s (i64) = 0, 73: topn_bytes_limit (i64) = 536870912, 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) built on Fri Oct 16 13:26:18 PDT 2020", 75: resource_trace_ratio (double) = 0, 76: num_remote_executor_candidates (i32) = 3, 77: num_rows_produced_limit (i64) = 0, 78: planner_testcase_mode (bool) = false, 79: default_file_format (i32) = 0, 80: parquet_timestamp_type (i32) = 0, 81: parquet_read_page_index (bool) = true, 82: parquet_write_page_index (bool) = true, 84: disable_hdfs_num_rows_estimate (bool) = false, 86: spool_query_results (bool) = false, 87: default_transactional_type (i32) = 0, 88: statement_expression_limit (i32) = 25, 89: max_statement_length_bytes (i32) = 16777216, 90: disable_data_cache (bool) = 
false, 91: max_result_spooling_mem (i64) = 104857600, 92: max_spilled_result_spooling_mem (i64) = 1073741824, 93: disable_hbase_num_rows_estimate (bool) = false, 94: fetch_rows_timeout_ms (i64) = 1, 95: now_string (string) = "", 96: parquet_object_store_split_size (i64) = 268435456, 97: mem_limit_executors (i64) = 0, 98: broadcast_bytes_limit (i64) = 34359738368, 99: preagg_bytes_limit (i64) = -1, 100: enable_cnf_rewrites (bool) = true, 101: max_cnf_exprs (i32) = 0, 102: kudu_snapshot_read_timestamp_micros (i64) = 0, 103: retry_failed_queries (bool) = false, 104: enabled_runtime_filter_types (i32) = 3, 105: async_codegen (bool) = false, 106:
[jira] [Commented] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216717#comment-17216717 ] Peter Vary commented on IMPALA-10247: - Maybe [~kuczoram] could be of more help if this is happening in direct insert path? I see this in the log: {code:java} Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462) ... 29 more {code} > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... 
> > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at >
[jira] [Resolved] (IMPALA-10248) TestKuduOperations.test_column_storage_attributes on exhaustive tests
[ https://issues.apache.org/jira/browse/IMPALA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-10248. - Fix Version/s: Impala 4.0 Resolution: Fixed > TestKuduOperations.test_column_storage_attributes on exhaustive tests > - > > Key: IMPALA-10248 > URL: https://issues.apache.org/jira/browse/IMPALA-10248 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Fix For: Impala 4.0 > > > This is the reverse of IMPALA-9513. The failure is > {code:java} > query_test/test_kudu.py:472: in test_column_storage_attributes > assert cursor.fetchall() == \ > E assert [(0, True, 0, 0, 0, 0, ...)] == [(0, True, 0, 0, 0, 0, ...)] > E At index 0 diff: (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', > datetime.datetime(2009, 1, 1, 0, 0), Decimal('0'), datetime.date(2010, 1, 1), > '') != (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', datetime.datetime(2009, 1, 1, 0, > 0), 0, '2010-01-01', '') > E Use -v to get the full diff{code} > The difference from IMPALA-9513 is that the test expects the string > '2010-01-01' instead of the actual {{datetime.date(2010, 1, 1)}}. This may be > due to the recent bump of the impyla version in IMPALA-10225. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216673#comment-17216673 ] Zoltán Borók-Nagy commented on IMPALA-10247: Thanks, Quanlong. [~pvary] could you please take a look? This seems like a Hive issue. > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... > > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. 
Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at
[jira] [Commented] (IMPALA-10248) TestKuduOperations.test_column_storage_attributes on exhaustive tests
[ https://issues.apache.org/jira/browse/IMPALA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216649#comment-17216649 ] ASF subversion and git services commented on IMPALA-10248: -- Commit c7f118a860af6b811e2e2c4c5e3693f43429def8 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c7f118a ] IMPALA-10248: Fix test_column_storage_attributes date string errors After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect impyla to return a datetime.date instead of a string for DATE type data. Tests: - Run test_column_storage_attributes with --exploration_strategy=exhaustive to verify the fix. Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023 Reviewed-on: http://gerrit.cloudera.org:8080/16608 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > TestKuduOperations.test_column_storage_attributes on exhaustive tests > - > > Key: IMPALA-10248 > URL: https://issues.apache.org/jira/browse/IMPALA-10248 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > This is the reverse of IMPALA-9513. The failure is > {code:java} > query_test/test_kudu.py:472: in test_column_storage_attributes > assert cursor.fetchall() == \ > E assert [(0, True, 0, 0, 0, 0, ...)] == [(0, True, 0, 0, 0, 0, ...)] > E At index 0 diff: (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', > datetime.datetime(2009, 1, 1, 0, 0), Decimal('0'), datetime.date(2010, 1, 1), > '') != (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', datetime.datetime(2009, 1, 1, 0, > 0), 0, '2010-01-01', '') > E Use -v to get the full diff{code} > The difference from IMPALA-9513 is that the test expects the string > '2010-01-01' instead of the actual {{datetime.date(2010, 1, 1)}}. This may be > due to the recent bump of the impyla version in IMPALA-10225.
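The behavior change behind this fix can be illustrated with a short, hypothetical snippet (values simplified from the assertion diff; only the DATE column is shown):

```python
from datetime import date

# Hypothetical DATE value as impyla 0.17a1 returns it: a datetime.date
# object rather than the string older client versions produced.
fetched = date(2010, 1, 1)

# The old expectation (a string) no longer matches the fetched value...
assert fetched != '2010-01-01'
# ...while comparing against a datetime.date, as the fix does, passes.
assert fetched == date(2010, 1, 1)
```

The commit accordingly updates the test's expected rows to use datetime.date objects for DATE columns.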
[jira] [Commented] (IMPALA-10225) Bump Impyla version
[ https://issues.apache.org/jira/browse/IMPALA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216650#comment-17216650 ] ASF subversion and git services commented on IMPALA-10225: -- Commit c7f118a860af6b811e2e2c4c5e3693f43429def8 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c7f118a ] IMPALA-10248: Fix test_column_storage_attributes date string errors After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect impyla to return a datetime.date instead of a string for DATE type data. Tests: - Run test_column_storage_attributes with --exploration_strategy=exhaustive to verify the fix. Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023 Reviewed-on: http://gerrit.cloudera.org:8080/16608 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Bump Impyla version > --- > > Key: IMPALA-10225 > URL: https://issues.apache.org/jira/browse/IMPALA-10225 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > There are a couple of new Impyla releases that we can test out in Impala's > end-to-end test environment - https://pypi.org/project/impyla/0.17a1/#history
[jira] [Assigned] (IMPALA-10258) TestQueryRetries.test_original_query_cancel is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10258: --- Assignee: Sahil Takiar Assigning this to [~stakiar] first since this seems similar to IMPALA-9550. Feel free to reassign it to me. Thanks! > TestQueryRetries.test_original_query_cancel is flaky > - > > Key: IMPALA-10258 > URL: https://issues.apache.org/jira/browse/IMPALA-10258 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Sahil Takiar >Priority: Critical > > Saw this fail in a core-s3-data-cache build: > custom_cluster.test_query_retries.TestQueryRetries.test_original_query_cancel > {code:java} > custom_cluster/test_query_retries.py:622: in test_original_query_cancel > self.wait_for_state(handle, self.client.QUERY_STATES[state], 60) > common/impala_test_suite.py:1053: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1070: in wait_for_any_state > actual_state)) > E Timeout: query 494af68cdf3d8ecb:d3a3bf36 did not reach one of the > expected states [3], last known state 5{code}
[jira] [Created] (IMPALA-10258) TestQueryRetries.test_original_query_cancel is flaky
Quanlong Huang created IMPALA-10258: --- Summary: TestQueryRetries.test_original_query_cancel is flaky Key: IMPALA-10258 URL: https://issues.apache.org/jira/browse/IMPALA-10258 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Saw this fail in a core-s3-data-cache build: custom_cluster.test_query_retries.TestQueryRetries.test_original_query_cancel {code:java} custom_cluster/test_query_retries.py:622: in test_original_query_cancel self.wait_for_state(handle, self.client.QUERY_STATES[state], 60) common/impala_test_suite.py:1053: in wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, client) common/impala_test_suite.py:1070: in wait_for_any_state actual_state)) E Timeout: query 494af68cdf3d8ecb:d3a3bf36 did not reach one of the expected states [3], last known state 5{code}
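The Timeout in the trace comes from the test suite's polling helper. A minimal sketch of such a wait loop (hypothetical simplified signature; the real wait_for_any_state in impala_test_suite.py takes a query handle and client) could look like:

```python
import time

def wait_for_any_state(get_state, expected_states, timeout_s):
    """Poll get_state() until it returns one of expected_states.

    Raises TimeoutError carrying the last observed state on expiry,
    mirroring the shape of the error message in the failure above.
    """
    deadline = time.time() + timeout_s
    last_state = None
    while time.time() < deadline:
        last_state = get_state()
        if last_state in expected_states:
            return last_state
        time.sleep(0.05)  # brief pause between polls
    raise TimeoutError("query did not reach one of the expected states "
                       "%s, last known state %s" % (expected_states, last_state))

# Example with hypothetical numeric states: the third poll returns the
# awaited state 3, so the helper returns instead of timing out.
states = iter([4, 4, 3])
assert wait_for_any_state(lambda: next(states), [3], 5) == 3
```

On expiry, the raised message mirrors the "did not reach one of the expected states [3], last known state 5" failure above, which suggests the query ended in an unexpected terminal state rather than merely running slowly.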
[jira] [Assigned] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
[ https://issues.apache.org/jira/browse/IMPALA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10257: --- Assignee: Zoltán Borók-Nagy Assign to [~boroknagyz] first since you are the expert on parquet page indexing. > Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build > --- > > Key: IMPALA-10257 > URL: https://issues.apache.org/jira/browse/IMPALA-10257 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Saw the crash in a CORE S3 build: > {code:java} > F1018 06:14:23.631407 12990 hdfs-parquet-scanner.cc:1170] > cf49030f4bbe0736:84de19aa0002] Check failed: false > {code} > The query is a tpch-nested query: > {code:java} > I1018 06:14:22.352707 12712 Frontend.java:1522] > cf49030f4bbe0736:84de19aa] Analyzing query: select > l_shipmode, > sum(case > when o_orderpriority = '1-URGENT' > or o_orderpriority = '2-HIGH' > then 1 > else 0 > end) as high_line_count, > sum(case > when o_orderpriority <> '1-URGENT' > and o_orderpriority <> '2-HIGH' > then 1 > else 0 > end) as low_line_count > from > customer.c_orders o, > o.o_lineitems l > where > l_shipmode in ('MAIL', 'SHIP') > and l_commitdate < l_receiptdate > and l_shipdate < l_commitdate > and l_receiptdate >= '1994-01-01' > and l_receiptdate < '1995-01-01' > group by > l_shipmode > order by > l_shipmode db: tpch_nested_parquet > {code} > The test is > {code:java} > query_test.test_tpch_nested_queries.TestTpchNestedQuery.test_tpch_q12[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none]{code} > A similar test also failed in the same build: > {code:java} > authorization.test_ranger.TestRangerColumnMaskingTpchNested.test_tpch_nested_column_masking[protocol: > beeswax | exec_option: {'batch_size': 0, 
'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none]{code} > Backtrace: > {code:java} > #0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6 > #1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6 > #2 0x0521cce4 in google::DumpStackTraceAndExit() () > #3 0x052120dd in google::LogMessage::Fail() () > #4 0x052139cd in google::LogMessage::SendToLog() () > #5 0x05211a3b in google::LogMessage::Flush() () > #6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() () > #7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering > (this=0xb6f) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170 > #8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows > (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, > skip_row_group=0xb6f01d0) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150 > #9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal > (this=0xb6f, row_batch=0x10bc95a0) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458 > #10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit > (this=0xb6f) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350 > #11 0x02986f4d in impala::HdfsScanNode::ProcessSplit > (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, > scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500 > #12 0x029862ce in impala::HdfsScanNode::ScannerThread > (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at > 
/data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #13 0x02985636 in impala::HdfsScanNodeoperator()(void) > const (__closure=0x7f31da200ba8) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #14 0x029879ef in > boost::detail::function::void_function_obj_invoker0, > void>::invoke(boost::detail::function::function_buffer &) > (function_obj_ptr=...) at >
[jira] [Updated] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
[ https://issues.apache.org/jira/browse/IMPALA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10257: Description: Saw the crash in a CORE S3 build: {code:java} F1018 06:14:23.631407 12990 hdfs-parquet-scanner.cc:1170] cf49030f4bbe0736:84de19aa0002] Check failed: false {code} The query is a tpch-nested query: {code:java} I1018 06:14:22.352707 12712 Frontend.java:1522] cf49030f4bbe0736:84de19aa] Analyzing query: select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from customer.c_orders o, o.o_lineitems l where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01' group by l_shipmode order by l_shipmode db: tpch_nested_parquet {code} The test is {code:java} query_test.test_tpch_nested_queries.TestTpchNestedQuery.test_tpch_q12[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code} A similar test also failed in the same build: {code:java} authorization.test_ranger.TestRangerColumnMaskingTpchNested.test_tpch_nested_column_masking[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code} Backtrace: {code:java} #0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6 #1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6 #2 0x0521cce4 in google::DumpStackTraceAndExit() () #3 0x052120dd in google::LogMessage::Fail() () #4 0x052139cd in google::LogMessage::SendToLog() () #5 0x05211a3b in 
google::LogMessage::Flush() () #6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() () #7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170 #8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, skip_row_group=0xb6f01d0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150 #9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal (this=0xb6f, row_batch=0x10bc95a0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458 #10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350 #11 0x02986f4d in impala::HdfsScanNode::ProcessSplit (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500 #12 0x029862ce in impala::HdfsScanNode::ScannerThread (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 #13 0x02985636 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7f31da200ba8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 #14 0x029879ef in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) 
at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 #15 0x021467d6 in boost::function0::operator() (this=0x7f31da200ba0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 #16 0x02727552 in impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7f31dc204840, thread_started=0x7f31dc202e70) at
[jira] [Created] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
Quanlong Huang created IMPALA-10257:
---
Summary: Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
Key: IMPALA-10257
URL: https://issues.apache.org/jira/browse/IMPALA-10257
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang

Saw the crash in a CORE S3 build:
{code}
F1018 08:41:48.955114 27641 hdfs-parquet-scanner.cc:1170] ed47e522687c15e8:f07974d10002] Check failed: false
{code}
Backtrace:
{code}
#0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6
#1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6
#2 0x0521cce4 in google::DumpStackTraceAndExit() ()
#3 0x052120dd in google::LogMessage::Fail() ()
#4 0x052139cd in google::LogMessage::SendToLog() ()
#5 0x05211a3b in google::LogMessage::Flush() ()
#6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() ()
#7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170
#8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, skip_row_group=0xb6f01d0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150
#9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal (this=0xb6f, row_batch=0x10bc95a0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458
#10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350
#11 0x02986f4d in impala::HdfsScanNode::ProcessSplit (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500
#12 0x029862ce in impala::HdfsScanNode::ScannerThread (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418
#13 0x02985636 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7f31da200ba8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339
#14 0x029879ef in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
#15 0x021467d6 in boost::function0::operator() (this=0x7f31da200ba0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
#16 0x02727552 in impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7f31dc204840, thread_started=0x7f31dc202e70) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/util/thread.cc:360
#17 0x0272f4ef in boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0x15915340, f=@0x15915338: 0x272720c , std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*)>, a=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
#18 0x0272f413 in boost::_bi::bind_t, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >
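For context, Parquet page filtering (the mechanism the failing DCHECK guards) uses the page index's per-page min/max statistics to pre-select candidate row ranges, and every row that survives the scan must fall inside one of those ranges. A hypothetical Python sketch of that invariant (purely illustrative; names and the check itself are not Impala's implementation) is:

```python
# Hypothetical sketch of Parquet page filtering: pick candidate pages whose
# min/max statistics can satisfy a predicate, then validate that every
# surviving row falls inside a candidate range -- the kind of invariant a
# DCHECK such as the one in CheckPageFiltering asserts.

def candidate_row_ranges(pages, predicate):
    """pages: list of (first_row, last_row, min_val, max_val)."""
    return [(first, last) for first, last, lo, hi in pages if predicate(lo, hi)]

def row_in_candidates(row, ranges):
    return any(first <= row <= last for first, last in ranges)

# Three pages of a column chunk; keep pages whose value range overlaps [10, 20].
pages = [(0, 99, 1, 5), (100, 199, 8, 15), (200, 299, 30, 40)]
ranges = candidate_row_ranges(pages, lambda lo, hi: lo <= 20 and hi >= 10)
assert ranges == [(100, 199)]
assert row_in_candidates(150, ranges)
assert not row_in_candidates(250, ranges)  # rows outside candidates must be skipped
```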
[jira] [Created] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
Quanlong Huang created IMPALA-10256: --- Summary: TestDisableFeatures.test_disable_incremental_metadata_updates fails Key: IMPALA-10256 URL: https://issues.apache.org/jira/browse/IMPALA-10256 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Quanlong Huang Saw test failures in internal CORE builds: custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0] {code:java} custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates use_db=unique_database, multiple_impalad=True) common/impala_test_suite.py:662: in run_test_case result = exec_fn(query, user=test_section.get('USER', '').strip() or None) common/impala_test_suite.py:600: in __exec_in_impala result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:909: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:205: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:187: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:363: in __execute_query handle = self.execute_query_async(query_string, user=user) beeswax/impala_beeswax.py:357: in execute_query_async handle = self.__do_rpc(lambda: self.imp_service.query(query,)) beeswax/impala_beeswax.py:520: in __do_rpc raise ImpalaBeeswaxException(self.__build_error_message(b), b) E ImpalaBeeswaxException: ImpalaBeeswaxException: EINNER EXCEPTION: EMESSAGE: AnalysisException: The specified cache pool does not exist: testPool {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: 
issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10256: Priority: Blocker (was: Major) > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > 
EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Assigned] (IMPALA-10250) TestNestedTypes.test_scanner_position fails in an ASAN test
[ https://issues.apache.org/jira/browse/IMPALA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10250: --- Assignee: Quanlong Huang > TestNestedTypes.test_scanner_position fails in an ASAN test > --- > > Key: IMPALA-10250 > URL: https://issues.apache.org/jira/browse/IMPALA-10250 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > TestNestedTypes.test_scanner_position fails in a CORE ASAN job: > {code:java} > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] {code} > The stacktrace are the same: > {code:java} > query_test/test_nested_types.py:76: in test_scanner_position > self.run_test_case('QueryTest/nested-types-scanner-position', vector) > common/impala_test_suite.py:693: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:529: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:456: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E 0,-1,7300 != 0,-1,9366 > E 0,1,7300 != 0,1,9800 > E 0,NULL,7300 != 0,NULL,9796 > E 1,1,7300 != 1,1,9796 > E 1,2,7300 != 1,2,9800 > E 2,2,7300 != 2,2,9796 > 
E 2,3,7300 != 2,3,9800 > E 3,NULL,7300 != 3,NULL,9796 > E 4,3,7300 != 4,3,9796 > E 5,NULL,7300 != 5,NULL,9796 {code}
[jira] [Assigned] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10247: --- Assignee: Zoltán Borók-Nagy Assign to [~boroknagyz] first since this looks like a variant of IMPALA-9923. > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... > > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. 
Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at
[jira] [Updated] (IMPALA-10250) TestNestedTypes.test_scanner_position fails in an ASAN test
[ https://issues.apache.org/jira/browse/IMPALA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10250: Description: TestNestedTypes.test_scanner_position fails in a CORE ASAN job: {code:java} query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] {code} The stacktrace are the same: {code:java} query_test/test_nested_types.py:76: in test_scanner_position self.run_test_case('QueryTest/nested-types-scanner-position', vector) common/impala_test_suite.py:693: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:529: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:456: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 0,-1,7300 != 0,-1,9366 E 0,1,7300 != 0,1,9800 E 0,NULL,7300 != 0,NULL,9796 E 1,1,7300 != 1,1,9796 E 1,2,7300 != 1,2,9800 E 2,2,7300 != 2,2,9796 E 2,3,7300 != 2,3,9800 E 3,NULL,7300 != 3,NULL,9796 E 4,3,7300 != 4,3,9796 E 5,NULL,7300 != 5,NULL,9796 {code} was: TestNestedTypes.test_scanner_position fails in a CORE ASAN job: {code:java} query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block]query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] {code} The stacktrace are the same: {code:java} query_test/test_nested_types.py:76: in test_scanner_position self.run_test_case('QueryTest/nested-types-scanner-position', vector) common/impala_test_suite.py:693: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:529: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:456: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 0,-1,7300 != 0,-1,9366 E 0,1,7300 != 0,1,9800 E 0,NULL,7300 != 0,NULL,9796 E 1,1,7300 != 1,1,9796 E 1,2,7300 != 1,2,9800 E 2,2,7300 != 2,2,9796 E 2,3,7300 != 2,3,9800 E 3,NULL,7300 != 3,NULL,9796 E 4,3,7300 != 4,3,9796 E 5,NULL,7300 != 5,NULL,9796 {code} > TestNestedTypes.test_scanner_position fails in an ASAN test > --- > > Key: IMPALA-10250 > URL: https://issues.apache.org/jira/browse/IMPALA-10250 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > TestNestedTypes.test_scanner_position fails in a CORE ASAN job: > {code:java} > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] > 
query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] {code} > The stacktrace are the same: > {code:java} > query_test/test_nested_types.py:76: in test_scanner_position > self.run_test_case('QueryTest/nested-types-scanner-position', vector) > common/impala_test_suite.py:693: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:529: in
[jira] [Updated] (IMPALA-10254) Load data files via Iceberg for Iceberg Tables
[ https://issues.apache.org/jira/browse/IMPALA-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10254: --- Description: Currently we still load the file descriptors of an Iceberg table via recursive file listing. This lists too many files, e.g. metadata files, files that are being written (can later throw checksum errors), files from aborted INSERTs, removed files, etc. We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. Note that we already load data files through the Iceberg APIs to fill the 'path_hash_to_file_descriptor' map ([https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).] was: Currently we still load the file descriptors of an Iceberg table via recursive file listing. This lists too many files, e.g. metadata files, files that are being written (can later throw checksum errors), files from aborted INSERTs, removed files, etc. We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. > Load data files via Iceberg for Iceberg Tables > -- > > Key: IMPALA-10254 > URL: https://issues.apache.org/jira/browse/IMPALA-10254 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Currently we still load the file descriptors of an Iceberg table via > recursive file listing. > This lists too many files, e.g. metadata files, files that are being written > (can later throw checksum errors), files from aborted INSERTs, removed files, > etc. > We should use the Iceberg API to load the file descriptors corresponding to > the table snapshot. > Note that we already load data files through the Iceberg APIs to fill the > 'path_hash_to_file_descriptor' map > ([https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).] 
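The difference between the two listing strategies can be sketched in plain Python (a toy illustration only: `snapshot_data_files` and the file names are invented here, and the real fix would go through the Iceberg API's snapshot/manifest machinery rather than path filtering):

```python
# Toy illustration (not the Iceberg API): a recursive directory listing picks
# up metadata files, half-written files, and files from aborted INSERTs, while
# the snapshot's manifests name exactly the committed data files.

def snapshot_data_files(recursive_listing, manifest_entries):
    """Keep only the files the table snapshot actually references."""
    referenced = set(manifest_entries)
    return sorted(p for p in recursive_listing if p in referenced)

recursive_listing = [
    "data/00000-0-aaaa.parquet",             # committed data file
    "data/00001-0-bbbb.parquet",             # left over from an aborted INSERT
    "data/00002-0-cccc.parquet.inprogress",  # still being written
    "metadata/v3.metadata.json",             # table metadata, not data
]
manifest_entries = ["data/00000-0-aaaa.parquet"]

assert snapshot_data_files(recursive_listing, manifest_entries) == [
    "data/00000-0-aaaa.parquet"
]
```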
[jira] [Comment Edited] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216561#comment-17216561 ] Quanlong Huang edited comment on IMPALA-9884 at 10/19/20, 9:24 AM: --- Saw this again in an internal exhaustive build. custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries: 50 | protocol: beeswax | table_format: text/none | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | submission_delay_ms: 150 | round_robin_submission: False] {code:java} custom_cluster/test_admission_controller.py:1856: in test_mem_limit {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) custom_cluster/test_admission_controller.py:1712: in run_admission_test assert metric_deltas['dequeued'] == 0,\ E AssertionError: Queued queries should not run until others are made to finish E assert 1 == 0 {code} was (Author: stiga-huang): Saw this again in an internal exhaustive build. > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > > Recently, I saw this test failing with the exception trace below. 
> {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat}
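The failing assertion follows a snapshot/delta pattern: the test records admission-controller metrics before submitting a batch of over-committed queries, then asserts on the differences — no query may be dequeued while the pool is saturated. A simplified sketch of that pattern (illustrative only, not the actual test harness; the metric names here are assumed):

```python
# Sketch of a metric-delta check like the one in run_admission_test:
# snapshot counters, run the workload, then assert on the deltas.

def metric_deltas(before, after):
    return {name: after[name] - before[name] for name in before}

before = {"admitted": 5, "queued": 0, "dequeued": 2}
# ... submit enough queries to exceed the pool's memory limit ...
after = {"admitted": 5, "queued": 10, "dequeued": 2}

deltas = metric_deltas(before, after)
assert deltas["dequeued"] == 0, \
    "Queued queries should not run until others are made to finish"
```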
[jira] [Reopened] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reopened IMPALA-9884: Saw this again in an internal exhaustive build. > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > > Recently, I saw this test failing with the exception trace below. > {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat}
[jira] [Work started] (IMPALA-10255) query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds
[ https://issues.apache.org/jira/browse/IMPALA-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10255 started by Quanlong Huang. --- > query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive > builds > --- > > Key: IMPALA-10255 > URL: https://issues.apache.org/jira/browse/IMPALA-10255 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > The patch in IMPALA-10233 adds 3 insert statements in > testdata/workloads/functional-query/queries/QueryTest/insert.test. They > introduce test failures in parquet format with non-none compressions: > {code:java} > query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > 
parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 
'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: >
[jira] [Created] (IMPALA-10255) query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds
Quanlong Huang created IMPALA-10255: --- Summary: query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds Key: IMPALA-10255 URL: https://issues.apache.org/jira/browse/IMPALA-10255 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Quanlong Huang The patch for IMPALA-10233 adds 3 insert statements to testdata/workloads/functional-query/queries/QueryTest/insert.test. They introduce test failures in the parquet format with non-none compression codecs:
{code:java}
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
{code}
[jira] [Created] (IMPALA-10254) Load data files via Iceberg for Iceberg Tables
Zoltán Borók-Nagy created IMPALA-10254: -- Summary: Load data files via Iceberg for Iceberg Tables Key: IMPALA-10254 URL: https://issues.apache.org/jira/browse/IMPALA-10254 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Currently we still load the file descriptors of an Iceberg table via recursive file listing. This picks up too many files, e.g. metadata files, files that are still being written (which can later cause checksum errors), files left behind by aborted INSERTs, removed files, etc. We should use the Iceberg API to load only the file descriptors that belong to the current table snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
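The difference between the two listing strategies can be sketched with a toy model (all file names below are hypothetical, and this is plain Python, not the Iceberg API): a recursive directory walk reports everything under the table directory, while a snapshot-based listing only returns the data files the current snapshot committed.

```python
# Toy model of why recursive listing over-reports files compared to
# reading the snapshot's manifest. All paths are made up for illustration.

# Everything physically present under the table directory.
on_disk = {
    "data/part-00000.parq",        # committed data file
    "data/part-00001.parq",        # committed data file
    "data/part-00002.parq.tmp",    # still being written -> checksum errors
    "data/part-aborted.parq",      # left behind by an aborted INSERT
    "metadata/v3.metadata.json",   # table metadata, not table data
}

# Only the files the current table snapshot actually committed.
snapshot_manifest = {"data/part-00000.parq", "data/part-00001.parq"}

def recursive_listing(root_files):
    """File descriptors found by walking the directory tree: everything."""
    return set(root_files)

def snapshot_listing(manifest):
    """File descriptors taken from the snapshot: committed data files only."""
    return set(manifest)

# The files the recursive walk reports that the snapshot never committed.
extra = recursive_listing(on_disk) - snapshot_listing(snapshot_manifest)
```

Here `extra` ends up holding exactly the in-progress, aborted, and metadata files the issue describes, which is why listing via the snapshot is the safer source of truth.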
[jira] [Updated] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoxiaoqing updated IMPALA-10253: - Description: We have the following parquet table:
{code:java}
CREATE EXTERNAL TABLE rawdata.event_ros_p1 (
  event_id INT,
  user_id BIGINT,
  time TIMESTAMP,
  p_abook_type STRING
)
PARTITIONED BY (
  day INT,
  event_bucket INT
)
STORED AS PARQUET
LOCATION 'hdfs://localhost:20500/sa/data/1/event'
{code}
The data looks like this:
||event_id||user_id||time||p_abook_type||
|1|-922235446862664806|2018-07-18 09:01:06.158|小说|
|2|-922235446862664806|2018-07-19 09:01:06.158|小说|
If we want to remap event_id to the real event name, we can implement a dict UDF, defined as DICT(BIGINT expression, STRING path). The first parameter is the column; the second is an HDFS path that stores the remapping rules, like this:
{code:java}
1,SignUp
2,ViewProduct
{code}
Then we build a view that adds the dict column on top of the original table:
{code:java}
CREATE VIEW rawdata.event_external_view_p7 AS
SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event`
FROM rawdata.event_view_p7 events
{code}
If the query groups by the dict column, it is slower than grouping by the original column. When explaining the SQL, we found that every row has to be remapped in both the SCAN phase and the AGGREGATE phase.
{code:java}
select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;
{code}
{code:java}
PLAN-ROOT SINK
|
04:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  group by: event
|  row-size=20B cardinality=0
|
02:EXCHANGE [HASH(event)]
|
01:AGGREGATE [STREAMING]
|  output: count(*)
|  group by: rawdata.DICT(event_id, '/data/1/event.txt')
|  row-size=20B cardinality=0
|
00:SCAN HDFS [rawdata.event_ros_p7_merge_offline]
   partitions=39/39 files=99 size=9.00GB
   predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct')
   row-size=4B cardinality=unavailable
{code}
The idea is to modify the plan: use the original column in the SCAN and AGGREGATE phases and remap it only once at the end. The new plan looks like this:
{code:java}
PLAN-ROOT SINK
|
05:SELECT [FINALIZE]
|  output: dict(event_id)
|  row-size=20B cardinality=0
|
04:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  group by: event_id
|  row-size=20B cardinality=0
|
02:EXCHANGE [HASH(event)]
|
01:AGGREGATE [STREAMING]
|  output: count(*)
|  group by: event_id
|  row-size=20B cardinality=0
|
00:SCAN HDFS [rawdata.event_ros_p7_merge_offline]
   partitions=39/39 files=99 size=9.00GB
   predicates: event_id IN (1, 2)
   row-size=4B cardinality=unavailable
{code}
was: we have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} the data show as following: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| if we want remapping event_id to the real event name, we can realize dict udf. the dict udf is defined as DICT(BIGINT expression, STRING path). 
first parameter is the column, second parameter is hdfs path which store the remapping rule like this: {code:java} 1,SignUp 2,ViewProduct{code} then build a view table which add the dict column on original table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query group by column has dict, the query is very slow because of each line need remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} explain result is {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.DICT(event_id, '/data/1/event.txt') | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} we can modify plan, rewrite AGGREGATE NODE and SCAN NODE, the new plan like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id |
[jira] [Updated] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoxiaoqing updated IMPALA-10253: - Description: We have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} The data looks like this: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| If we want to remap event_id to the real event name, we can implement a dict UDF, defined as DICT(BIGINT expression, STRING path). The first parameter is the column; the second is an HDFS path that stores the remapping rules, like this: {code:java} 1,SignUp 2,ViewProduct{code} Then we build a view that adds the dict column on top of the original table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query groups by the dict column, it is very slow because every row needs remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} The explain result is: {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.DICT(event_id, '/data/1/event.txt') | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} We can modify the plan by rewriting the AGGREGATE and SCAN nodes; the rewritten plan looks 
like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: event_id IN (1, 2) | row-size=4B cardinality=unavailable {code} was: If we have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} the data as the following: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| now, we need remapping event_id to the real event name to show customer, the remapping rule like this: {code:java} 1,SignUp 2,ViewProduct{code} we can realize udf remapping event_id to event_name, the rule store on hdfs, and then build a view table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query group by dict udf function, the query is very slow because of each line need remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} explain result is {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.dict(event_id) | row-size=20B cardinality=0 | 00:SCAN HDFS 
[rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.dict(event_id) IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} we can modify plan, rewrite AGGREGATE NODE and SCAN NODE, the new plan like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: event_id IN (1, 2) | row-size=4B cardinality=unavailable {code} > Improve query performance contains dict function >