[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217255#comment-17217255 ]
Tim Armstrong commented on IMPALA-9884:
---
On that executor we see that it finished in a few seconds.
{noformat}
I1017 01:11:24.338460 24753 control-service.cc:142] 1b4e1ee5d51fc461:12219325] ExecQueryFInstances(): query_id=1b4e1ee5d51fc461:12219325 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27000 #instances=1
I1017 01:11:24.349339 25196 query-state.cc:897] 1b4e1ee5d51fc461:122193250002] Executing instance. instance_id=1b4e1ee5d51fc461:122193250002 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=2 #in-flight=3
...
I1017 01:11:29.631078 25196 query-state.cc:906] 1b4e1ee5d51fc461:122193250002] Instance completed. instance_id=1b4e1ee5d51fc461:122193250002 #in-flight=2 status=OK
I1017 01:11:29.631088 25178 query-state.cc:464] 1b4e1ee5d51fc461:12219325] UpdateBackendExecState(): last report for 1b4e1ee5d51fc461:12219325
{noformat}
So I think this is a repeat of the scenario in IMPALA-8565. I guess even though I made the query bigger it isn't enough. I'll probably just increase it further.

> TestAdmissionControllerStress.test_mem_limit failing occasionally
> -
>
> Key: IMPALA-9884
> URL: https://issues.apache.org/jira/browse/IMPALA-9884
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.0
> Reporter: Vihang Karajgaonkar
> Assignee: Tim Armstrong
> Priority: Blocker
> Labels: broken-build, flaky
> Attachments: impalad-executors.tar.gz, impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz
>
> Recently, I saw this test failing with the exception trace below.
> {noformat}
> custom_cluster/test_admission_controller.py:1782: in test_mem_limit
>     {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1638: in run_admission_test
>     assert metric_deltas['dequeued'] == 0,\
> E   AssertionError: Queued queries should not run until others are made to finish
> E   assert 1 == 0
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
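The failed assertion compares metric snapshots taken before and after the first wave of submissions. As a toy illustration of the check (the helper below is a stand-in, not Impala's actual test code; only the metric names and the failing delta come from the traceback and logs above):

```python
# Hypothetical stand-in for the test suite's metric-delta computation.
def compute_metric_deltas(initial_metrics, current_metrics):
    """Per-metric difference between two snapshots of admission metrics."""
    return {name: current_metrics[name] - initial_metrics[name]
            for name in initial_metrics}

# Snapshot values patterned on the logs in this report: 34 rejections,
# 5 queries admitted immediately plus 1 admitted from the queue.
initial = {'admitted': 0, 'queued': 0, 'rejected': 0, 'dequeued': 0}
current = {'admitted': 6, 'queued': 9, 'rejected': 34, 'dequeued': 1}

metric_deltas = compute_metric_deltas(initial, current)
# The test requires that no query was dequeued during the first wave; a
# single early dequeue is exactly the reported "assert 1 == 0" failure.
assert metric_deltas['dequeued'] == 1
```

The test's invariant is that the first wave saturates the pool before anything finishes, so a nonzero `dequeued` delta means either a real admission bug or a race in how the test detects the end of the first wave.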
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217253#comment-17217253 ]
Tim Armstrong commented on IMPALA-9884:
---
{noformat}
I1017 01:11:24.339452 25165 admission-controller.cc:1638] 3144178b629c699c:dde994b7] Stats: agg_num_running=5, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=12.00 GB, num_admitted_running=5, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[], total_mem_consumed=0; pool_level_stats: num_running=0, min=0, max=0, pool_total_mem=0)
...
I1017 01:11:24.339519 25165 admission-controller.cc:1195] 3144178b629c699c:dde994b7] Queuing, query id=3144178b629c699c:dde994b7 reason: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available.
...
I1017 01:11:29.640173 24428 admission-controller.cc:1630] Trying to admit id=3144178b629c699c:dde994b7 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=81.29 MB dedicated_coord_mem_estimate=101.29 MB max_requests=150 max_queued=10 max_mem=12.00 GB
I1017 01:11:29.640328 24428 admission-controller.cc:1652] Cannot admit query 3144178b629c699c:dde994b7 to group default: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available. Details:
I1017 01:11:29.640334 24428 admission-controller.cc:1851] Could not dequeue query id=3144178b629c699c:dde994b7 reason: Not enough aggregate memory available in pool default-pool with max mem resources 12.00 GB. Needed 2.40 GB but only 18.00 B was available.
I1017 01:11:29.677559 24428 admission-controller.cc:1630] Trying to admit id=3144178b629c699c:dde994b7 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=81.29 MB dedicated_coord_mem_estimate=101.29 MB max_requests=150 max_queued=10 max_mem=12.00 GB
I1017 01:11:29.677701 24428 admission-controller.cc:1786] Admitting from queue: query=3144178b629c699c:dde994b7
I1017 01:11:29.677712 24428 admission-controller.cc:1878] For Query 3144178b629c699c:dde994b7 per_backend_mem_limit set to: 819.20 MB per_backend_mem_to_admit set to: 819.20 MB coord_backend_mem_limit set to: 819.20 MB coord_backend_mem_to_admit set to: 819.20 MB
I1017 01:11:29.677990 25165 admission-controller.cc:1273] 3144178b629c699c:dde994b7] Admitted queued query id=3144178b629c699c:dde994b7
I1017 01:11:29.678004 25165 admission-controller.cc:1274] 3144178b629c699c:dde994b7] Final: agg_num_running=6, agg_num_queued=9, agg_mem_reserved=9.60 GB, local_host(local_mem_admitted=12.00 GB, num_admitted_running=6, num_queued=9, backend_mem_reserved=4.00 GB, topN_query_stats: queries=[8f462fa2ce60d289:e0631471, d5466702e1e5c14e:43f31d30, 1b4e1ee5d51fc461:12219325, cf498fd1ece032b6:b4f673d1, 4a4d18e5caa85310:022e0229], total_mem_consumed=59.95 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=5, min=5.03 MB, max=13.76 MB, pool_total_mem=59.95 MB, average_per_query=11.99 MB)
{noformat}
It looks like this was able to be dequeued because a query finished running on a backend:
{noformat}
I1017 01:11:29.639609 24226 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 remaining=3 query_id= [^impalad-executors.tar.gz] 00
I1017 01:11:29.639629 24226 coordinator-backend-state.cc:362] query_id=1b4e1ee5d51fc461:12219325: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27000
I1017 01:11:29.639644 24226 admission-controller.cc:759] Update admitted mem reserved for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=2.40 GB new=1.60 GB
I1017 01:11:29.639657 24226 admission-controller.cc:764] Update admitted queries for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=3 new=2
I1017 01:11:29.639659 24226 admission-controller.cc:769] Update slots in use for host=impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 prev=3 new=2
I1017 01:11:29.639701 24226 admission-controller.cc:1337] Released query backend(s) impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:27001 for query id=1b4e1ee5d51fc461:12219325 agg_num_running=5, agg_num_queued=10, agg_mem_reserved=12.00 GB, local_host(local_mem_admitted=9.60 GB, num_admitted_running=5, num_queued=10, backend_mem_reserved=4.00 GB, topN_query_stats: queries=[cf498fd1ece032b6:b4f673d1, d5466702e1e5c14e:43f31d30, 1b4e1ee5d51fc461:12219325, 8f462fa2ce60d289:e0631471, 4a4d18e5caa85310:022e0229], total_mem_consumed=37.12 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=5,
[jira] [Updated] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-9884:
--
Attachment: impalad-executors.tar.gz
[jira] [Resolved] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
[ https://issues.apache.org/jira/browse/IMPALA-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-10261.
Fix Version/s: Impala 4.0
Assignee: Joe McDonnell
Resolution: Fixed

> impala-minimal-hive-exec should include org/apache/hive/com/google/**
> -
>
> Key: IMPALA-10261
> URL: https://issues.apache.org/jira/browse/IMPALA-10261
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Critical
> Fix For: Impala 4.0
>
> Hive started shading guava (com/google) with HIVE-22126, so impala-minimal-hive-exec should add org/apache/hive/com/google to its inclusions. This will allow Impala to build/work with newer versions of Hive that have this change. Leaving the existing com/google inclusion should let it work with both:
> [https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L116]
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217250#comment-17217250 ]
ASF subversion and git services commented on IMPALA-9918:
-
Commit 1e2176c84909a26e6405df7ae6d34d724e5a5217 in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e2176c ]
IMPALA-9918: ORC scanner hits DCHECK when GLOG_v=3

PrintPath assumed that all elements in the path are complex, and hit a DCHECK if it contained a scalar element. This didn't seem to cause problems in Parquet, but the ORC scanner called this function with paths where the last element was scalar. This problem was probably not discovered because no one tested ORC scanning with v=3 logging + DEBUG builds.

Also added logging for the events when log levels are changed through the webpage. In the case of ResetJavaLogLevelCallback there was already a log line from GlogAppender.java.

Note that the cause of the original issue is still unknown, as it occurred during custom cluster tests where no other tests should change the log levels in parallel.

Testing:
- tested the log changes manually

Change-Id: I94e12d2a62ccab5eb5d21675d5f0138f04e622ac
Reviewed-on: http://gerrit.cloudera.org:8080/16611
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins

> HdfsOrcScanner crash on resolving columns
> -
>
> Key: IMPALA-9918
> URL: https://issues.apache.org/jira/browse/IMPALA-9918
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Environment: BUILD_TAG jenkins-impala-cdpd-master-core-ubsan-111
> Reporter: Wenzhe Zhou
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: broken-build
> Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt
>
> Core file generated in impala-cdpd-master-core-ubsan build
> Back traces:
> CORE: ./tests/core.1594000709.13971.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'.
> Program terminated with signal SIGABRT, Aborted.
> #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6
> To enable execution of this file add
>     add-auto-load-safe-path /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
>     set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the "Auto-loading safe path" section in the GDB manual.
> E.g., run from the shell:
>     info "(gdb)Auto-loading safe path"
> #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6
> #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6
> #2 0x083401c4 in google::DumpStackTraceAndExit() ()
> #3 0x08336b5d in google::LogMessage::Fail() ()
> #4 0x08338402 in google::LogMessage::SendToLog() ()
> #5 0x08336537 in google::LogMessage::Flush() ()
> #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() ()
> #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259
> #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, pos_slots=0x7f7972273058) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436
> #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns (this=0x14555c00, tuple_desc=...) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456
> #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, context=0x7f7972274700) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221
> #11 0x035e0a48 in impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882
> #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480
> #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread (this=0x1b1c7100, first_thread=true,
[jira] [Commented] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
[ https://issues.apache.org/jira/browse/IMPALA-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217251#comment-17217251 ]
ASF subversion and git services commented on IMPALA-10261:
--
Commit ca4d6912be7c89acd518bbbe44e7c2407f1bb217 in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ca4d691 ]
IMPALA-10261: Include org/apache/hive/com/google in impala-minimal-hive-exec

Newer versions of Hive shade guava, which means that they require the presence of artifacts in org/apache/hive/com/google. To support these newer versions, this adds that path to the inclusions for impala-minimal-hive-exec.

Testing:
- Tested with a newer version of Hive that has the shading and verified that Impala starts up and functions.

Change-Id: I87ac089fdacc6fc5089ed68be92dedce514050b9
Reviewed-on: http://gerrit.cloudera.org:8080/16614
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217245#comment-17217245 ]
Tim Armstrong commented on IMPALA-9884:
---
This assertion is meant to check that, when the first wave of queries is submitted, none of them will be queued and then dequeued (because none of the admitted queries will finish running before the initial wave of admission decisions is made). This could be because a query was dequeued when it shouldn't have been. But it could also be explained if the test is incorrect in detecting when the admission decisions have been made for the first wave.

Piecing together what happened: the test was stuck for a long time waiting for the initial admission decisions. I see that the last query rejected (out of 34 rejected) in the initial wave was at 01:11:31:
{noformat}
$ grep 'Rejected' impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933
...
I1017 01:11:31.068819 25916 admission-controller.cc:1169] 8448348c952b1d85:ac9a6bb1] Rejected query from pool default-pool: queue full, limit=10, num_queued=10.
{noformat}
There are 5 queries admitted right away, then a 6th admitted from the queue, which is likely the problematic one:
{noformat}
$ grep 'Admitting.*que' impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933
I1017 01:11:24.310699 25150 admission-controller.cc:1185] cf498fd1ece032b6:b4f673d1] Admitting query id=cf498fd1ece032b6:b4f673d1
I1017 01:11:24.317937 25152 admission-controller.cc:1185] 1b4e1ee5d51fc461:12219325] Admitting query id=1b4e1ee5d51fc461:12219325
I1017 01:11:24.321123 25154 admission-controller.cc:1185] 4a4d18e5caa85310:022e0229] Admitting query id=4a4d18e5caa85310:022e0229
I1017 01:11:24.327414 25157 admission-controller.cc:1185] 8f462fa2ce60d289:e0631471] Admitting query id=8f462fa2ce60d289:e0631471
I1017 01:11:24.334887 25161 admission-controller.cc:1185] d5466702e1e5c14e:43f31d30] Admitting query id=d5466702e1e5c14e:43f31d30
I1017 01:11:29.677701 24428 admission-controller.cc:1786] Admitting from queue: query=3144178b629c699c:dde994b7
I1017 01:21:24.471750 24428 admission-controller.cc:1786] Admitting from queue: query=bd408b203d1cc15e:141cd954
I1017 01:21:24.471935 24428 admission-controller.cc:1786] Admitting from queue: query=7e47cf24a70c6e29:fe48a517
I1017 01:21:24.472086 24428 admission-controller.cc:1786] Admitting from queue: query=9041a95ec5b2de44:536b85bc
I1017 01:21:24.472236 24428 admission-controller.cc:1786] Admitting from queue: query=9b4b9f3a12265bad:3c8fbdab
I1017 01:21:24.472388 24428 admission-controller.cc:1786] Admitting from queue: query=254d8911a574bbda:4ddbf95c
I1017 01:21:29.131278 24428 admission-controller.cc:1786] Admitting from queue: query=5744d3d253edafeb:f0bbc048
{noformat}
I think the 10 minute gap might be from the test getting stuck at wait_for_admitted_threads here, but I'm not sure:
{noformat}
LOG.info("Wait for initial admission decisions")
(metric_deltas, curr_metrics) = self.wait_for_metric_changes(
    ['admitted', 'queued', 'rejected'], initial_metrics, num_queries)
# Also wait for the test threads that submitted the queries to start executing.
self.wait_for_admitted_threads(metric_deltas['admitted'])
{noformat}
Next is to figure out what happened with 3144178b629c699c:dde994b7
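The quoted snippet blocks until the admission metrics change. The shape of such a wait is roughly the following (a sketch with stand-in names; the real wait_for_metric_changes lives in Impala's test suite and differs in detail):

```python
import time

# Sketch of a polling wait: repeatedly fetch metrics until the summed deltas
# over the named metrics reach an expected count, or a timeout expires.
# get_metrics is a stand-in for the real per-impalad metric fetch.
def wait_for_metric_changes(get_metrics, metric_names, initial_metrics,
                            expected_delta, timeout_s=60.0, poll_interval_s=0.1):
    deadline = time.time() + timeout_s
    while True:
        current = get_metrics()
        deltas = {m: current[m] - initial_metrics[m] for m in metric_names}
        if sum(deltas.values()) >= expected_delta:
            return deltas, current
        if time.time() >= deadline:
            raise AssertionError("Timed out waiting for metric changes: %s" % deltas)
        time.sleep(poll_interval_s)
```

If the expected delta is never reached (for example because a query is admitted much later than anticipated), a loop like this sits idle until its timeout, which would be consistent with the 10 minute gap in the log above.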
[jira] [Commented] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217215#comment-17217215 ]
Quanlong Huang commented on IMPALA-10256:
-
S3 doesn't have HDFS cache pools. We should skip this test when running on non-HDFS filesystems.

> TestDisableFeatures.test_disable_incremental_metadata_updates fails
> ---
>
> Key: IMPALA-10256
> URL: https://issues.apache.org/jira/browse/IMPALA-10256
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Blocker
> Labels: broken-build
>
> Saw test failures in internal CORE S3 builds:
> custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
> {code:java}
> custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
>     use_db=unique_database, multiple_impalad=True)
> common/impala_test_suite.py:662: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:909: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
>     handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
>     handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION:
> E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
> {code}
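The skip Quanlong suggests could be guarded roughly as follows (a minimal sketch using stdlib unittest; the real suite uses pytest and its own SkipIf helpers, and the DEFAULT_FS environment variable and helper name here are assumptions, not Impala's actual API):

```python
import os
import unittest

# Hypothetical check: treat the target filesystem as HDFS only when the
# default filesystem URI uses the hdfs:// scheme. DEFAULT_FS is an assumed
# environment variable, not necessarily what Impala's test harness reads.
def default_fs_is_hdfs():
    return os.environ.get("DEFAULT_FS", "hdfs://localhost:20500").startswith("hdfs://")

class TestDisableFeatures(unittest.TestCase):
    @unittest.skipUnless(default_fs_is_hdfs(),
                         "HDFS cache pools are unavailable on non-HDFS filesystems such as S3")
    def test_disable_incremental_metadata_updates(self):
        pass  # the existing test body would run here unchanged
```

With a guard like this, an S3 run reports the test as skipped instead of failing on the missing cache pool.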
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-10256:

Description:
Saw test failures in internal CORE S3 builds:
custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
{code:java}
custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
    use_db=unique_database, multiple_impalad=True)
common/impala_test_suite.py:662: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:600: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:909: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION:
E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
{code}

was:
Saw test failures in internal CORE builds:
custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0]
{code:java}
custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates
    use_db=unique_database, multiple_impalad=True)
common/impala_test_suite.py:662: in run_test_case
    result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:600: in __exec_in_impala
    result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:909: in __execute_query
    return impalad_client.execute(query, user=user)
common/impala_connection.py:205: in execute
    return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
    handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:363: in __execute_query
    handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:357: in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:520: in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION:
E    MESSAGE: AnalysisException: The specified cache pool does not exist: testPool
{code}
[jira] [Created] (IMPALA-10265) Doc about enable_incremental_metadata_updates flag
Quanlong Huang created IMPALA-10265:
---
Summary: Doc about enable_incremental_metadata_updates flag
Key: IMPALA-10265
URL: https://issues.apache.org/jira/browse/IMPALA-10265
Project: IMPALA
Issue Type: Documentation
Reporter: Quanlong Huang

IMPALA-10113 adds a feature flag to turn off the incremental metadata update feature, which is on by default. This flag decides how catalogd propagates metadata updates to the catalog topic.

If enable_incremental_metadata_updates is true, catalogd sends metadata updates at partition granularity, so a table that has only one changed partition produces an update for just that partition. This reduces the size of the metadata that needs to be sent from the catalogd.

If enable_incremental_metadata_updates is false, catalogd uses the legacy behavior and sends metadata updates at table granularity: a table that has only one changed partition still produces an update for the whole table object, i.e. catalogd sends the whole table thrift object to the catalog topic.

Note that this is a catalogd-only flag. It doesn't need to be set on impalads, since impalad can process both incremental and non-incremental catalog updates.
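The difference the flag controls can be illustrated with a toy model (this is not Impala code; the dictionary layout and function name are made up purely to show the granularity difference described above):

```python
# Toy illustration of catalog-update granularity: with incremental updates
# only the changed partitions' metadata is published; with the legacy
# behavior the whole table object is republished.
def build_catalog_update(table, changed_partitions, enable_incremental_metadata_updates):
    if enable_incremental_metadata_updates:
        # Partition granularity: just the changed partitions.
        return {p: table['partitions'][p] for p in changed_partitions}
    # Table granularity: the entire table object, all partitions included.
    return table

table = {'name': 'sales', 'partitions': {'p1': 'meta1', 'p2': 'meta2', 'p3': 'meta3'}}

incremental = build_catalog_update(table, ['p2'], True)
legacy = build_catalog_update(table, ['p2'], False)

assert list(incremental) == ['p2']        # one partition's metadata
assert len(legacy['partitions']) == 3     # whole table republished
```

For wide tables with many partitions, the incremental form keeps the catalog-topic payload proportional to what actually changed rather than to the table's total size.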
[jira] [Resolved] (IMPALA-10113) Add feature flag for incremental metadata update
[ https://issues.apache.org/jira/browse/IMPALA-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-10113.
-
Fix Version/s: Impala 4.0
Resolution: Fixed

> Add feature flag for incremental metadata update
> 
>
> Key: IMPALA-10113
> URL: https://issues.apache.org/jira/browse/IMPALA-10113
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.0
>
> Now catalogd sends metadata updates in partition level. A feature flag to switch back to the original behavior (i.e. sending metadata updates in table level) will ease performance tests like IMPALA-10079.
[jira] [Commented] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217170#comment-17217170 ]
Tim Armstrong commented on IMPALA-9884:
---
Grabbed the coordinator log [^impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz]. Here's the JUnit XML output too:
{noformat}
custom_cluster/test_admission_controller.py:1856: in test_mem_limit
    {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
custom_cluster/test_admission_controller.py:1712: in run_admission_test
    assert metric_deltas['dequeued'] == 0,\
E   AssertionError: Queued queries should not run until others are made to finish
E   assert 1 == 0

01:11:15 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
01:11:15 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/statestored.INFO
01:11:16 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
01:11:16 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-cdpd-master-staging-exhaustive-release/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
01:11:19 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:19 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:19 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:19 MainThread: Waiting for num_known_live_backends=3. Current value: 0
01:11:20 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:20 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:20 MainThread: Waiting for num_known_live_backends=3. Current value: 0
01:11:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:21 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
01:11:21 MainThread: num_known_live_backends has reached value: 3
01:11:21 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:21 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25001
01:11:21 MainThread: num_known_live_backends has reached value: 3
01:11:22 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
01:11:22 MainThread: Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25002
01:11:22 MainThread: num_known_live_backends has reached value: 3
01:11:22 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
DEBUG:impala_cluster:Found 3 impalad/1 statestored/1 catalogd process(es)
INFO:impala_service:Getting metric: statestore.live-backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25010
INFO:impala_service:Metric statestore.live-backends has reached desired value: 4
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25000
INFO:impala_service:num_known_live_backends has reached value: 3
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25001
INFO:impala_service:num_known_live_backends has reached value: 3
DEBUG:impala_service:Getting num_known_live_backends from impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com:25002
INFO:impala_service:num_known_live_backends has reached value: 3
{noformat}
[jira] [Updated] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9884: -- Attachment: impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > Attachments: > impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-06.23933.gz > > > Recently, I saw this test failing with the exception trace below. > {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10264) Add ability to build docker images for a different Linux distribution
Joe McDonnell created IMPALA-10264: -- Summary: Add ability to build docker images for a different Linux distribution Key: IMPALA-10264 URL: https://issues.apache.org/jira/browse/IMPALA-10264 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.0 Reporter: Joe McDonnell Currently, the build for Impala's docker images builds on the local host OS and then makes the binaries available in the docker build context. The docker image thus needs to run the same Linux distribution and version as the host. Ubuntu 16 docker images should be built on an Ubuntu 16 host. Centos 7 docker images should be built on a Centos 7 host. It would be useful to be able to build docker containers for a different Linux distribution. Developers often develop on Ubuntu 16, but it would be useful to be able to build Centos 7 docker images to use in other contexts. To do this, we could build the binaries inside a docker container of a matching version as the docker image we want to produce. This would construct the docker build context, and the binaries would always match. An Ubuntu 16 machine could produce Centos 7 docker containers. Hypothetically, this could also use QEMU to build ARM docker containers on an x86 host.
[jira] [Created] (IMPALA-10263) Native toolchain support for cross compiling to produce ARM binaries
Joe McDonnell created IMPALA-10263: -- Summary: Native toolchain support for cross compiling to produce ARM binaries Key: IMPALA-10263 URL: https://issues.apache.org/jira/browse/IMPALA-10263 Project: IMPALA Issue Type: Improvement Components: Infrastructure Affects Versions: Impala 4.0 Reporter: Joe McDonnell With support for ARM added to upstream Impala, it would be useful to be able to build the ARM native toolchain from an x86 machine. This would allow it to be built and uploaded to s3 using the same infrastructure that currently builds the x86 binaries. Having the ARM binaries in s3 opens up possibilities to incorporate an ARM build into GVO. QEMU has the ability to emulate ARM on an x86 machine, and it is surprisingly simple to get an ARM docker container running on x86. This article provides some depth: [https://ownyourbits.com/2018/06/27/running-and-building-arm-docker-containers-in-x86/] The basic steps are: # Install qemu-user/qemu-user-static (which installs appropriate hooks in the kernel) # Make qemu-aarch64-static available in the context for building the docker container # In the Dockerfile, copy qemu-aarch64-static into /usr/bin For example, here is the start of the ubuntu1804 Dockerfile: {noformat} FROM arm64v8/ubuntu:18.04 COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static # The rest of the dockerfile{noformat}
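The three steps above can be sketched as a small shell script that prepares an ARM docker build context on an x86 host. This is a minimal illustration following the linked article, not anything from Impala's build scripts; the emulator path and image tag are assumptions.

```shell
#!/bin/sh
# Sketch of preparing a docker build context for an ARM image on an x86 host.
# Assumes step 1 (installing qemu-user-static, which registers the binfmt
# hooks in the kernel) was already done, e.g.:
#   sudo apt-get install qemu-user-static
set -e

BUILD_CTX=$(mktemp -d)

# Step 2: make the static QEMU binary available in the build context.
# /usr/bin/qemu-aarch64-static is where Debian/Ubuntu packages install it;
# adjust the path for other distributions.
if [ -f /usr/bin/qemu-aarch64-static ]; then
  cp /usr/bin/qemu-aarch64-static "$BUILD_CTX/"
fi

# Step 3: the Dockerfile copies the emulator into /usr/bin of the ARM image,
# so every ARM binary in the container runs through QEMU.
cat > "$BUILD_CTX/Dockerfile" <<'EOF'
FROM arm64v8/ubuntu:18.04
COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static
# The rest of the dockerfile
EOF

echo "Build context prepared in $BUILD_CTX"
# To build the image: docker build -t impala-arm-sketch "$BUILD_CTX"
```

The actual `docker build` is left as a comment since it needs a docker daemon; the point is only that the context must contain the static emulator alongside the Dockerfile.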
[jira] [Commented] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217155#comment-17217155 ] Abhishek Rawat commented on IMPALA-10102: - I *believe* the issue in this case is that the query "mem_limit" is set higher than the memory actually available to Impala on the system, so the memory allocation fails. Impala will not cancel the query, since the query hasn't exceeded its mem_limit, and the process mem_limit is not exceeded either. Handling a memory allocation failure in this scenario would be difficult, since it would pretty much require a check on every single memory allocation. So it looked like a bad-configuration problem: if both the query and process mem_limits are configured properly, the query fails cleanly without crashing the impalad. > Impalad crashes when writing a parquet file with large rows > - > > Key: IMPALA-10102 > URL: https://issues.apache.org/jira/browse/IMPALA-10102 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Yida Wu >Priority: Critical > Labels: crash > > Encountered a crash when testing following queries on my local branch: > {code:sql} > create table bigstrs3 stored as parquet as > select *, repeat(uuid(), cast(random() * 20 as int)) as bigstr > from functional.alltypes > limit 1000; > # Length of uuid() is 36. So the max row size is 7,200,000. 
> set MAX_ROW_SIZE=8m; > create table my_str_group stored as parquet as > select group_concat(string_col) as ss, bigstr > from bigstrs3 group by bigstr; > create table my_cnt stored as parquet as > select count(*) as cnt, bigstr > from bigstrs3 group by bigstr; > {code} > The crash stacktrace: > {code} > Crash reason: SIGSEGV > Crash address: 0x0 > Process uptime: not available > Thread 336 (crashed) > 0 libc-2.23.so + 0x14e10b > 1 impalad!snappy::UncheckedByteArraySink::Append(char const*, unsigned > long) [clone .localalias.0] + 0x1a > 2 impalad!snappy::Compress(snappy::Source*, snappy::Sink*) + 0xb1 > 3 impalad!snappy::RawCompress(char const*, unsigned long, char*, unsigned > long*) + 0x51 > 4 impalad!impala::SnappyCompressor::ProcessBlock(bool, long, unsigned char > const*, long*, unsigned char**) [compress.cc : 295 + 0x24] > 5 impalad!impala::Codec::ProcessBlock32(bool, int, unsigned char const*, > int*, unsigned char**) [codec.cc : 211 + 0x41] > 6 impalad!impala::HdfsParquetTableWriter::BaseColumnWriter::Flush(long*, > long*, long*) [hdfs-parquet-table-writer.cc : 775 + 0x56] > 7 impalad!impala::HdfsParquetTableWriter::FlushCurrentRowGroup() > [hdfs-parquet-table-writer.cc : 1330 + 0x60] > 8 impalad!impala::HdfsParquetTableWriter::Finalize() > [hdfs-parquet-table-writer.cc : 1297 + 0x19] > 9 > impalad!impala::HdfsTableSink::FinalizePartitionFile(impala::RuntimeState*, > impala::OutputPartition*) [hdfs-table-sink.cc : 652 + 0x2e] > 10 > impalad!impala::HdfsTableSink::WriteRowsToPartition(impala::RuntimeState*, > impala::RowBatch*, std::pair std::default_delete >, std::vector std::allocator > >*) [hdfs-table-sink.cc : 282 + 0x21] > 11 impalad!impala::HdfsTableSink::Send(impala::RuntimeState*, > impala::RowBatch*) [hdfs-table-sink.cc : 621 + 0x2e] > 12 impalad!impala::FragmentInstanceState::ExecInternal() > [fragment-instance-state.cc : 422 + 0x58] > 13 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc > : 106 + 0x16] > 14 
impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) > [query-state.cc : 836 + 0x19] > 15 impalad!impala::QueryState::StartFInstances()::{lambda()#1}::operator()() > const + 0x26 > 16 > impalad!boost::detail::function::void_function_obj_invoker0, > void>::invoke [function_template.hpp : 159 + 0xc] > 17 impalad!boost::function0::operator()() const [function_template.hpp > : 770 + 0x1d] > 18 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) [thread.cc : 360 + 0xf] > 19 impalad!void > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > std::allocator > >, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > >::operator() std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*), > boost::_bi::list0>(boost::_bi::type, void > (*&)(std::__cxx11::basic_string, > std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise
[jira] [Reopened] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reopened IMPALA-10102: [~baggio000] Impala still shouldn't crash if there isn't enough memory, it should return a clean error to the user and not disrupt other running queries. So I'm going to reopen this.
[jira] [Resolved] (IMPALA-10226) Change buildall.sh -notests to invoke a single Make target
[ https://issues.apache.org/jira/browse/IMPALA-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10226. Fix Version/s: Impala 4.0 Target Version: Impala 4.0 Assignee: Joe McDonnell Resolution: Fixed > Change buildall.sh -notests to invoke a single Make target > -- > > Key: IMPALA-10226 > URL: https://issues.apache.org/jira/browse/IMPALA-10226 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.0 > > > Currently, running "buildall.sh -notests" boils down to invoking make with > multiple targets: > > {noformat} > if [[ $BUILD_TESTS -eq 0 ]]; then > # Specify all the non-test targets > MAKE_TARGETS="impalad statestored catalogd fesupport loggingsupport > ImpalaUdf \ > udasample udfsample" > if (( build_independent_targets )); then > MAKE_TARGETS+=" cscope fe tarballs" > fi > fi > ${MAKE_CMD} -j${IMPALA_BUILD_THREADS:-4} ${IMPALA_MAKE_FLAGS} > ${MAKE_TARGETS}{noformat} > Based on the build output, it looks like each make target is invoked > individually (with the commands underneath going parallel). This is > particularly a problem for impalad (which needs to build the backend) and fe. > We want these to run simultaneously, and this limitation prevents that. > We should create a single target that builds all the things needing to be > built for -notests. Then, this will be invoking one target and allowing all > the pieces go parallel. >
[jira] [Commented] (IMPALA-10226) Change buildall.sh -notests to invoke a single Make target
[ https://issues.apache.org/jira/browse/IMPALA-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217108#comment-17217108 ] ASF subversion and git services commented on IMPALA-10226: -- Commit e76010d62889aaa2b04f6cfea9bb74b829877eb9 in impala's branch refs/heads/master from Joe McDonnell [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e76010d ] IMPALA-10226: Change buildall.sh -notests to invoke a single Make target This is a small cleanup to add specific targets in CMake for buildall.sh -notests to invoke. Previously, it ran multiple targets like: make target1 target2 target3 ... In hand tests, make builds each target separately, so it is unable to overlap the builds of the multiple targets. Pushing it into CMake simplifies the code and allows the targets to build simultaneously. Testing: - Ran buildall.sh -notests Change-Id: Id881d6f481b32ba82501b16bada14b6630ba32d2 Reviewed-on: http://gerrit.cloudera.org:8080/16605 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins
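The effect of the IMPALA-10226 change can be illustrated with a toy Makefile: handing make a single aggregate target lets it schedule all the sub-builds within one parallel run, instead of building several top-level targets one after another. The Makefile and target names below are a hypothetical stand-in, not Impala's real build:

```shell
#!/bin/sh
# Toy illustration of collapsing several make targets into one aggregate
# target, as buildall.sh -notests now does. Instead of
#   make -j4 impalad statestored catalogd ...
# the build invokes a single target, e.g.
#   make -j4 notests
set -e

dir=$(mktemp -d)
# Build a tiny Makefile with two leaf targets and one aggregate target.
# printf '\t' supplies the real tabs that make requires before recipes.
printf 'impalad:\n\t@touch %s/impalad.done\n' "$dir" >  "$dir/Makefile"
printf 'catalogd:\n\t@touch %s/catalogd.done\n' "$dir" >> "$dir/Makefile"
printf 'notests: impalad catalogd\n' >> "$dir/Makefile"

# One invocation, one target: with -j, both prerequisites can build in
# parallel inside the same make run.
make -C "$dir" -j2 notests
ls "$dir"
```

The same idea transfers to the real build: the aggregate target's prerequisite list replaces the `MAKE_TARGETS` variable, and make's own scheduler overlaps the backend and frontend builds.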
[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9954. -- Fix Version/s: Impala 4.0 Resolution: Fixed > RpcRecvrTime can be negative > > > Key: IMPALA-9954 > URL: https://issues.apache.org/jira/browse/IMPALA-9954 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.0 > > Attachments: profile_034e7209bd98c96c_9a448dfc.txt > > > Saw this on a recent version of master. Attached the full runtime profile. > {code:java} > KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, > % non-child: 32.30%) > ExecOption: Unpartitioned Sender Codegen Disabled: not needed >- BytesSent (500.000ms): 0, 0 >- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: > 4.34 MB/sec ; Number of samples: 1) >- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; > Number of samples: 2) >- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: > -71077.000ns ; Number of samples: 2) >- EosSent: 1 (1) >- PeakMemoryUsage: 416.00 B (416) >- RowsSent: 100 (100) >- RpcFailure: 0 (0) >- RpcRetry: 0 (0) >- SerializeBatchTime: 2.880ms >- TotalBytesSent: 28.67 KB (29355) >- UncompressedRowBatchSize: 69.29 KB (70950) {code}
[jira] [Assigned] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned IMPALA-9954: Assignee: Riza Suminto
[jira] [Resolved] (IMPALA-10102) Impalad crashes when writing a parquet file with large rows
[ https://issues.apache.org/jira/browse/IMPALA-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yida Wu resolved IMPALA-10102. -- Resolution: Not A Bug Setting a proper impalad mem_limit option should avoid the crash.
[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217070#comment-17217070 ] Riza Suminto commented on IMPALA-9954: -- Hi [~stakiar], I have resolved IMPALA-10220. That Jira also includes a fix to acquire the lock first before computing the elapsed time. I think this Jira can be closed as well.
[jira] [Resolved] (IMPALA-10220) Min value of RpcNetworkTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riza Suminto resolved IMPALA-10220. --- Fix Version/s: Impala 4.0 Resolution: Fixed Closing this Jira since the patch has been merged. cc: [~stakiar] > Min value of RpcNetworkTime can be negative > --- > > Key: IMPALA-10220 > URL: https://issues.apache.org/jira/browse/IMPALA-10220 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 3.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > Fix For: Impala 4.0 > > > There is a bug in function > KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in > this line: > [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635] > network_time_ns should be computed using eos_rsp_.receiver_latency_ns() > instead of resp_.receiver_latency_ns().
[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217063#comment-17217063 ] Sahil Takiar commented on IMPALA-9954: -- [~rizaon] so if I understand correctly, the remaining work here is to add proper locking of {{rpc_start_time_ns_}} in {{be/src/runtime/krpc-data-stream-sender.cc}}. > RpcRecvrTime can be negative > > > Key: IMPALA-9954 > URL: https://issues.apache.org/jira/browse/IMPALA-9954 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Priority: Major > Attachments: profile_034e7209bd98c96c_9a448dfc.txt > > > Saw this on a recent version of master. Attached the full runtime profile. > {code:java} > KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, > % non-child: 32.30%) > ExecOption: Unpartitioned Sender Codegen Disabled: not needed >- BytesSent (500.000ms): 0, 0 >- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: > 4.34 MB/sec ; Number of samples: 1) >- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; > Number of samples: 2) >- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: > -71077.000ns ; Number of samples: 2) >- EosSent: 1 (1) >- PeakMemoryUsage: 416.00 B (416) >- RowsSent: 100 (100) >- RpcFailure: 0 (0) >- RpcRetry: 0 (0) >- SerializeBatchTime: 2.880ms >- TotalBytesSent: 28.67 KB (29355) >- UncompressedRowBatchSize: 69.29 KB (70950) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
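A minimal sketch of the remaining locking work described above, assuming the goal is simply to take the channel's lock before reading the shared start timestamp. This is a hypothetical Python analogue for illustration; the real code is C++ in be/src/runtime/krpc-data-stream-sender.cc, and the class and method names here are made up.

```python
import threading
import time

class Channel:
    """Hypothetical analogue of a KRPC sender channel that times its RPCs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rpc_start_ns = 0

    def rpc_started(self):
        # Writer side: record the start time under the lock.
        with self._lock:
            self._rpc_start_ns = time.monotonic_ns()

    def rpc_completed_elapsed_ns(self):
        # Reader side: take the same lock before computing the elapsed time,
        # so a concurrent rpc_started() for the next RPC cannot move the
        # start timestamp underneath the subtraction and make it negative.
        with self._lock:
            return time.monotonic_ns() - self._rpc_start_ns

ch = Channel()
ch.rpc_started()
elapsed = ch.rpc_completed_elapsed_ns()
print(elapsed >= 0)  # True
```

Without the lock in the completion path, the callback can observe a start time written for a later RPC, which is one way an elapsed-time counter goes negative.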
[jira] [Commented] (IMPALA-10220) Min value of RpcNetworkTime can be negative
[ https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217061#comment-17217061 ] Sahil Takiar commented on IMPALA-10220: --- [~rizaon] can this be closed? > Min value of RpcNetworkTime can be negative > --- > > Key: IMPALA-10220 > URL: https://issues.apache.org/jira/browse/IMPALA-10220 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 3.4.0 >Reporter: Riza Suminto >Assignee: Riza Suminto >Priority: Major > > There is a bug in function > KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in > this line: > [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635] > network_time_ns should be computed using eos_rsp_.receiver_latency_ns() > instead of resp_.receiver_latency_ns(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9918 started by Csaba Ringhofer. --- > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". > To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. 
E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > 
/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) > (function_obj_ptr=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #16 0x024d46f0 in boost::function0::operator() > (this=0x7f7972275448) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #17 0x03425ba7 in impala::Thread::SuperviseThread(std::string const&, > std::string const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) (name=..., category=..., > functor=..., parent_thread_info=0x7f797006d068, >
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217040#comment-17217040 ] Csaba Ringhofer commented on IMPALA-9918: - A fix is up for review: https://gerrit.cloudera.org/#/c/16611/ Note that there is still no explanation for why a function guarded by VLOG(3) << was called in a custom cluster test. > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". > To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. 
E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > 
/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0 void>::invoke(boost::detail::function::function_buffer&) > (function_obj_ptr=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #16 0x024d46f0 in boost::function0::operator() > (this=0x7f7972275448) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #17 0x03425ba7 in
[jira] [Commented] (IMPALA-9918) HdfsOrcScanner crash on resolving columns
[ https://issues.apache.org/jira/browse/IMPALA-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216998#comment-17216998 ] Csaba Ringhofer commented on IMPALA-9918: - This seems to be a weird logging bug. What is certain is that https://github.com/apache/impala/blob/master/be/src/exec/hdfs-orc-scanner.cc#L455 is wrong - PrintPath always hits the DCHECK when it is called, which should only happen with GLOG_v=3 logging. This is clearly a bug, as nearly all queries on ORC tables will hit it with GLOG_v=3. It is only a problem in DEBUG builds; PrintPath returns a sensible result in RELEASE. The mysterious part is that we shouldn't be using GLOG_v=3 in the tests that broke, and the logs also don't seem to be that verbose. At lower verbosity the functions called during logging should not be invoked. I suspect this to be some kind of GLOG bug. > HdfsOrcScanner crash on resolving columns > - > > Key: IMPALA-9918 > URL: https://issues.apache.org/jira/browse/IMPALA-9918 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 > Environment: BUILD_TAG > jenkins-impala-cdpd-master-core-ubsan-111 >Reporter: Wenzhe Zhou >Assignee: Csaba Ringhofer >Priority: Major > Labels: broken-build > Attachments: 092420_backtraces.txt, backtraces.txt, backtraces.txt > > > Core file generated in impala-cdpd-master-core-ubsan build > Back traces: > CORE: ./tests/core.1594000709.13971.impalad > BINARY: ./be/build/latest/service/impalad > Core was generated by > `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'. > Program terminated with signal SIGABRT, Aborted. > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > To enable execution of this file add > add-auto-load-safe-path > /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py > line to your configuration file "/var/lib/jenkins/.gdbinit". 
> To completely disable this security protection add > set auto-load safe-path / > line to your configuration file "/var/lib/jenkins/.gdbinit". > For more information about this security protection see the > "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: > info "(gdb)Auto-loading safe path" > #0 0x7f7a481851f7 in raise () from /lib64/libc.so.6 > #1 0x7f7a481868e8 in abort () from /lib64/libc.so.6 > #2 0x083401c4 in google::DumpStackTraceAndExit() () > #3 0x08336b5d in google::LogMessage::Fail() () > #4 0x08338402 in google::LogMessage::SendToLog() () > #5 0x08336537 in google::LogMessage::Flush() () > #6 0x08339afe in google::LogMessageFatal::~LogMessageFatal() () > #7 0x03215662 in impala::PrintPath (tbl_desc=..., path=...) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/util/debug-util.cc:259 > #8 0x0370dfe9 in impala::HdfsOrcScanner::ResolveColumns > (this=0x14555c00, tuple_desc=..., selected_nodes=0x7f79722730a8, > pos_slots=0x7f7972273058) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:436 > #9 0x037099dd in impala::HdfsOrcScanner::SelectColumns > (this=0x14555c00, tuple_desc=...) 
at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:456 > #10 0x03707688 in impala::HdfsOrcScanner::Open (this=0x14555c00, > context=0x7f7972274700) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:221 > #11 0x035e0a48 in > impala::HdfsScanNodeBase::CreateAndOpenScannerHelper (this=0x1b1c7100, > partition=0x142f9d00, context=0x7f7972274700, scanner=0x7f79722746f8) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:882 > #12 0x039df2e8 in impala::HdfsScanNode::ProcessSplit > (this=0x1b1c7100, filter_ctxs=..., expr_results_pool=0x7f7972274bd8, > scan_range=0x12a16c40, scanner_thread_reservation=0x7f7972274e18) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:480 > #13 0x039ddd85 in impala::HdfsScanNode::ScannerThread > (this=0x1b1c7100, first_thread=true, scanner_thread_reservation=8192) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #14 0x039e1980 in > impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::$_0::operator()() > const (this=0x7f7972275450) at > /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #15 0x039e13b2 in > boost::detail::function::void_function_obj_invoker0
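The VLOG(3) behavior discussed in the IMPALA-9918 comments can be illustrated with a small Python analogue of glog's lazy message evaluation. This is hypothetical illustration code, not glog and not Impala's scanner: `print_path` stands in for impala::PrintPath, and the point is that at verbosity below the threshold the message expression, including any DCHECK inside it, should never run.

```python
VERBOSITY = 0  # analogue of the GLOG_v setting

def print_path(path):
    # Stand-in for impala::PrintPath(): assume it DCHECKs on input it
    # cannot resolve, so merely calling it can abort a DEBUG build.
    assert path, "DCHECK hit: unresolvable path"
    return "/".join(str(p) for p in path)

def vlog(level, make_message):
    # Like VLOG(level) << ...: the message is only built when the
    # configured verbosity is at least `level`.
    if VERBOSITY >= level:
        print(make_message())

# At VERBOSITY=0 the lambda is never invoked, so the assert never fires.
vlog(3, lambda: print_path([]))
```

The crash reports are surprising precisely because this guard should have kept PrintPath from ever being called in a test run at default verbosity.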
[jira] [Created] (IMPALA-10262) Linux Packaging Support
Shant Hovsepian created IMPALA-10262: Summary: Linux Packaging Support Key: IMPALA-10262 URL: https://issues.apache.org/jira/browse/IMPALA-10262 Project: IMPALA Issue Type: New Feature Reporter: Shant Hovsepian It would be nice if we could easily build installation packages, for example RPM or DEB packages, from the Impala source code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10261) impala-minimal-hive-exec should include org/apache/hive/com/google/**
Joe McDonnell created IMPALA-10261: -- Summary: impala-minimal-hive-exec should include org/apache/hive/com/google/** Key: IMPALA-10261 URL: https://issues.apache.org/jira/browse/IMPALA-10261 Project: IMPALA Issue Type: Bug Components: Frontend Affects Versions: Impala 4.0 Reporter: Joe McDonnell Hive started shading guava (com/google) with HIVE-22126, so impala-minimal-hive-exec should add org/apache/hive/com/google to its inclusions. This will allow Impala to build/work with newer versions of Hive that have this change. Leaving the existing com/google inclusion should let it work with both: [https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml#L116] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
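A sketch of what the IMPALA-10261 change might look like as a maven-shade-plugin filter. This is a hypothetical fragment for illustration only; the actual include list and surrounding plugin configuration live in java/shaded-deps/hive-exec/pom.xml, and the artifact coordinates shown here are an assumption.

```xml
<filter>
  <artifact>org.apache.hive:hive-exec</artifact>
  <includes>
    <!-- Guava under its original package (pre-HIVE-22126 Hive) -->
    <include>com/google/**</include>
    <!-- Guava relocated by Hive's shading (HIVE-22126 and later) -->
    <include>org/apache/hive/com/google/**</include>
  </includes>
</filter>
```

Keeping both include patterns is what lets the same minimal jar work against Hive versions from before and after the shading change.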
[jira] [Commented] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216964#comment-17216964 ] Tim Armstrong commented on IMPALA-10253: You could try rewriting this query as a join as well, e.g. {noformat} SELECT events.*, d AS `event` FROM rawdata.event_view_p7 events INNER JOIN event_dict d on events.event_id = d.event_id WHERE d.event in ('SignUp', 'ViewProduct'); {noformat} A bloom filter on event_id should be pushed from the join into the scan of events in this case and that probably performs better than the UDF dict lookup. > Improve query performance contains dict function > > > Key: IMPALA-10253 > URL: https://issues.apache.org/jira/browse/IMPALA-10253 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: gaoxiaoqing >Priority: Major > > we have the following parquet table: > {code:java} > CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( > event_id INT, > user_id BIGINT, > time TIMESTAMP, > p_abook_type STRING > ) > PARTITIONED BY ( > day INT, > event_bucket INT > ) > STORED AS PARQUET > LOCATION 'hdfs://localhost:20500/sa/data/1/event' > {code} > the data show as following: > ||event_id||user_id||time||p_abook_type|| > |1|-922235446862664806|2018-07-18 09:01:06.158|小说| > |2|-922235446862664806|2018-07-19 09:01:06.158|小说| > if we want remapping event_id to the real event name, we can realize dict > udf. the dict udf is defined as DICT(BIGINT expression, STRING path). first > parameter is the column, second parameter is hdfs path which store the > remapping rule like this: > {code:java} > 1,SignUp > 2,ViewProduct{code} > then build a view table which add the dict column on original table: > {code:java} > CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, > dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 > events > {code} > If the query group by column has dict, the query is slower then group by > original column. 
when explain the sql, we found that each line data need > remapping in SCAN phase and AGGREGATE phase. > {code:java} > select event, count(*) from event_external_view_p7 where event in ('SignUp', > 'ViewProduct') group by event;{code} > {code:java} > PLAN-ROOT SINK > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: rawdata.DICT(event_id, '/data/1/event.txt') > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', > 'ViewProduct') > | row-size=4B cardinality=unavailable > {code} > the idea is to modify plan, use original column in SCAN phase and AGGREGATE > phase and remapping the original column at last, the new plan like this: > {code:java} > PLAN-ROOT SINK > | > 05:SELECT [FINALIZE] > | output: dict(event_id) > | row-size=20B cardinality=0 > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: event_id IN (1, 2) > | row-size=4B cardinality=unavailable > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216961#comment-17216961 ] Tim Armstrong commented on IMPALA-10253: [~gaoxiaoqing] if you created a UDF that did a reverse dictionary lookup you could rewrite the query something like this: {noformat} select event, count(*) from event_external_view_p7 where event_id in (dict_reverse('SignUp', '/path/to/dict'), dict_reverse('ViewProduct', '/path/to/dict'))) group by event; {noformat} I think to do it automatically you'd need some additional metadata for the UDF that specified the inverse of the function, then to do an expression rewrite rule that detected a pattern that could exploit it. We did try to do a similar rewrite for strings cases using the expression rewriter in the frontend - IMPALA-5929 (although had to revert that change because it had bugs). Anyway I added this as a contributor if you want to assign this JIRA to yourself. > Improve query performance contains dict function > > > Key: IMPALA-10253 > URL: https://issues.apache.org/jira/browse/IMPALA-10253 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: gaoxiaoqing >Priority: Major > > we have the following parquet table: > {code:java} > CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( > event_id INT, > user_id BIGINT, > time TIMESTAMP, > p_abook_type STRING > ) > PARTITIONED BY ( > day INT, > event_bucket INT > ) > STORED AS PARQUET > LOCATION 'hdfs://localhost:20500/sa/data/1/event' > {code} > the data show as following: > ||event_id||user_id||time||p_abook_type|| > |1|-922235446862664806|2018-07-18 09:01:06.158|小说| > |2|-922235446862664806|2018-07-19 09:01:06.158|小说| > if we want remapping event_id to the real event name, we can realize dict > udf. the dict udf is defined as DICT(BIGINT expression, STRING path). 
first > parameter is the column, second parameter is hdfs path which store the > remapping rule like this: > {code:java} > 1,SignUp > 2,ViewProduct{code} > then build a view table which add the dict column on original table: > {code:java} > CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, > dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 > events > {code} > If the query group by column has dict, the query is slower then group by > original column. when explain the sql, we found that each line data need > remapping in SCAN phase and AGGREGATE phase. > {code:java} > select event, count(*) from event_external_view_p7 where event in ('SignUp', > 'ViewProduct') group by event;{code} > {code:java} > PLAN-ROOT SINK > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: rawdata.DICT(event_id, '/data/1/event.txt') > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', > 'ViewProduct') > | row-size=4B cardinality=unavailable > {code} > the idea is to modify plan, use original column in SCAN phase and AGGREGATE > phase and remapping the original column at last, the new plan like this: > {code:java} > PLAN-ROOT SINK > | > 05:SELECT [FINALIZE] > | output: dict(event_id) > | row-size=20B cardinality=0 > | > 04:EXCHANGE [UNPARTITIONED] > | > 03:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 02:EXCHANGE [HASH(event)] > | > 01:AGGREGATE [STREAMING] > | output: count(*) > | group by: event_id > | row-size=20B cardinality=0 > | > 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] > | partitions=39/39 files=99 size=9.00GB > | predicates: event_id IN (1, 2) > 
| row-size=4B cardinality=unavailable > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Labels: crash (was: ) > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Blocker > Labels: crash > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) 
= 0, > 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: 
parquet_timestamp_type (i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93: disable_hbase_num_rows_estimate (bool) = false, > 94: fetch_rows_timeout_ms (i64) = 1, > 95: now_string
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Priority: Blocker (was: Critical) > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Blocker > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) = 0, 
> 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: parquet_timestamp_type 
(i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93: disable_hbase_num_rows_estimate (bool) = false, > 94: fetch_rows_timeout_ms (i64) = 1, > 95: now_string (string) = "", >
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Component/s: Backend
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Labels: broken-build crash (was: crash)
[jira] [Updated] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10259: --- Target Version: Impala 4.0
[jira] [Updated] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9240: -- Fix Version/s: (was: Impala 4.0) Impala 3.4.0 > Impala shell using hs2-http reports all http error codes as EOFError > > > Key: IMPALA-9240 > URL: https://issues.apache.org/jira/browse/IMPALA-9240 > Project: IMPALA > Issue Type: Bug >Reporter: Andrew Sherman >Assignee: Andrew Sherman >Priority: Major > Fix For: Impala 3.4.0 > > > For example if I try to connect to an http endpoint that returns 503 I see > {code} > $ impala-shell -V --protocol='hs2-http' -i "localhost:28000" > Starting Impala Shell without Kerberos authentication > Warning: --connect_timeout_ms is currently ignored with HTTP transport. > Opened TCP connection to localhost:28000 > Error connecting: EOFError, > {code} > At present Impala shell does not properly check http return calls. > When this fix is complete it should be also put into Impyla. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
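The EOFError in the example above arises because the shell's hs2-http transport tries to read a thrift payload that an error response such as 503 never carries. A minimal sketch of the kind of check the fix describes, with illustrative names rather than the actual impala-shell/Impyla code: inspect the HTTP status line first and surface the real status code instead of letting a body read fail.

```python
# Hypothetical sketch (class and function names are illustrative, not
# Impala's actual code): check the HTTP status before reading the thrift
# payload, so a 503 surfaces as "HTTP error 503" rather than EOFError.

class HttpTransportError(Exception):
    """Carries the HTTP status code instead of a bare EOFError."""
    def __init__(self, code, reason):
        super(HttpTransportError, self).__init__(
            "HTTP error {0}: {1}".format(code, reason))
        self.code = code
        self.reason = reason

def check_http_status(code, reason):
    # Only 2xx responses carry a thrift payload worth reading; anything
    # else is reported with its real status code and reason phrase.
    if not 200 <= code < 300:
        raise HttpTransportError(code, reason)
```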
[jira] [Updated] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-9240: -- Fix Version/s: Impala 4.0 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman resolved IMPALA-9240. Resolution: Fixed Thanks [~tarmstrong] for pointing this out -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10057) TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST
[ https://issues.apache.org/jira/browse/IMPALA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216853#comment-17216853 ] Joe McDonnell commented on IMPALA-10057: The maven test phase does not touch the fe/target/impala-frontend*.jar. It does recompile and touch fe/target/classes. Interestingly enough, our classpath for impalads, catalogds, etc. doesn't use fe/target/impala-frontend*.jar. Instead, they directly reference fe/target/classes: [https://github.com/apache/impala/blob/master/bin/impala-config.sh#L691] [https://github.com/apache/impala/blob/master/bin/set-classpath.sh#L33] So, there could be a race condition between the maven test phase deleting/recompiling the classes and the impalad/catalogd reading the class. One solution is to use fe/target/impala-frontend*.jar on the impalad/catalogd classpath rather than referencing fe/target/classes directly. The impala-frontend*.jar would get rebuilt at appropriate times, but it is untouched by the maven test phase. > TransactionKeepalive NoClassDefFoundError floods logs during JDBC_TEST/FE_TEST > -- > > Key: IMPALA-10057 > URL: https://issues.apache.org/jira/browse/IMPALA-10057 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Priority: Major > > For both the normal tests and the docker-based tests, the Impala logs > generated during the FE_TEST/JDBC_TEST can be huge: > > {noformat} > $ du -c -h fe_test/ee_tests > 4.0K fe_test/ee_tests/minidumps/statestored > 4.0K fe_test/ee_tests/minidumps/impalad > 4.0K fe_test/ee_tests/minidumps/catalogd > 16K fe_test/ee_tests/minidumps > 352K fe_test/ee_tests/profiles > 81G fe_test/ee_tests > 81G total{noformat} > Creating a tarball of these logs takes 10 minutes.
The Impalad/catalogd logs > are filled with this error over and over: > {noformat} > E0903 02:25:39.453887 12060 TransactionKeepalive.java:137] Unexpected > exception thrown > Java exception follows: > java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > at > org.apache.impala.common.TransactionKeepalive$DaemonThread.run(TransactionKeepalive.java:114) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NoClassDefFoundError: > org/apache/impala/common/TransactionKeepalive$HeartbeatContext > ... 2 more > Caused by: java.lang.ClassNotFoundException: > org.apache.impala.common.TransactionKeepalive$HeartbeatContext > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 2 more{noformat} > Two interesting points: > # The frontend/jdbc tests are passing, so all of these errors in the impalad > logs are not impacting tests. > # These errors aren't concurrent with any of the other tests (ee tests, > custom cluster tests, etc). > This is happening on normal core runs (including the GVO job that does > FE_TEST/JDBC_TEST) on both Ubuntu and Centos 7. It is also happening on > docker-based tests. A theory is that FE_TEST/JDBC_TEST have an Impala cluster > running and then invoke maven to run the tests. Maven could manipulate jars > while Impala is running. Maybe there is a race condition or conflict when > manipulating those jars that could cause the NoClassDefFoundError. It makes > no sense for Impala not to be able to find > TransactionKeepalive$HeartbeatContext. > When it happens, it is in a tight loop, printing the message more than once > per millisecond. It fills the ERROR, WARNING, and INFO logs with that > message, sometimes for multiple Impalads and/or catalogd.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
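The race described in the comment above can be illustrated with a small filesystem simulation (a sketch with made-up file names, not Impala code): a consumer resolving classes out of a directory sees maven's delete immediately, while a consumer that opened a jar snapshot beforehand keeps reading the old bytes.

```python
# Sketch of why referencing fe/target/classes directly is racy while a jar
# snapshot is not. All paths and contents here are illustrative.
import os
import shutil
import tempfile
import zipfile

def demo():
    workdir = tempfile.mkdtemp()
    classes_dir = os.path.join(workdir, "classes")
    os.makedirs(classes_dir)
    class_file = os.path.join(classes_dir, "HeartbeatContext.class")
    with open(class_file, "wb") as f:
        f.write(b"bytecode-v1")

    # Package a jar-like snapshot before the "maven test phase" runs.
    jar_path = os.path.join(workdir, "impala-frontend.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        jar.write(class_file, "HeartbeatContext.class")

    # Simulate maven's test phase wiping target/classes for a recompile.
    shutil.rmtree(classes_dir)

    # Directory-based lookup now fails (the NoClassDefFoundError analogue)...
    dir_lookup_ok = os.path.exists(class_file)
    # ...but the jar snapshot still serves the class unchanged.
    with zipfile.ZipFile(jar_path) as jar:
        jar_bytes = jar.read("HeartbeatContext.class")
    shutil.rmtree(workdir)
    return dir_lookup_ok, jar_bytes
```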
[jira] [Commented] (IMPALA-9240) Impala shell using hs2-http reports all http error codes as EOFError
[ https://issues.apache.org/jira/browse/IMPALA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216850#comment-17216850 ] Tim Armstrong commented on IMPALA-9240: --- [~asherman] can we close this? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10062) TestCompressedNonText.test_insensitivity_to_extension can fail due to wrong filename
[ https://issues.apache.org/jira/browse/IMPALA-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10062. Fix Version/s: Impala 4.0 Resolution: Fixed > TestCompressedNonText.test_insensitivity_to_extension can fail due to wrong > filename > > > Key: IMPALA-10062 > URL: https://issues.apache.org/jira/browse/IMPALA-10062 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Priority: Critical > Labels: broken-build, flaky > Fix For: Impala 4.0 > > > The fix for IMPALA-10005 added a new TestCompressedNonText test. It relies on > Hive generating specific file names when writing these compressed tables > (i.e. it expects a file named 00_0). It looks like that is not guaranteed > by dataload, which can lead to failures like this: > {noformat} > query_test/test_compressed_formats.py:142: in test_insensitivity_to_extension > unique_database, 'tinytable', db_suffix, '00_0', src_extension, ext) > query_test/test_compressed_formats.py:86: in _copy_and_query_compressed_file > self.filesystem_client.copy(src_file, dest_file, overwrite=True) > util/hdfs_util.py:79: in copy > self.hdfs_filesystem_client.copy(src, dst, overwrite) > util/hdfs_util.py:241: in copy > '{0} copy failed: '.format(self.filesystem_type) + stderr + "; " + stdout > E AssertionError: HDFS copy failed: cp: > `/test-warehouse/tinytable_avro_snap/00_0': No such file or directory > E ;{noformat} > The file list shows that the filename is actually > "/test-warehouse/tinytable_avro_snap/00_1" > We should update the test to tolerate this. The actual base filename doesn't > matter for this test. > I have seen this exactly once so far. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
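A sketch of the tolerance fix suggested above (the helper name is hypothetical, not the actual test code): rather than hardcoding the Hive output filename, the test could copy whichever data file dataload actually produced in the table directory.

```python
# Hypothetical helper: pick a data file from an HDFS-style directory
# listing instead of assuming a fixed name like "00_0", since Hive may
# emit "00_1" or similar and the base filename doesn't matter to the test.
def pick_source_file(listing):
    """Return the first data file, skipping markers like _SUCCESS."""
    candidates = [f for f in listing if not f.startswith(('.', '_'))]
    assert candidates, "no data files found in table dir"
    return candidates[0]
```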
[jira] [Resolved] (IMPALA-10127) LIRS enforcement of tombstone limit has pathological performance scenarios
[ https://issues.apache.org/jira/browse/IMPALA-10127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10127. Fix Version/s: Impala 4.0 Resolution: Fixed > LIRS enforcement of tombstone limit has pathological performance scenarios > -- > > Key: IMPALA-10127 > URL: https://issues.apache.org/jira/browse/IMPALA-10127 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Blocker > Fix For: Impala 4.0 > > > LIRS maintains metadata for some tombstone entries that have been evicted > from the cache (but might be seen again). To limit the memory used for these > entries, the lirs_tombstone_multiple setting limits the total number of tombstone > entries. The enforcement walks through the recency list from oldest to newest > and removes tombstone entries until it is back underneath the limit. This > requires it to go past a certain number of non-tombstone entries before it > reaches a tombstone entry. There are pathological cases where this > enforcement needs to walk past a very large number of entries before reaching > a tombstone entry. > Suppose that the cache never accesses the same entry more than once. Starting > from empty, the first entries representing 95% of the capacity are > automatically considered protected. The subsequent accesses are all > unprotected. In order to dislodge a protected entry, an entry needs to be > accessed more than once. If every entry is unique, the protected entries are > never touched again and form a contiguous block as the oldest entries on the > recency list. Tombstone entries are above them, and unprotected elements are > the newest entries on the recency list. It looks like this: > Oldest > Protected entries (representing 95% of cache capacity) > ... > Tombstone entries > ...
> Unprotected entries (representing 5% of cache capacity) > Newest > To enforce the tombstone limit, it would need to pass all the protected > entries to reach a single tombstone entry. I modified cache-bench to add a > case with UNIFORM distribution but a 500x ratio of entries to the cache size. > This shows pathological performance compared to the 3x ratio: > {noformat} > I0831 18:22:06.356406 2605 cache-bench.cc:180] Warming up... > I0831 18:22:07.357687 2605 cache-bench.cc:183] Running benchmark... > I0831 18:22:22.358944 2605 cache-bench.cc:191] UNIFORM ratio=3.00x > n_unique=786432: 3.48M lookups/sec < FINE > I0831 18:22:22.358958 2605 cache-bench.cc:192] UNIFORM ratio=3.00x > n_unique=786432: 33.3% hit rate > I0831 18:22:22.961802 2605 cache-bench.cc:180] Warming up... > I0831 18:22:24.010735 2605 cache-bench.cc:183] Running benchmark... > I0831 18:22:39.026588 2605 cache-bench.cc:191] UNIFORM ratio=500.00x > n_unique=131072000: 1.31k lookups/sec <- OUCH > I0831 18:22:39.026614 2605 cache-bench.cc:192] UNIFORM ratio=500.00x > n_unique=131072000: 0.2% hit rate{noformat} > We should rework the enforcement of the tombstone limit to avoid walking past > all those entries. One option is to keep the tombstone entries on their own > list. > Note that without the tombstone limit, this pathological case would use an > unbounded amount of memory (because the tombstone entries would never > reach the bottom of the recency list and get removed). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
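The "own list" option mentioned in the issue could look roughly like the following, an assumed design sketch in Python rather than the actual C++ patch: tombstones live in a dedicated insertion-ordered structure, so trimming to the limit pops the oldest tombstone in O(1) instead of walking the shared recency list past every protected entry.

```python
# Assumed design sketch, not Impala's final fix: a separate FIFO of
# tombstone keys so limit enforcement never scans non-tombstone entries.
from collections import OrderedDict

class TombstoneSet:
    def __init__(self, limit):
        self.limit = limit
        # OrderedDict preserves insertion order: oldest tombstone first.
        self.tombstones = OrderedDict()

    def add(self, key):
        # Re-adding a key refreshes its position, mirroring a recency bump.
        self.tombstones.pop(key, None)
        self.tombstones[key] = True
        # Trim oldest tombstones beyond the limit: O(1) per eviction,
        # with no walk over protected or unprotected cache entries.
        while len(self.tombstones) > self.limit:
            self.tombstones.popitem(last=False)
```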
[jira] [Resolved] (IMPALA-10198) Unify Java components into a single maven project
[ https://issues.apache.org/jira/browse/IMPALA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe McDonnell resolved IMPALA-10198. Fix Version/s: Impala 4.0 Assignee: Joe McDonnell Resolution: Fixed > Unify Java components into a single maven project > - > > Key: IMPALA-10198 > URL: https://issues.apache.org/jira/browse/IMPALA-10198 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 4.0 >Reporter: Joe McDonnell >Assignee: Joe McDonnell >Priority: Major > Fix For: Impala 4.0 > > > Currently, there are multiple maven projects in Impala's source. Each one is > built separately with a separate maven invocation, while sharing a parent pom > (impala-parent/pom.xml). This requires artificial CMake dependencies to avoid > concurrent maven invocations (e.g. > [https://github.com/apache/impala/commit/4c3f701204f92f8753cf65a97fe4804d1f77bc08]). > > We should unify the Java projects into a single project with submodules. This > will allow a single maven invocation. This makes it easier to add new Java > submodules, and it fixes the "mvn versions:set" command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
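The unified layout described above amounts to a single aggregator pom. A minimal sketch of the shape (module names are illustrative, not the actual Impala submodules): one `mvn install` from the root builds everything, and `mvn versions:set` updates every submodule in one pass.

```xml
<!-- Illustrative aggregator pom; module names are assumptions. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.impala</groupId>
  <artifactId>impala-java-root</artifactId>
  <version>4.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <!-- Each submodule keeps its own pom but is built by this one reactor. -->
  <modules>
    <module>fe</module>
    <module>query-event-hook-api</module>
  </modules>
</project>
```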
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10256: --- Labels: broken-build (was: ) > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Labels: broken-build > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: 
ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-10256: --- Component/s: Catalog > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER 
EXCEPTION: > EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Resolved] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
[ https://issues.apache.org/jira/browse/IMPALA-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Yu Rao resolved IMPALA-10260. -- Resolution: Duplicate The JIRA seems to be a duplicate of IMPALA-9767. > heap-use-after-free AddressSanitizer error in aggregating runtime filters > - > > Key: IMPALA-10260 > URL: https://issues.apache.org/jira/browse/IMPALA-10260 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Fang-Yu Rao >Priority: Critical > > Saw the following ASAN failure in an internal CORE build: > {code:java} > ==7121==ERROR: AddressSanitizer: heap-use-after-free on address > 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 > READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) > #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, > unsigned long, unsigned long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 > #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, > long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 > #2 0x1b02eb3 in __interceptor_sendmsg > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 > #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 > #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 > #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 > #6 0x580bd12 in ev_invoke_pending > 
(/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) > #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 > #8 0x580f3bf in ev_run > (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) > #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 > #10 0x35545cb in boost::_bi::bind_t kudu::rpc::ReactorThread>, > boost::_bi::list1 > > >::operator()() > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 > #11 0x23417c6 in boost::function0::operator()() const > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 > #12 0x233e039 in kudu::Thread::SuperviseThread(void*) > /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 > #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) > #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) > 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region > [0x7fec0d74d800,0x7fec0d84d801) > freed by thread T112 here: > #0 0x1b6ff50 in operator delete(void*) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 > #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, > unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 > #2 0x7ff5490c35a9 in std::allocator_traits > >::deallocate(std::allocator&, char*, unsigned long) > 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 > #3 0x7ff5490c35a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::_M_destroy(unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 > #4 0x7ff5490c35a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::reserve(unsigned long) > /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 > previously allocated by thread T112 here: > #0 0x1b6f1e0 in operator new(unsigned long) > /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 > #1 0x1b73ece in void std::__cxx11::basic_string std::char_traits, std::allocator
[jira] [Assigned] (IMPALA-10132) Implement ds_hll_estimate_bounds()
[ https://issues.apache.org/jira/browse/IMPALA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fucun Chu reassigned IMPALA-10132: -- Assignee: Fucun Chu > Implement ds_hll_estimate_bounds() > -- > > Key: IMPALA-10132 > URL: https://issues.apache.org/jira/browse/IMPALA-10132 > Project: IMPALA > Issue Type: Sub-task >Reporter: Adam Tamas >Assignee: Fucun Chu >Priority: Major > > In Hive, ds_hll_estimate_bounds() returns an array of doubles. > An example for a sketch created from a table which contains only a single > value: > {code:java} > (select ds_hll_estimate_bounds(ds_hll_sketch(i)) from t;) > +---+ > | _c0 | > +---+ > | [1.0,1.0,1.998634873453] | > +---+ > {code} > The values of the array are probably a lower bound, an estimate and an upper > bound of the sketch.
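For intuition about where such bounds come from: HLL error bounds are commonly derived from the relative standard error of the estimator, approximately 1.04/sqrt(m) for m = 2^lg_k registers. The sketch below is an illustrative approximation only, not the exact bound computation DataSketches performs (its bounds are state-dependent and tighter), and the function name is made up.

```python
import math

def approx_hll_bounds(estimate, lg_k, num_std_devs=2):
    """Rough HLL confidence bounds (lower, estimate, upper) from the
    relative standard error rse ~= 1.04 / sqrt(m), m = 2**lg_k registers.
    Illustrative only; real libraries compute tighter bounds."""
    rse = 1.04 / math.sqrt(2 ** lg_k)
    delta = num_std_devs * rse
    return (estimate * (1 - delta), estimate, estimate * (1 + delta))

lo, est, hi = approx_hll_bounds(1000.0, lg_k=12)
assert lo < est < hi
```

This matches the shape of the Hive output above: a lower bound, the estimate itself, and an upper bound, with the interval widening as lg_k shrinks or num_std_devs grows.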
[jira] [Commented] (IMPALA-10249) TestImpalaShell.test_queries_closed is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216769#comment-17216769 ] Quanlong Huang commented on IMPALA-10249: - Hi [~tarmstr...@cloudera.com], this looks like an old test. Do you know who would be a suitable assignee? > TestImpalaShell.test_queries_closed is flaky > > > Key: IMPALA-10249 > URL: https://issues.apache.org/jira/browse/IMPALA-10249 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > Saw a failure in a CORE ASAN build: > shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2-http] (from pytest) > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: > in test_queries_closed > assert 0 == impalad_service.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x7ac8510>.get_num_in_flight_queries {code}
[jira] [Updated] (IMPALA-10249) TestImpalaShell.test_queries_closed is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10249: Description: Saw a failure in a CORE ASAN build: shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2-http] (from pytest) {code:java} /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: in test_queries_closed assert 0 == impalad_service.get_num_in_flight_queries() E assert 0 == 1 E+ where 1 = >() E+where > = .get_num_in_flight_queries {code} was: Saw a failure in an internal job: shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2-http] (from pytest) {code:java} /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: in test_queries_closed assert 0 == impalad_service.get_num_in_flight_queries() E assert 0 == 1 E+ where 1 = >() E+where > = .get_num_in_flight_queries {code} > TestImpalaShell.test_queries_closed is flaky > > > Key: IMPALA-10249 > URL: https://issues.apache.org/jira/browse/IMPALA-10249 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > Saw a failure in a CORE ASAN build: > shell.test_shell_commandline.TestImpalaShell.test_queries_closed[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2-http] (from pytest) > {code:java} > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:365: > in test_queries_closed > assert 0 == impalad_service.get_num_in_flight_queries() > E assert 0 == 1 > E+ where 1 = >() > E+where > = > 0x7ac8510>.get_num_in_flight_queries {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: 
issues-all-h...@impala.apache.org
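Flakes like this typically come from asserting on an asynchronously updated metric the instant the shell exits, before the server has finished closing the query. A common remedy (a hypothetical helper sketched here, not the actual test fix) is to poll the metric with a timeout instead of asserting once:

```python
import time

def wait_for(predicate, timeout_s=60.0, interval_s=0.5):
    """Poll `predicate` until it returns True or `timeout_s` elapses.
    Returns the final predicate result, so callers can assert on it."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return predicate()

# Instead of:  assert 0 == impalad_service.get_num_in_flight_queries()
# a test could wait for the count to drain (impalad_service is assumed here):
# assert wait_for(lambda: impalad_service.get_num_in_flight_queries() == 0)
```

The trade-off is that a genuine leak of an in-flight query now fails after the timeout rather than immediately, so the timeout should stay short.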
[jira] [Assigned] (IMPALA-10245) Test fails in TestKuduReadTokenSplit.test_kudu_scanner
[ https://issues.apache.org/jira/browse/IMPALA-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10245: --- Assignee: Thomas Tauber-Marshall Hi [~twmarshall], randomly assigning this to you. Feel free to reassign it. Thanks! > Test fails in TestKuduReadTokenSplit.test_kudu_scanner > -- > > Key: IMPALA-10245 > URL: https://issues.apache.org/jira/browse/IMPALA-10245 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Thomas Tauber-Marshall >Priority: Critical > > Tests with erasure-coding enabled failed in: > query_test.test_kudu.TestKuduReadTokenSplit.test_kudu_scanner[protocol: > beeswax | exec_option: \{'kudu_read_mode': 'READ_AT_SNAPSHOT', 'batch_size': > 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: kudu/none] (from pytest) > {code:java} > query_test/test_kudu.py:1508: in test_kudu_scanner > targeted_kudu_scan_range_length=None, plans=plans) > query_test/test_kudu.py:1542: in __get_num_scanner_instances > assert len(matches.groups()) == 1 > E AttributeError: 'NoneType' object has no attribute 'groups' {code}
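The `AttributeError: 'NoneType' object has no attribute 'groups'` in the trace above means `re.search` returned `None` before `.groups()` was called, i.e. the expected pattern was missing from the plan text. A guard turns that into a readable assertion failure instead (a sketch with an assumed pattern; the real test's regex is not shown here):

```python
import re

def extract_num_instances(plan_text):
    """Return the instance count from a plan/explain string, failing with
    a useful message instead of AttributeError when the pattern is absent."""
    matches = re.search(r'(\d+) instances', plan_text)  # assumed pattern
    assert matches is not None, "instance count not found in: %r" % plan_text
    return int(matches.group(1))

assert extract_num_instances("F00: 3 instances") == 3
```

This way a missing pattern reports the offending plan text directly in the pytest output rather than a bare `NoneType` error.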
[jira] [Assigned] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
[ https://issues.apache.org/jira/browse/IMPALA-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10216: --- Assignee: Tim Armstrong Hi [~tarmstr...@cloudera.com], assigning this to you first since you added the test. Feel free to reassign it if you don't have bandwidth. Thanks! > BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds > -- > > Key: IMPALA-10216 > URL: https://issues.apache.org/jira/browse/IMPALA-10216 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Tim Armstrong >Priority: Critical > > Only seen this once so far: > {code} > BufferPoolTest.WriteErrorBlacklistCompression > Error Message > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > Stacktrace > Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764 > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > {code}
[jira] [Updated] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
[ https://issues.apache.org/jira/browse/IMPALA-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10260: Description: Saw the following ASAN failure in an internal CORE build: {code:java} ==7121==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned long, unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 #2 0x1b02eb3 in __interceptor_sendmsg /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 #6 0x580bd12 in ev_invoke_pending (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 #8 0x580f3bf in ev_run 
(/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 #10 0x35545cb in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 #11 0x23417c6 in boost::function0::operator()() const /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 #12 0x233e039 in kudu::Thread::SuperviseThread(void*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region [0x7fec0d74d800,0x7fec0d84d801) freed by thread T112 here: #0 0x1b6ff50 in operator delete(void*) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 #2 0x7ff5490c35a9 in std::allocator_traits >::deallocate(std::allocator&, char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 #3 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::_M_destroy(unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 #4 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::reserve(unsigned long) 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 previously allocated by thread T112 here: #0 0x1b6f1e0 in operator new(unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x1b73ece in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*, std::forward_iterator_tag) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/basic_string.tcc:219:14 #2 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct_aux(char const*, char const*, std::__false_type) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:236 #3 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*)
[jira] [Updated] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds
[ https://issues.apache.org/jira/browse/IMPALA-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10216: Issue Type: Bug (was: Test) > BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds > -- > > Key: IMPALA-10216 > URL: https://issues.apache.org/jira/browse/IMPALA-10216 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Priority: Critical > > Only seen this once so far: > {code} > BufferPoolTest.WriteErrorBlacklistCompression > Error Message > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > Stacktrace > Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764 > Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL > Actual: false > Expected: true > {code}
[jira] [Created] (IMPALA-10260) heap-use-after-free AddressSanitizer error in aggregating runtime filters
Quanlong Huang created IMPALA-10260: --- Summary: heap-use-after-free AddressSanitizer error in aggregating runtime filters Key: IMPALA-10260 URL: https://issues.apache.org/jira/browse/IMPALA-10260 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Fang-Yu Rao Saw the following ASAN failure in an internal build: {code:java} ==7121==ERROR: AddressSanitizer: heap-use-after-free on address 0x7fec0d74d800 at pc 0x01ae9f71 bp 0x7fecfe5d7180 sp 0x7fecfe5d6930 READ of size 1048576 at 0x7fec0d74d800 thread T82 (rpc reactor-757) #0 0x1ae9f70 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, unsigned long, unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904 #1 0x1b005d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781 #2 0x1b02eb3 in __interceptor_sendmsg /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796 #3 0x399f54c in kudu::Socket::Writev(iovec const*, int, long*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3 #4 0x35afe75 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26 #5 0x35b8930 in kudu::rpc::Connection::WriteHandler(ev::io&, int) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31 #6 0x580bd12 in ev_invoke_pending (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580bd12) #7 0x3542c9c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3 #8 0x580f3bf in 
ev_run (/data0/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/build/debug/service/impalad+0x580f3bf) #9 0x3542e91 in kudu::rpc::ReactorThread::RunThread() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9 #10 0x35545cb in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16 #11 0x23417c6 in boost::function0::operator()() const /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14 #12 0x233e039 in kudu::Thread::SuperviseThread(void*) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3 #13 0x7ff54bd29e24 in start_thread (/lib64/libpthread.so.0+0x7e24) #14 0x7ff5487c034c in __clone (/lib64/libc.so.6+0xf834c) 0x7fec0d74d800 is located 0 bytes inside of 1048577-byte region [0x7fec0d74d800,0x7fec0d84d801) freed by thread T112 here: #0 0x1b6ff50 in operator delete(void*) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:137 #1 0x7ff5490c35a9 in __gnu_cxx::new_allocator::deallocate(char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125 #2 0x7ff5490c35a9 in std::allocator_traits >::deallocate(std::allocator&, char*, unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462 #3 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::_M_destroy(unsigned long) /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226 #4 0x7ff5490c35a9 in std::__cxx11::basic_string, std::allocator >::reserve(unsigned long) 
/mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:302 previously allocated by thread T112 here: #0 0x1b6f1e0 in operator new(unsigned long) /mnt/source/llvm/llvm-5.0.1.src-p3/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x1b73ece in void std::__cxx11::basic_string, std::allocator >::_M_construct(char const*, char const*, std::forward_iterator_tag) /data/jenkins/workspace/impala-cdpd-master-staging-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/basic_string.tcc:219:14 #2 0x7ff5490c5994 in void std::__cxx11::basic_string, std::allocator >::_M_construct_aux(char const*, char const*, std::__false_type)
[jira] [Commented] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216725#comment-17216725 ] Quanlong Huang commented on IMPALA-10259: - [~wzhou], assign this to you since you resolved IMPALA-10050. Feel free to mark it as duplicated or reassign it. Thanks! > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > --- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Wenzhe Zhou >Priority: Critical > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: 
exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) = 0, > 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 5, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: 
num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: parquet_timestamp_type (i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 25, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93:
[jira] [Created] (IMPALA-10259) Hit DCHECK in TestImpalaShell.test_completed_query_errors_2
Quanlong Huang created IMPALA-10259: --- Summary: Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 Key: IMPALA-10259 URL: https://issues.apache.org/jira/browse/IMPALA-10259 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Wenzhe Zhou TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN build: {code:java} F1016 17:08:54.728466 19955 query-state.cc:877] 924f4ce603ac07bb:a08656e3] Check failed: is_cancelled_.Load() == 1 (0 vs. 1) {code} The test is: {code:java} shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: ('textfile', '.txt') | protocol: hs2] {code} The query is: {code:java} I1016 17:08:49.026532 19947 Frontend.java:1522] 924f4ce603ac07bb:a08656e3] Analyzing query: select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} Query options: {code:java} I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = true, 02: max_errors (i32) = 100, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 24: disable_outermost_topn (bool) = false, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, 32: optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 0, 34: schedule_random_replica (bool) = false, 36: disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39: runtime_filter_wait_time_ms (i32) = 0, 40: disable_row_runtime_filtering (bool) = 
false, 41: max_num_runtime_filters (i32) = 10, 42: parquet_annotate_strings_utf8 (bool) = false, 43: parquet_fallback_schema_resolution (i32) = 0, 45: s3_skip_insert_staging (bool) = true, 46: runtime_filter_min_size (i32) = 1048576, 47: runtime_filter_max_size (i32) = 16777216, 48: prefetch_mode (i32) = 1, 49: strict_mode (bool) = false, 50: scratch_limit (i64) = -1, 51: enable_expr_rewrites (bool) = true, 52: decimal_v2 (bool) = true, 53: parquet_dictionary_filtering (bool) = true, 54: parquet_array_resolution (i32) = 0, 55: parquet_read_statistics (bool) = true, 56: default_join_distribution_mode (i32) = 0, 57: disable_codegen_rows_threshold (i32) = 5, 58: default_spillable_buffer_size (i64) = 2097152, 59: min_spillable_buffer_size (i64) = 65536, 60: max_row_size (i64) = 524288, 61: idle_session_timeout (i32) = 0, 62: compute_stats_min_sample_size (i64) = 1073741824, 63: exec_time_limit_s (i32) = 0, 64: shuffle_distinct_exprs (bool) = true, 65: max_mem_estimate_for_admission (i64) = 0, 66: thread_reservation_limit (i32) = 3000, 67: thread_reservation_aggregate_limit (i32) = 0, 68: kudu_read_mode (i32) = 0, 69: allow_erasure_coded_files (bool) = false, 70: timezone (string) = "", 71: scan_bytes_limit (i64) = 0, 72: cpu_limit_s (i64) = 0, 73: topn_bytes_limit (i64) = 536870912, 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) built on Fri Oct 16 13:26:18 PDT 2020", 75: resource_trace_ratio (double) = 0, 76: num_remote_executor_candidates (i32) = 3, 77: num_rows_produced_limit (i64) = 0, 78: planner_testcase_mode (bool) = false, 79: default_file_format (i32) = 0, 80: parquet_timestamp_type (i32) = 0, 81: parquet_read_page_index (bool) = true, 82: parquet_write_page_index (bool) = true, 84: disable_hdfs_num_rows_estimate (bool) = false, 86: spool_query_results (bool) = false, 87: default_transactional_type (i32) = 0, 88: statement_expression_limit (i32) = 25, 89: max_statement_length_bytes (i32) = 16777216, 90: disable_data_cache (bool) = 
false, 91: max_result_spooling_mem (i64) = 104857600, 92: max_spilled_result_spooling_mem (i64) = 1073741824, 93: disable_hbase_num_rows_estimate (bool) = false, 94: fetch_rows_timeout_ms (i64) = 1, 95: now_string (string) = "", 96: parquet_object_store_split_size (i64) = 268435456, 97: mem_limit_executors (i64) = 0, 98: broadcast_bytes_limit (i64) = 34359738368, 99: preagg_bytes_limit (i64) = -1, 100: enable_cnf_rewrites (bool) = true, 101: max_cnf_exprs (i32) = 0, 102: kudu_snapshot_read_timestamp_micros (i64) = 0, 103: retry_failed_queries (bool) = false, 104: enabled_runtime_filter_types (i32) = 3, 105: async_codegen (bool) = false, 106:
[jira] [Commented] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216717#comment-17216717 ] Peter Vary commented on IMPALA-10247: - Maybe [~kuczoram] could be of more help if this is happening in direct insert path? I see this in the log: {code:java} Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462) ... 29 more {code} > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... 
> > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at >
[jira] [Resolved] (IMPALA-10248) TestKuduOperations.test_column_storage_attributes on exhaustive tests
[ https://issues.apache.org/jira/browse/IMPALA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-10248. - Fix Version/s: Impala 4.0 Resolution: Fixed > TestKuduOperations.test_column_storage_attributes on exhaustive tests > - > > Key: IMPALA-10248 > URL: https://issues.apache.org/jira/browse/IMPALA-10248 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Fix For: Impala 4.0 > > > This is the reverse of IMPALA-9513. The failure is > {code:java} > query_test/test_kudu.py:472: in test_column_storage_attributes > assert cursor.fetchall() == \ > E assert [(0, True, 0, 0, 0, 0, ...)] == [(0, True, 0, 0, 0, 0, ...)] > E At index 0 diff: (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', > datetime.datetime(2009, 1, 1, 0, 0), Decimal('0'), datetime.date(2010, 1, 1), > '') != (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', datetime.datetime(2009, 1, 1, 0, > 0), 0, '2010-01-01', '') > E Use -v to get the full diff{code} > The difference from IMPALA-9513 is that the test expects the string > '2010-01-01' instead of the actual {{datetime.date(2010, 1, 1)}}. This may be > due to the recent bump of the impyla version in IMPALA-10225. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216673#comment-17216673 ] Zoltán Borók-Nagy commented on IMPALA-10247: Thanks, Quanlong. [~pvary] could you please take a look? This seems like a Hive issue. > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... > > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. 
Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at
[jira] [Commented] (IMPALA-10248) TestKuduOperations.test_column_storage_attributes on exhaustive tests
[ https://issues.apache.org/jira/browse/IMPALA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216649#comment-17216649 ] ASF subversion and git services commented on IMPALA-10248: -- Commit c7f118a860af6b811e2e2c4c5e3693f43429def8 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c7f118a ] IMPALA-10248: Fix test_column_storage_attributes date string errors After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect impyla to return a datetime.date instead of a string for DATE type data. Tests: - Run test_column_storage_attributes with --exploration_strategy=exhaustive to verify the fix. Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023 Reviewed-on: http://gerrit.cloudera.org:8080/16608 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > TestKuduOperations.test_column_storage_attributes on exhaustive tests > - > > Key: IMPALA-10248 > URL: https://issues.apache.org/jira/browse/IMPALA-10248 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > This is the reverse of IMPALA-9513. The failure is > {code:java} > query_test/test_kudu.py:472: in test_column_storage_attributes > assert cursor.fetchall() == \ > E assert [(0, True, 0, 0, 0, 0, ...)] == [(0, True, 0, 0, 0, 0, ...)] > E At index 0 diff: (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', > datetime.datetime(2009, 1, 1, 0, 0), Decimal('0'), datetime.date(2010, 1, 1), > '') != (0, True, 0, 0, 0, 0, 0.0, 0.0, '0', datetime.datetime(2009, 1, 1, 0, > 0), 0, '2010-01-01', '') > E Use -v to get the full diff{code} > The difference from IMPALA-9513 is that the test expects the string > '2010-01-01' instead of the actual {{datetime.date(2010, 1, 1)}}. This may be > due to the recent bump of the impyla version in IMPALA-10225.
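The behavior change behind this fix can be illustrated with a short, hypothetical snippet (values simplified from the assertion diff; only the DATE column is shown):

```python
from datetime import date

# Hypothetical DATE value as impyla 0.17a1 returns it: a datetime.date
# object rather than the string older client versions produced.
fetched = date(2010, 1, 1)

# The old expectation (a string) no longer matches the fetched value...
assert fetched != '2010-01-01'
# ...while comparing against a datetime.date, as the fix does, passes.
assert fetched == date(2010, 1, 1)
```

The commit accordingly updates the test's expected rows to use datetime.date objects for DATE columns.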
[jira] [Commented] (IMPALA-10225) Bump Impyla version
[ https://issues.apache.org/jira/browse/IMPALA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216650#comment-17216650 ] ASF subversion and git services commented on IMPALA-10225: -- Commit c7f118a860af6b811e2e2c4c5e3693f43429def8 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c7f118a ] IMPALA-10248: Fix test_column_storage_attributes date string errors After IMPALA-10225 bumps the impyla version to 0.17a1, we should expect impyla to return a datetime.date instead of a string for DATE type data. Tests: - Run test_column_storage_attributes with --exploration_strategy=exhaustive to verify the fix. Change-Id: I618a759a03213efc22a5e54e9a30fa09e8929023 Reviewed-on: http://gerrit.cloudera.org:8080/16608 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Bump Impyla version > --- > > Key: IMPALA-10225 > URL: https://issues.apache.org/jira/browse/IMPALA-10225 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > > > There are a couple of new Impyla releases that we can test out in Impala's > end-to-end test environment - https://pypi.org/project/impyla/0.17a1/#history
[jira] [Assigned] (IMPALA-10258) TestQueryRetries.test_original_query_cancel is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10258: --- Assignee: Sahil Takiar Assigning this to [~stakiar] first since this seems similar to IMPALA-9550. Feel free to reassign it to me. Thanks! > TestQueryRetries.test_original_query_cancel is flaky > - > > Key: IMPALA-10258 > URL: https://issues.apache.org/jira/browse/IMPALA-10258 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Sahil Takiar >Priority: Critical > > Saw this fail in a core-s3-data-cache build: > custom_cluster.test_query_retries.TestQueryRetries.test_original_query_cancel > {code:java} > custom_cluster/test_query_retries.py:622: in test_original_query_cancel > self.wait_for_state(handle, self.client.QUERY_STATES[state], 60) > common/impala_test_suite.py:1053: in wait_for_state > self.wait_for_any_state(handle, [expected_state], timeout, client) > common/impala_test_suite.py:1070: in wait_for_any_state > actual_state)) > E Timeout: query 494af68cdf3d8ecb:d3a3bf36 did not reach one of the > expected states [3], last known state 5{code}
[jira] [Created] (IMPALA-10258) TestQueryRetries.test_original_query_cancel is flaky
Quanlong Huang created IMPALA-10258: --- Summary: TestQueryRetries.test_original_query_cancel is flaky Key: IMPALA-10258 URL: https://issues.apache.org/jira/browse/IMPALA-10258 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Saw this fail in a core-s3-data-cache build: custom_cluster.test_query_retries.TestQueryRetries.test_original_query_cancel {code:java} custom_cluster/test_query_retries.py:622: in test_original_query_cancel self.wait_for_state(handle, self.client.QUERY_STATES[state], 60) common/impala_test_suite.py:1053: in wait_for_state self.wait_for_any_state(handle, [expected_state], timeout, client) common/impala_test_suite.py:1070: in wait_for_any_state actual_state)) E Timeout: query 494af68cdf3d8ecb:d3a3bf36 did not reach one of the expected states [3], last known state 5{code}
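The Timeout in the trace comes from the test suite's polling helper. A minimal sketch of such a wait loop (hypothetical simplified signature; the real wait_for_any_state in impala_test_suite.py takes a query handle and client) could look like:

```python
import time

def wait_for_any_state(get_state, expected_states, timeout_s):
    """Poll get_state() until it returns one of expected_states.

    Raises TimeoutError carrying the last observed state on expiry,
    mirroring the shape of the error message in the failure above.
    """
    deadline = time.time() + timeout_s
    last_state = None
    while time.time() < deadline:
        last_state = get_state()
        if last_state in expected_states:
            return last_state
        time.sleep(0.05)  # brief pause between polls
    raise TimeoutError("query did not reach one of the expected states "
                       "%s, last known state %s" % (expected_states, last_state))

# Example with hypothetical numeric states: the third poll returns the
# awaited state 3, so the helper returns instead of timing out.
states = iter([4, 4, 3])
assert wait_for_any_state(lambda: next(states), [3], 5) == 3
```

On expiry, the raised message mirrors the "did not reach one of the expected states [3], last known state 5" failure above, which suggests the query ended in an unexpected terminal state rather than merely running slowly.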
[jira] [Assigned] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
[ https://issues.apache.org/jira/browse/IMPALA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10257: --- Assignee: Zoltán Borók-Nagy Assign to [~boroknagyz] first since you are the expert on parquet page indexing. > Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build > --- > > Key: IMPALA-10257 > URL: https://issues.apache.org/jira/browse/IMPALA-10257 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Saw the crash in a CORE S3 build: > {code:java} > F1018 06:14:23.631407 12990 hdfs-parquet-scanner.cc:1170] > cf49030f4bbe0736:84de19aa0002] Check failed: false > {code} > The query is a tpch-nested query: > {code:java} > I1018 06:14:22.352707 12712 Frontend.java:1522] > cf49030f4bbe0736:84de19aa] Analyzing query: select > l_shipmode, > sum(case > when o_orderpriority = '1-URGENT' > or o_orderpriority = '2-HIGH' > then 1 > else 0 > end) as high_line_count, > sum(case > when o_orderpriority <> '1-URGENT' > and o_orderpriority <> '2-HIGH' > then 1 > else 0 > end) as low_line_count > from > customer.c_orders o, > o.o_lineitems l > where > l_shipmode in ('MAIL', 'SHIP') > and l_commitdate < l_receiptdate > and l_shipdate < l_commitdate > and l_receiptdate >= '1994-01-01' > and l_receiptdate < '1995-01-01' > group by > l_shipmode > order by > l_shipmode db: tpch_nested_parquet > {code} > The test is > {code:java} > query_test.test_tpch_nested_queries.TestTpchNestedQuery.test_tpch_q12[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none]{code} > A similar test also failed in the same build: > {code:java} > authorization.test_ranger.TestRangerColumnMaskingTpchNested.test_tpch_nested_column_masking[protocol: > beeswax | exec_option: {'batch_size': 0, 
'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none]{code} > Backtrace: > {code:java} > #0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6 > #1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6 > #2 0x0521cce4 in google::DumpStackTraceAndExit() () > #3 0x052120dd in google::LogMessage::Fail() () > #4 0x052139cd in google::LogMessage::SendToLog() () > #5 0x05211a3b in google::LogMessage::Flush() () > #6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() () > #7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering > (this=0xb6f) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170 > #8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows > (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, > skip_row_group=0xb6f01d0) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150 > #9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal > (this=0xb6f, row_batch=0x10bc95a0) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458 > #10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit > (this=0xb6f) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350 > #11 0x02986f4d in impala::HdfsScanNode::ProcessSplit > (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, > scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500 > #12 0x029862ce in impala::HdfsScanNode::ScannerThread > (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at > 
/data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 > #13 0x02985636 in impala::HdfsScanNodeoperator()(void) > const (__closure=0x7f31da200ba8) at > /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 > #14 0x029879ef in > boost::detail::function::void_function_obj_invoker0, > void>::invoke(boost::detail::function::function_buffer &) > (function_obj_ptr=...) at >
[jira] [Updated] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
[ https://issues.apache.org/jira/browse/IMPALA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10257: Description: Saw the crash in a CORE S3 build: {code:java} F1018 06:14:23.631407 12990 hdfs-parquet-scanner.cc:1170] cf49030f4bbe0736:84de19aa0002] Check failed: false {code} The query is a tpch-nested query: {code:java} I1018 06:14:22.352707 12712 Frontend.java:1522] cf49030f4bbe0736:84de19aa] Analyzing query: select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from customer.c_orders o, o.o_lineitems l where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01' group by l_shipmode order by l_shipmode db: tpch_nested_parquet {code} The test is {code:java} query_test.test_tpch_nested_queries.TestTpchNestedQuery.test_tpch_q12[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code} A similar test also failed in the same build: {code:java} authorization.test_ranger.TestRangerColumnMaskingTpchNested.test_tpch_nested_column_masking[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code} Backtrace: {code:java} #0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6 #1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6 #2 0x0521cce4 in google::DumpStackTraceAndExit() () #3 0x052120dd in google::LogMessage::Fail() () #4 0x052139cd in google::LogMessage::SendToLog() () #5 0x05211a3b in 
google::LogMessage::Flush() () #6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() () #7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170 #8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, skip_row_group=0xb6f01d0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150 #9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal (this=0xb6f, row_batch=0x10bc95a0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458 #10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350 #11 0x02986f4d in impala::HdfsScanNode::ProcessSplit (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500 #12 0x029862ce in impala::HdfsScanNode::ScannerThread (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418 #13 0x02985636 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7f31da200ba8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339 #14 0x029879ef in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) 
at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 #15 0x021467d6 in boost::function0::operator() (this=0x7f31da200ba0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 #16 0x02727552 in impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7f31dc204840, thread_started=0x7f31dc202e70) at
[jira] [Created] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
Quanlong Huang created IMPALA-10257:
---
Summary: Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
Key: IMPALA-10257
URL: https://issues.apache.org/jira/browse/IMPALA-10257
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang

Saw the crash in a CORE S3 build:
{code}
F1018 08:41:48.955114 27641 hdfs-parquet-scanner.cc:1170] ed47e522687c15e8:f07974d10002] Check failed: false
{code}
Backtrace:
{code}
#0 0x7f32b548c1f7 in raise () from /lib64/libc.so.6
#1 0x7f32b548d8e8 in abort () from /lib64/libc.so.6
#2 0x0521cce4 in google::DumpStackTraceAndExit() ()
#3 0x052120dd in google::LogMessage::Fail() ()
#4 0x052139cd in google::LogMessage::SendToLog() ()
#5 0x05211a3b in google::LogMessage::Flush() ()
#6 0x05215639 in google::LogMessageFatal::~LogMessageFatal() ()
#7 0x02d87f54 in impala::HdfsParquetScanner::CheckPageFiltering (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170
#8 0x02d87860 in impala::HdfsParquetScanner::AssembleRows (this=0xb6f, column_readers=..., row_batch=0x10bc95a0, skip_row_group=0xb6f01d0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150
#9 0x02d82453 in impala::HdfsParquetScanner::GetNextInternal (this=0xb6f, row_batch=0x10bc95a0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458
#10 0x02d803e2 in impala::HdfsParquetScanner::ProcessSplit (this=0xb6f) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350
#11 0x02986f4d in impala::HdfsScanNode::ProcessSplit (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500
#12 0x029862ce in impala::HdfsScanNode::ScannerThread (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418
#13 0x02985636 in impala::HdfsScanNodeoperator()(void) const (__closure=0x7f31da200ba8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339
#14 0x029879ef in boost::detail::function::void_function_obj_invoker0, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
#15 0x021467d6 in boost::function0::operator() (this=0x7f31da200ba0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
#16 0x02727552 in impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) (name=..., category=..., functor=..., parent_thread_info=0x7f31dc204840, thread_started=0x7f31dc202e70) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/util/thread.cc:360
#17 0x0272f4ef in boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >::operator(), std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0>(boost::_bi::type, void (*&)(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list0&, int) (this=0x15915340, f=@0x15915338: 0x272720c , std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*)>, a=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
#18 0x0272f413 in boost::_bi::bind_t, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, std::allocator > >, boost::_bi::value, std::allocator > >, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> >
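For context, Parquet page filtering (the mechanism the failing DCHECK guards) uses the page index's per-page min/max statistics to pre-select candidate row ranges, and every row that survives the scan must fall inside one of those ranges. A hypothetical Python sketch of that invariant (purely illustrative; names and the check itself are not Impala's implementation) is:

```python
# Hypothetical sketch of Parquet page filtering: pick candidate pages whose
# min/max statistics can satisfy a predicate, then validate that every
# surviving row falls inside a candidate range -- the kind of invariant a
# DCHECK such as the one in CheckPageFiltering asserts.

def candidate_row_ranges(pages, predicate):
    """pages: list of (first_row, last_row, min_val, max_val)."""
    return [(first, last) for first, last, lo, hi in pages if predicate(lo, hi)]

def row_in_candidates(row, ranges):
    return any(first <= row <= last for first, last in ranges)

# Three pages of a column chunk; keep pages whose value range overlaps [10, 20].
pages = [(0, 99, 1, 5), (100, 199, 8, 15), (200, 299, 30, 40)]
ranges = candidate_row_ranges(pages, lambda lo, hi: lo <= 20 and hi >= 10)
assert ranges == [(100, 199)]
assert row_in_candidates(150, ranges)
assert not row_in_candidates(250, ranges)  # rows outside candidates must be skipped
```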
[jira] [Created] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
Quanlong Huang created IMPALA-10256: --- Summary: TestDisableFeatures.test_disable_incremental_metadata_updates fails Key: IMPALA-10256 URL: https://issues.apache.org/jira/browse/IMPALA-10256 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Quanlong Huang Saw test failures in internal CORE builds: custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-unique_database0] {code:java} custom_cluster/test_disable_features.py:45: in test_disable_incremental_metadata_updates use_db=unique_database, multiple_impalad=True) common/impala_test_suite.py:662: in run_test_case result = exec_fn(query, user=test_section.get('USER', '').strip() or None) common/impala_test_suite.py:600: in __exec_in_impala result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:909: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:205: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:187: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:363: in __execute_query handle = self.execute_query_async(query_string, user=user) beeswax/impala_beeswax.py:357: in execute_query_async handle = self.__do_rpc(lambda: self.imp_service.query(query,)) beeswax/impala_beeswax.py:520: in __do_rpc raise ImpalaBeeswaxException(self.__build_error_message(b), b) E ImpalaBeeswaxException: ImpalaBeeswaxException: EINNER EXCEPTION: EMESSAGE: AnalysisException: The specified cache pool does not exist: testPool {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: 
issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10256) TestDisableFeatures.test_disable_incremental_metadata_updates fails
[ https://issues.apache.org/jira/browse/IMPALA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10256: Priority: Blocker (was: Major) > TestDisableFeatures.test_disable_incremental_metadata_updates fails > --- > > Key: IMPALA-10256 > URL: https://issues.apache.org/jira/browse/IMPALA-10256 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > Saw test failures in internal CORE builds: > custom_cluster.test_disable_features.TestDisableFeatures.test_disable_incremental_metadata_updates[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none-unique_database0] > {code:java} > custom_cluster/test_disable_features.py:45: in > test_disable_incremental_metadata_updates > use_db=unique_database, multiple_impalad=True) > common/impala_test_suite.py:662: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:600: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:909: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:363: in __execute_query > handle = self.execute_query_async(query_string, user=user) > beeswax/impala_beeswax.py:357: in execute_query_async > handle = self.__do_rpc(lambda: self.imp_service.query(query,)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > 
EMESSAGE: AnalysisException: The specified cache pool does not exist: > testPool {code}
[jira] [Assigned] (IMPALA-10250) TestNestedTypes.test_scanner_position fails in an ASAN test
[ https://issues.apache.org/jira/browse/IMPALA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10250: --- Assignee: Quanlong Huang > TestNestedTypes.test_scanner_position fails in an ASAN test > --- > > Key: IMPALA-10250 > URL: https://issues.apache.org/jira/browse/IMPALA-10250 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > TestNestedTypes.test_scanner_position fails in a CORE ASAN job: > {code:java} > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] {code} > The stacktrace are the same: > {code:java} > query_test/test_nested_types.py:76: in test_scanner_position > self.run_test_case('QueryTest/nested-types-scanner-position', vector) > common/impala_test_suite.py:693: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:529: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:456: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E 0,-1,7300 != 0,-1,9366 > E 0,1,7300 != 0,1,9800 > E 0,NULL,7300 != 0,NULL,9796 > E 1,1,7300 != 1,1,9796 > E 1,2,7300 != 1,2,9800 > E 2,2,7300 != 2,2,9796 > 
E 2,3,7300 != 2,3,9800 > E 3,NULL,7300 != 3,NULL,9796 > E 4,3,7300 != 4,3,9796 > E 5,NULL,7300 != 5,NULL,9796 {code}
[jira] [Assigned] (IMPALA-10247) Data loading of functional-query ORC fails with EOFException
[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10247: --- Assignee: Zoltán Borók-Nagy Assign to [~boroknagyz] first since this looks like a variant of IMPALA-9923. > Data loading of functional-query ORC fails with EOFException > > > Key: IMPALA-10247 > URL: https://issues.apache.org/jira/browse/IMPALA-10247 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Zoltán Borók-Nagy >Priority: Critical > > Data loading of functional-query on ORC tables occasionally fails with > {code:java} > 16:41:21 Loading custom schemas (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)... > > 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec) > 16:41:24 Started Loading functional-query data in background; pid 23644. > 16:41:24 Started Loading TPC-H data in background; pid 23645. > 16:41:24 Loading functional-query data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)... > > 16:41:24 Started Loading TPC-DS data in background; pid 23646. > 16:41:24 Loading TPC-H data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)... > > 16:41:24 Loading TPC-DS data (logging to > /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)... > > 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK > (Took: 7 min 27 sec) > 16:50:53 FAILED (Took: 9 min 29 sec) > 16:50:53 'load-data functional-query exhaustive' failed. 
Tail of log: > {code} > This looks similar to IMPALA-9923 but have a different error stacktrace: > {code:java} > 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] > ql.Driver: Executing > command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): > INSERT OVERWRITE TABLE tpcds_orc_def.web_sales > SELECT * FROM tpcds.web_sales > .. > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Reading manifest > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_1.manifest > 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] > FileOperations: Looking at manifest file: > hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_001_0/00_0.manifest > 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] > exec.Task: Job Commit failed with exception > 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)' > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803) > at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627) > at > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) > at 
org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at
[jira] [Updated] (IMPALA-10250) TestNestedTypes.test_scanner_position fails in an ASAN test
[ https://issues.apache.org/jira/browse/IMPALA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10250: Description: TestNestedTypes.test_scanner_position fails in a CORE ASAN job: {code:java} query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] {code} The stacktrace are the same: {code:java} query_test/test_nested_types.py:76: in test_scanner_position self.run_test_case('QueryTest/nested-types-scanner-position', vector) common/impala_test_suite.py:693: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:529: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:456: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 0,-1,7300 != 0,-1,9366 E 0,1,7300 != 0,1,9800 E 0,NULL,7300 != 0,NULL,9796 E 1,1,7300 != 1,1,9796 E 1,2,7300 != 1,2,9800 E 2,2,7300 != 2,2,9796 E 2,3,7300 != 2,3,9800 E 3,NULL,7300 != 3,NULL,9796 E 4,3,7300 != 4,3,9796 E 5,NULL,7300 != 5,NULL,9796 {code} was: TestNestedTypes.test_scanner_position fails in a CORE ASAN job: {code:java} query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block]query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: orc/def/block] {code} The stacktrace are the same: {code:java} query_test/test_nested_types.py:76: in test_scanner_position self.run_test_case('QueryTest/nested-types-scanner-position', vector) common/impala_test_suite.py:693: in run_test_case self.__verify_results_and_errors(vector, test_section, result, use_db) common/impala_test_suite.py:529: in __verify_results_and_errors replace_filenames_with_placeholder) common/test_result_verifier.py:456: in verify_raw_results VERIFIER_MAP[verifier](expected, actual) common/test_result_verifier.py:278: in verify_query_result_is_equal assert expected_results == actual_results E assert Comparing QueryTestResults (expected vs actual): E 0,-1,7300 != 0,-1,9366 E 0,1,7300 != 0,1,9800 E 0,NULL,7300 != 0,NULL,9796 E 1,1,7300 != 1,1,9796 E 1,2,7300 != 1,2,9800 E 2,2,7300 != 2,2,9796 E 2,3,7300 != 2,3,9800 E 3,NULL,7300 != 3,NULL,9796 E 4,3,7300 != 4,3,9796 E 5,NULL,7300 != 5,NULL,9796 {code} > TestNestedTypes.test_scanner_position fails in an ASAN test > --- > > Key: IMPALA-10250 > URL: https://issues.apache.org/jira/browse/IMPALA-10250 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Priority: Critical > > TestNestedTypes.test_scanner_position fails in a CORE ASAN job: > {code:java} > query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] > 
query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 > | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > orc/def/block] {code} > The stacktrace are the same: > {code:java} > query_test/test_nested_types.py:76: in test_scanner_position > self.run_test_case('QueryTest/nested-types-scanner-position', vector) > common/impala_test_suite.py:693: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:529: in
[jira] [Updated] (IMPALA-10254) Load data files via Iceberg for Iceberg Tables
[ https://issues.apache.org/jira/browse/IMPALA-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy updated IMPALA-10254: --- Description: Currently we still load the file descriptors of an Iceberg table via recursive file listing. This lists too many files, e.g. metadata files, files that are being written (can later throw checksum errors), files from aborted INSERTs, removed files, etc. We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. Note that we already load data files through the Iceberg APIs to fill the 'path_hash_to_file_descriptor' map ([https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).] was: Currently we still load the file descriptors of an Iceberg table via recursive file listing. This lists too many files, e.g. metadata files, files that are being written (can later throw checksum errors), files from aborted INSERTs, removed files, etc. We should use the Iceberg API to load the file descriptors corresponding to the table snapshot. > Load data files via Iceberg for Iceberg Tables > -- > > Key: IMPALA-10254 > URL: https://issues.apache.org/jira/browse/IMPALA-10254 > Project: IMPALA > Issue Type: Bug >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: impala-iceberg > > Currently we still load the file descriptors of an Iceberg table via > recursive file listing. > This lists too many files, e.g. metadata files, files that are being written > (can later throw checksum errors), files from aborted INSERTs, removed files, > etc. > We should use the Iceberg API to load the file descriptors corresponding to > the table snapshot. > Note that we already load data files through the Iceberg APIs to fill the > 'path_hash_to_file_descriptor' map > ([https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).] 
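The difference between the two listing strategies can be sketched in plain Python (a toy illustration only: `snapshot_data_files` and the file names are invented here, and the real fix would go through the Iceberg API's snapshot/manifest machinery rather than path filtering):

```python
# Toy illustration (not the Iceberg API): a recursive directory listing picks
# up metadata files, half-written files, and files from aborted INSERTs, while
# the snapshot's manifests name exactly the committed data files.

def snapshot_data_files(recursive_listing, manifest_entries):
    """Keep only the files the table snapshot actually references."""
    referenced = set(manifest_entries)
    return sorted(p for p in recursive_listing if p in referenced)

recursive_listing = [
    "data/00000-0-aaaa.parquet",             # committed data file
    "data/00001-0-bbbb.parquet",             # left over from an aborted INSERT
    "data/00002-0-cccc.parquet.inprogress",  # still being written
    "metadata/v3.metadata.json",             # table metadata, not data
]
manifest_entries = ["data/00000-0-aaaa.parquet"]

assert snapshot_data_files(recursive_listing, manifest_entries) == [
    "data/00000-0-aaaa.parquet"
]
```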
[jira] [Comment Edited] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216561#comment-17216561 ] Quanlong Huang edited comment on IMPALA-9884 at 10/19/20, 9:24 AM: --- Saw this again in an internal exhaustive build. custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries: 50 | protocol: beeswax | table_format: text/none | exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | submission_delay_ms: 150 | round_robin_submission: False] {code:java} custom_cluster/test_admission_controller.py:1856: in test_mem_limit {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) custom_cluster/test_admission_controller.py:1712: in run_admission_test assert metric_deltas['dequeued'] == 0,\ E AssertionError: Queued queries should not run until others are made to finish E assert 1 == 0 {code} was (Author: stiga-huang): Saw this again in an internal exhaustive build. > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > > Recently, I saw this test failing with the exception trace below. 
> {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat}
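The failing assertion follows a snapshot/delta pattern: the test records admission-controller metrics before submitting a batch of over-committed queries, then asserts on the differences — no query may be dequeued while the pool is saturated. A simplified sketch of that pattern (illustrative only, not the actual test harness; the metric names here are assumed):

```python
# Sketch of a metric-delta check like the one in run_admission_test:
# snapshot counters, run the workload, then assert on the deltas.

def metric_deltas(before, after):
    return {name: after[name] - before[name] for name in before}

before = {"admitted": 5, "queued": 0, "dequeued": 2}
# ... submit enough queries to exceed the pool's memory limit ...
after = {"admitted": 5, "queued": 10, "dequeued": 2}

deltas = metric_deltas(before, after)
assert deltas["dequeued"] == 0, \
    "Queued queries should not run until others are made to finish"
```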
[jira] [Reopened] (IMPALA-9884) TestAdmissionControllerStress.test_mem_limit failing occasionally
[ https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reopened IMPALA-9884: Saw this again in an internal exhaustive build. > TestAdmissionControllerStress.test_mem_limit failing occasionally > - > > Key: IMPALA-9884 > URL: https://issues.apache.org/jira/browse/IMPALA-9884 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.0 >Reporter: Vihang Karajgaonkar >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > > Recently, I saw this test failing with the exception trace below. > {noformat} > custom_cluster/test_admission_controller.py:1782: in test_mem_limit > {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) > custom_cluster/test_admission_controller.py:1638: in run_admission_test > assert metric_deltas['dequeued'] == 0,\ > E AssertionError: Queued queries should not run until others are made to > finish > E assert 1 == 0 > {noformat}
[jira] [Work started] (IMPALA-10255) query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds
[ https://issues.apache.org/jira/browse/IMPALA-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10255 started by Quanlong Huang. --- > query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive > builds > --- > > Key: IMPALA-10255 > URL: https://issues.apache.org/jira/browse/IMPALA-10255 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > > The patch in IMPALA-10233 adds 3 insert statements in > testdata/workloads/functional-query/queries/QueryTest/insert.test. They > introduce test failures in parquet format with non-none compressions: > {code:java} > query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > 
parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > snappy | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': > False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | > table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 
'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none-unique_database0]query_test.test_insert.TestInsertQueries.test_insert[compression_codec: > lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, > 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: >
[jira] [Created] (IMPALA-10255) query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds
Quanlong Huang created IMPALA-10255: --- Summary: query_test.test_insert.TestInsertQueries.test_insert fails in exhaustive builds Key: IMPALA-10255 URL: https://issues.apache.org/jira/browse/IMPALA-10255 Project: IMPALA Issue Type: Bug Reporter: Quanlong Huang Assignee: Quanlong Huang The patch for IMPALA-10233 adds 3 insert statements to testdata/workloads/functional-query/queries/QueryTest/insert.test. They introduce test failures in the parquet format with non-none compression codecs:
{code:java}
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: snappy | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: gzip | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: zstd | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 1, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none-unique_database0]
query_test.test_insert.TestInsertQueries.test_insert[compression_codec: lz4 | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
{code}
[jira] [Created] (IMPALA-10254) Load data files via Iceberg for Iceberg Tables
Zoltán Borók-Nagy created IMPALA-10254: -- Summary: Load data files via Iceberg for Iceberg Tables Key: IMPALA-10254 URL: https://issues.apache.org/jira/browse/IMPALA-10254 Project: IMPALA Issue Type: Bug Reporter: Zoltán Borók-Nagy Currently we still load the file descriptors of an Iceberg table via recursive file listing. This picks up too many files, e.g. metadata files, files that are still being written (which can later cause checksum errors), files left behind by aborted INSERTs, removed files, etc. We should use the Iceberg API to load only the file descriptors that belong to the current table snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
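The difference between the two listing strategies can be sketched with a toy model (all file names below are hypothetical, and this is plain Python, not the Iceberg API): a recursive directory walk reports everything under the table directory, while a snapshot-based listing only returns the data files the current snapshot committed.

```python
# Toy model of why recursive listing over-reports files compared to
# reading the snapshot's manifest. All paths are made up for illustration.

# Everything physically present under the table directory.
on_disk = {
    "data/part-00000.parq",        # committed data file
    "data/part-00001.parq",        # committed data file
    "data/part-00002.parq.tmp",    # still being written -> checksum errors
    "data/part-aborted.parq",      # left behind by an aborted INSERT
    "metadata/v3.metadata.json",   # table metadata, not table data
}

# Only the files the current table snapshot actually committed.
snapshot_manifest = {"data/part-00000.parq", "data/part-00001.parq"}

def recursive_listing(root_files):
    """File descriptors found by walking the directory tree: everything."""
    return set(root_files)

def snapshot_listing(manifest):
    """File descriptors taken from the snapshot: committed data files only."""
    return set(manifest)

# The files the recursive walk reports that the snapshot never committed.
extra = recursive_listing(on_disk) - snapshot_listing(snapshot_manifest)
```

Here `extra` ends up holding exactly the in-progress, aborted, and metadata files the issue describes, which is why listing via the snapshot is the safer source of truth.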
[jira] [Updated] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoxiaoqing updated IMPALA-10253: - Description: We have the following parquet table:
{code:java}
CREATE EXTERNAL TABLE rawdata.event_ros_p1 (
  event_id INT,
  user_id BIGINT,
  time TIMESTAMP,
  p_abook_type STRING
)
PARTITIONED BY (
  day INT,
  event_bucket INT
)
STORED AS PARQUET
LOCATION 'hdfs://localhost:20500/sa/data/1/event'
{code}
The data looks like this:
||event_id||user_id||time||p_abook_type||
|1|-922235446862664806|2018-07-18 09:01:06.158|小说|
|2|-922235446862664806|2018-07-19 09:01:06.158|小说|
If we want to remap event_id to the real event name, we can implement a dict UDF, defined as DICT(BIGINT expression, STRING path). The first parameter is the column; the second is an HDFS path that stores the remapping rules, like this:
{code:java}
1,SignUp
2,ViewProduct
{code}
Then we build a view that adds the dict column on top of the original table:
{code:java}
CREATE VIEW rawdata.event_external_view_p7 AS
SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event`
FROM rawdata.event_view_p7 events
{code}
If the query groups by the dict column, it is slower than grouping by the original column. When explaining the SQL, we found that every row has to be remapped in both the SCAN phase and the AGGREGATE phase.
{code:java}
select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;
{code}
{code:java}
PLAN-ROOT SINK
|
04:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  group by: event
|  row-size=20B cardinality=0
|
02:EXCHANGE [HASH(event)]
|
01:AGGREGATE [STREAMING]
|  output: count(*)
|  group by: rawdata.DICT(event_id, '/data/1/event.txt')
|  row-size=20B cardinality=0
|
00:SCAN HDFS [rawdata.event_ros_p7_merge_offline]
   partitions=39/39 files=99 size=9.00GB
   predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct')
   row-size=4B cardinality=unavailable
{code}
The idea is to modify the plan: use the original column in the SCAN and AGGREGATE phases and remap it only once at the end. The new plan looks like this:
{code:java}
PLAN-ROOT SINK
|
05:SELECT [FINALIZE]
|  output: dict(event_id)
|  row-size=20B cardinality=0
|
04:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  group by: event_id
|  row-size=20B cardinality=0
|
02:EXCHANGE [HASH(event)]
|
01:AGGREGATE [STREAMING]
|  output: count(*)
|  group by: event_id
|  row-size=20B cardinality=0
|
00:SCAN HDFS [rawdata.event_ros_p7_merge_offline]
   partitions=39/39 files=99 size=9.00GB
   predicates: event_id IN (1, 2)
   row-size=4B cardinality=unavailable
{code}
was: we have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} the data show as following: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| if we want remapping event_id to the real event name, we can realize dict udf. the dict udf is defined as DICT(BIGINT expression, STRING path). 
first parameter is the column, second parameter is hdfs path which store the remapping rule like this: {code:java} 1,SignUp 2,ViewProduct{code} then build a view table which add the dict column on original table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query group by column has dict, the query is very slow because of each line need remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} explain result is {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.DICT(event_id, '/data/1/event.txt') | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} we can modify plan, rewrite AGGREGATE NODE and SCAN NODE, the new plan like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id |
[jira] [Updated] (IMPALA-10253) Improve query performance contains dict function
[ https://issues.apache.org/jira/browse/IMPALA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoxiaoqing updated IMPALA-10253: - Description: We have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} The data looks like this: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| If we want to remap event_id to the real event name, we can implement a dict UDF, defined as DICT(BIGINT expression, STRING path). The first parameter is the column; the second is an HDFS path that stores the remapping rules, like this: {code:java} 1,SignUp 2,ViewProduct{code} Then we build a view that adds the dict column on top of the original table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query groups by the dict column, it is very slow because every row needs remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} The explain result is: {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.DICT(event_id, '/data/1/event.txt') | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.DICT(event_id, '/data/1/event.txt') IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} We can modify the plan by rewriting the AGGREGATE and SCAN nodes; the rewritten plan looks 
like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: event_id IN (1, 2) | row-size=4B cardinality=unavailable {code} was: If we have the following parquet table: {code:java} CREATE EXTERNAL TABLE rawdata.event_ros_p1 ( event_id INT, user_id BIGINT, time TIMESTAMP, p_abook_type STRING ) PARTITIONED BY ( day INT, event_bucket INT ) STORED AS PARQUET LOCATION 'hdfs://localhost:20500/sa/data/1/event' {code} the data as the following: ||event_id||user_id||time||p_abook_type|| |1|-922235446862664806|2018-07-18 09:01:06.158|小说| |2|-922235446862664806|2018-07-19 09:01:06.158|小说| now, we need remapping event_id to the real event name to show customer, the remapping rule like this: {code:java} 1,SignUp 2,ViewProduct{code} we can realize udf remapping event_id to event_name, the rule store on hdfs, and then build a view table: {code:java} CREATE VIEW rawdata.event_external_view_p7 AS SELECT events.*, dict(`event_id`, '/data/1/event.txt') AS `event` FROM rawdata.event_view_p7 events {code} If the query group by dict udf function, the query is very slow because of each line need remapping: {code:java} select event, count(*) from event_external_view_p7 where event in ('SignUp', 'ViewProduct') group by event;{code} explain result is {code:java} PLAN-ROOT SINK | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: rawdata.dict(event_id) | row-size=20B cardinality=0 | 00:SCAN HDFS 
[rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: rawdata.dict(event_id) IN ('SignUp', 'ViewProduct') | row-size=4B cardinality=unavailable {code} we can modify plan, rewrite AGGREGATE NODE and SCAN NODE, the new plan like this: {code:java} PLAN-ROOT SINK | 05:SELECT [FINALIZE] | output: dict(event_id) | row-size=20B cardinality=0 | 04:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE [FINALIZE] | output: count:merge(*) | group by: event_id | row-size=20B cardinality=0 | 02:EXCHANGE [HASH(event)] | 01:AGGREGATE [STREAMING] | output: count(*) | group by: event_id | row-size=20B cardinality=0 | 00:SCAN HDFS [rawdata.event_ros_p7_merge_offline] | partitions=39/39 files=99 size=9.00GB | predicates: event_id IN (1, 2) | row-size=4B cardinality=unavailable {code} > Improve query performance contains dict function >