[jira] [Commented] (IMPALA-10120) Beeline hangs when connecting to coordinators

2020-10-23 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219768#comment-17219768
 ] 

Sahil Takiar commented on IMPALA-10120:
---

Not particularly familiar with that config, but this is what the Hive code has 
to say about it:
{code:java}
HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE("hive.server2.thrift.resultset.default.fetch.size",
    1000,
    "The number of rows sent in one Fetch RPC call by the server to the client, if not\n" +
    "specified by the client."), {code}

> Beeline hangs when connecting to coordinators
> -
>
> Key: IMPALA-10120
> URL: https://issues.apache.org/jira/browse/IMPALA-10120
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Critical
>
> Beeline is always hanging when connecting to a coordinator:
> {code:java}
> $ beeline -u "jdbc:hive2://localhost:21050/default;auth=noSasl"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/apache-hive-3.1.3000.7.2.1.0-287-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/hadoop-3.1.1.7.2.1.0-287/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'log4j2.debug' to show Log4j2 internal initialization logging.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/apache-hive-3.1.3000.7.2.1.0-287-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/hadoop-3.1.1.7.2.1.0-287/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://localhost:21050/default;auth=noSasl
> Connected to: Impala (version 4.0.0-SNAPSHOT)
> Driver: Hive JDBC (version 3.1.3000.7.2.1.0-287)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> {code}
> Looking into the impalad log file, there is a wrong option set:
> {code:java}
> I0901 15:41:14.577576 25325 TAcceptQueueServer.cpp:340] New connection to 
> server hiveserver2-frontend from client 
> I0901 15:41:14.577911 25597 impala-hs2-server.cc:300] Opening session: 
> 204a3f33cc8e28ea:d6a915ab96b26aa7 request username: 
> I0901 15:41:14.577970 25597 status.cc:129] Invalid query option: 
> set:hiveconf:hive.server2.thrift.resultset.default.fetch.size
> @  0x1cdba3d  impala::Status::Status()
> @  0x24c673f  impala::SetQueryOption()
> @  0x250c1d1  impala::ImpalaServer::OpenSession()
> @  0x2b0dc45  
> apache::hive::service::cli::thrift::TCLIServiceProcessor::process_OpenSession()
> @  0x2b0d993  
> apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall()
> @  0x2acd15a  
> impala::ImpalaHiveServer2ServiceProcessor::dispatchCall()
> @  0x1c8a483  apache::thrift::TDispatchProcessor::process()
> @  0x218ab4a  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x218004a  impala::ThriftThread::RunRunnable()
> @  0x2181686  boost::_mfi::mf2<>::operator()()
> @  0x218151a  boost::_bi::list3<>::operator()<>()
> @  0x2181260  boost::_bi::bind_t<>::operator()()
> @  0x2181172  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x20fba57  boost::function0<>::operator()()
> @  0x26cb779  impala::Thread::SuperviseThread()
> @  0x26d3716  boost::_bi::list5<>::operator()<>()
> @  0x26d363a  boost::_bi::bind_t<>::operator()()
> @  0x26d35fb  boost::detail::thread_data<>::run()
> @  0x3eb7ae1  thread_proxy
> @ 0x7fc9443456b9  start_thread
> @ 0x7fc940e334dc  clone
> I0901 15:41:14.739985 25597 impala-hs2-server.cc:405] Opened session: 
> 204a3f33cc8e28ea:d6a915ab96b26aa7 effective username: 
> I0901 15:41:14.781677 25597 impala-hs2-server.cc:426] GetInfo(): 
> request=TGetInfoReq {
>   01: sessionHandle (struct) = TSessionHandle {
> 01: sessionId 

[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9954.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9954:


Assignee: Riza Suminto

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217063#comment-17217063
 ] 

Sahil Takiar commented on IMPALA-9954:
--

[~rizaon] so if I understand correctly, the remaining work to be done here is to 
add proper locking of {{rpc_start_time_ns_}} in 
{{be/src/runtime/krpc-data-stream-sender.cc}}?
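
(Illustrative sketch only, with made-up class and method names - not the actual KrpcDataStreamSender code.) The locking in question is essentially guarding a start timestamp that one thread writes when the RPC is issued and another thread reads in the completion callback:
{code:cpp}
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of guarding a start timestamp that is written when an
// RPC is issued and read in the completion callback on another thread.
class ChannelSketch {
 public:
  void OnRpcStarted() {
    std::lock_guard<std::mutex> l(lock_);
    rpc_start_time_ns_ = NowNs();
  }

  int64_t ElapsedSinceRpcStartNs() {
    std::lock_guard<std::mutex> l(lock_);
    return NowNs() - rpc_start_time_ns_;
  }

 private:
  static int64_t NowNs() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
  }

  std::mutex lock_;
  int64_t rpc_start_time_ns_ = 0;
};
{code}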

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10220) Min value of RpcNetworkTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217061#comment-17217061
 ] 

Sahil Takiar commented on IMPALA-10220:
---

[~rizaon] can this be closed?

> Min value of RpcNetworkTime can be negative
> ---
>
> Key: IMPALA-10220
> URL: https://issues.apache.org/jira/browse/IMPALA-10220
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> There is a bug in function 
> KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in 
> this line:
> [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635]
> network_time_ns should be computed using eos_rsp_.receiver_latency_ns() 
> instead of resp_.receiver_latency_ns().
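
(Illustrative paraphrase of the fix described above, not the actual file contents; the exact formula in krpc-data-stream-sender.cc may differ.)
{code:cpp}
#include <cstdint>

// Hypothetical stand-in for the EOS RPC response; only the receiver-side
// latency matters for this sketch.
struct EndDataStreamResponseSketch { int64_t receiver_latency_ns; };

// Paraphrase of the fix: when the end-of-stream RPC completes, the receiver
// latency must come from the EOS response (eos_rsp_), not from the row-batch
// response (resp_), otherwise the derived network time can go negative.
int64_t EosNetworkTimeNs(int64_t total_rpc_time_ns,
                         const EndDataStreamResponseSketch& eos_rsp) {
  return total_rpc_time_ns - eos_rsp.receiver_latency_ns;
}
{code}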



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9370) Re-factor ImpalaServer, ClientRequestState, Coordinator protocol

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9370:


Assignee: (was: Sahil Takiar)

> Re-factor ImpalaServer, ClientRequestState, Coordinator protocol
> 
>
> Key: IMPALA-9370
> URL: https://issues.apache.org/jira/browse/IMPALA-9370
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> All of these classes need to be updated to support transparent query retries, 
> and each one could do with some re-factoring so that query retries don't 
> make this code even more complex. For now, I'm going to list out some ideas / 
> suggestions:
>  * Rename ImpalaServer to ImpalaService. ImpalaServer is a bit of a misnomer 
> because Impala isn't implementing its own server (it uses Thrift for that); 
> instead, it provides a "service" to end users. This name is also consistent 
> with Thrift "service"s.
>  * Split up ClientRequestState - I'm not sure I fully understand what 
> ClientRequestState is supposed to encapsulate. Perhaps it originally captured 
> the state of the actual client request along with some helper code, but it 
> seems to have evolved over time; it doesn't really look like a purely 
> "stateful" object any more (e.g. it manages admission control submission).
> One possible end state could be:
> ImpalaService <-> QueryDriver (has a ClientRequestState that is not exposed 
> externally) <-> QueryInstance <-> Coordinator
> The QueryDriver is responsible for E2E execution of a query, including all 
> stages such as parsing / planning of a query, submission to admission 
> control, and backend execution. A QueryInstance is a single instance of a 
> query; this is necessary for query retry support since a single query can be 
> run multiple times. The Coordinator remains mostly the same - it is purely 
> responsible for *backend* coordination / execution of a query.
> This provides an opportunity to move a lot of the execution specific logic 
> out of ImpalaServer and into QueryDriver. Currently, ImpalaServer is 
> responsible for submitting the query to the fe/ and then passing the result 
> to the ClientRequestState which submits it for admission control (and 
> eventually the Coordinator for execution).
> QueryDriver encapsulates the E2E execution of a query (starting from a query 
> string, and then returning the results of a query) (inspired by Hive's 
> IDriver interface - 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/IDriver.java]).
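
(Not from the JIRA - just a rough sketch of what the proposed layering could look like; all class and method names below are hypothetical.)
{code:cpp}
#include <memory>
#include <string>
#include <vector>

// Rough sketch of the proposed layering: ImpalaService -> QueryDriver ->
// QueryInstance -> Coordinator. Names and signatures are hypothetical.
class CoordinatorSketch {};          // purely backend coordination/execution

class QueryInstanceSketch {          // one attempt at running the query
 public:
  bool Exec() { return true; }       // parse/plan/admit/execute one attempt
  CoordinatorSketch* coordinator() { return &coord_; }
 private:
  CoordinatorSketch coord_;
};

class QueryDriverSketch {            // owns E2E execution, including retries
 public:
  explicit QueryDriverSketch(std::string sql) : sql_(std::move(sql)) {}
  // Runs the query, transparently retrying with a fresh QueryInstance.
  bool Run(int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
      instances_.push_back(std::make_unique<QueryInstanceSketch>());
      if (instances_.back()->Exec()) return true;
    }
    return false;
  }
 private:
  std::string sql_;
  std::vector<std::unique_ptr<QueryInstanceSketch>> instances_;
};
{code}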



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9856) Enable result spooling by default

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214243#comment-17214243
 ] 

Sahil Takiar commented on IMPALA-9856:
--

I tried to do this, but hit a DCHECK while running exhaustive tests:
{code:java}
Log file created at: 2020/10/13 15:33:06
Running on machine: 
impala-ec2-centos74-m5-4xlarge-ondemand-012a.vpc.cloudera.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1013 15:33:06.568224 22912 query-state.cc:877] 
914777cab6a164b8:dce62b1d] Check failed: is_cancelled_.Load() == 1 (0 
vs. 1) {code}
Minidump Stack:
{code}
Operating system: Linux
  0.0.0 Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 
20:32:50 UTC 2017 x86_64
CPU: amd64
 family 6 model 85 stepping 7
 1 CPU

GPU: UNKNOWN

Crash reason:  SIGABRT
Crash address: 0x7d12913
Process uptime: not available

Thread 410 (crashed)
 0  libc-2.17.so + 0x351f7
rax = 0x   rdx = 0x0006
rcx = 0x   rbx = 0x0004
rsi = 0x5980   rdi = 0x2913
rbp = 0x7f53dff7acc0   rsp = 0x7f53dff7a948
 r8 = 0xr9 = 0x7f53dff7a7c0
r10 = 0x0008   r11 = 0x0202
r12 = 0x076bb400   r13 = 0x0086
r14 = 0x076bb404   r15 = 0x076b3a20
rip = 0x7f54b87021f7
Found by: given as instruction pointer in context
 1  impalad!google::LogMessage::Flush() + 0x1eb
rbp = 0x7f53dff7ae10   rsp = 0x7f53dff7acd0
rip = 0x051fec5b
Found by: previous frame's frame pointer
 2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
rbx = 0x0001   rbp = 0x7f53dff7ae60
rsp = 0x7f53dff7ad70   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x05202859
Found by: call frame info
 3  impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 877 + 0xc]
rbx = 0x0001   rbp = 0x7f53dff7ae60
rsp = 0x7f53dff7ad80   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0227b5a0
Found by: call frame info
 4  impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*) 
[query-exec-mgr.cc : 162 + 0xf]
rbx = 0x13e76000   rbp = 0x7f53dff7b6b0
rsp = 0x7f53dff7ae70   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0226ad41
Found by: call frame info
 5  impalad!boost::_mfi::mf1::operator()(impala::QueryExecMgr*, impala::QueryState*) 
const [mem_fn_template.hpp : 165 + 0xc]
rbx = 0x13e76000   rbp = 0x7f53dff7b6e0
rsp = 0x7f53dff7b6c0   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02273655
Found by: call frame info
 6  impalad!void boost::_bi::list2, 
boost::_bi::value >::operator(), 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1&, boost::_bi::list0&, int) [bind.hpp 
: 319 + 0x52]
rbx = 0x13e76000   rbp = 0x7f53dff7b720
rsp = 0x7f53dff7b6f0   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02272f1e
Found by: call frame info
 7  impalad!boost::_bi::bind_t, 
boost::_bi::list2, 
boost::_bi::value > >::operator()() [bind.hpp : 1222 + 
0x22]
rbx = 0x5980   rbp = 0x7f53dff7b770
rsp = 0x7f53dff7b730   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02272525
Found by: call frame info
 8  
impalad!boost::detail::function::void_function_obj_invoker0, 
boost::_bi::list2, 
boost::_bi::value > >, 
void>::invoke(boost::detail::function::function_buffer&) [function_template.hpp 
: 159 + 0xc]
rbx = 0x5980   rbp = 0x7f53dff7b7a0
rsp = 0x7f53dff7b780   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0227193f
Found by: call frame info
 9  impalad!boost::function0::operator()() const [function_template.hpp : 
770 + 0x1d]
rbx = 0x5980   rbp = 0x7f53dff7b7e0
rsp = 0x7f53dff7b7b0   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02137600
Found by: call frame info
10  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) [thread.cc : 360 + 0xf]
rbx = 0x5980   rbp = 

[jira] [Assigned] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10238:
-

Assignee: (was: Sahil Takiar)

> Add fault tolerance docs
> 
>
> Key: IMPALA-10238
> URL: https://issues.apache.org/jira/browse/IMPALA-10238
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Sahil Takiar
>Priority: Major
>
> Impala docs currently don't have much information about any of our fault 
> tolerance features. We should add a dedicated section with several sub-topics 
> to address this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10242) impala-shell client retry for failed Fetch RPC calls.

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214232#comment-17214232
 ] 

Sahil Takiar commented on IMPALA-10242:
---

I'm actually not sure if this will work. Result spooling wasn't really designed 
with this use case in mind, although it does make this much easier to implement. 
Result spooling is basically backed by a {{buffered-tuple-stream.h}}, which is 
the same object used for operator spill-to-disk. I'm not sure if result spooling 
currently uses the {{buffered-tuple-stream.h}} in a way that would support this 
(I think it should be able to, by just pinning the whole stream in memory?), but 
Tim would know.

There are a few other considerations as well. I don't think we really support 
this from a client perspective. To make fetch operations idempotent, Impala 
would probably need to support a Fetch Orientation (e.g. TFetchOrientation) 
beyond FETCH_NEXT and FETCH_FIRST. Support for something like FETCH_ABSOLUTE 
might be necessary.

The issue is that fetch operations are done through a simple iterator interface 
(e.g. FETCH_NEXT). I don't think the impala-shell even tracks how far into the 
result set it has fetched; it just calls fetch next in a loop until the server 
returns hasMoreRows = false.
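
(Illustrative sketch only; the types and the FETCH_ABSOLUTE-style call below are hypothetical stand-ins, not the real impala-shell or TCLIService API.) Roughly, what an idempotent retry would need is a client-tracked absolute offset, so a failed fetch can simply be re-issued:
{code:cpp}
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical client-side sketch. With plain FETCH_NEXT semantics, retrying a
// failed fetch can skip or duplicate rows; if the client tracks an absolute
// offset and the server can reposition its read cursor, the same request can
// be safely re-issued.
struct FetchResultSketch {
  std::vector<int64_t> rows;
  bool has_more_rows;
};

// Stand-in for a FETCH_ABSOLUTE-style RPC against a 10-row result set.
FetchResultSketch FetchAbsolute(int64_t offset, int64_t batch_size) {
  const int64_t kTotalRows = 10;
  FetchResultSketch r;
  for (int64_t i = offset; i < std::min(offset + batch_size, kTotalRows); ++i) {
    r.rows.push_back(i);
  }
  r.has_more_rows = offset + batch_size < kTotalRows;
  return r;
}

std::vector<int64_t> FetchAllWithRetry(int64_t batch_size, int max_retries) {
  std::vector<int64_t> all_rows;
  int64_t offset = 0;  // client-tracked position in the result set
  while (true) {
    FetchResultSketch result;
    for (int attempt = 0;; ++attempt) {
      try {
        result = FetchAbsolute(offset, batch_size);  // safe to re-issue
        break;
      } catch (const std::runtime_error&) {
        if (attempt + 1 >= max_retries) throw;  // give up after max_retries
      }
    }
    all_rows.insert(all_rows.end(), result.rows.begin(), result.rows.end());
    offset += static_cast<int64_t>(result.rows.size());
    if (!result.has_more_rows) return all_rows;
  }
}
{code}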

> impala-shell client retry for failed Fetch RPC calls.
> -
>
> Key: IMPALA-10242
> URL: https://issues.apache.org/jira/browse/IMPALA-10242
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Abhishek Rawat
>Priority: Major
>
> impala-shell client can retry failed idempotent rpcs. This work was done as 
> part of IMPALA-9466.
> Since Impala also supports result spooling, the impala-shell client could 
> also retry failed fetch rpc calls in some scenarios when result spooling is 
> enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10241) Impala Doc: RPC troubleshooting guide

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10241:
-

 Summary: Impala Doc: RPC troubleshooting guide
 Key: IMPALA-10241
 URL: https://issues.apache.org/jira/browse/IMPALA-10241
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


There have been several diagnostic improvements to how RPCs can be debugged. We 
should document them a bit along with the associated options for configuring 
them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214064#comment-17214064
 ] 

Sahil Takiar commented on IMPALA-10240:
---

It would be nice to document what exactly happens when a node is removed from 
the cluster membership - e.g. all queries running on that node are either 
cancelled or retried (they are retried if transparent query retries are 
enabled). This should cover the scenario where a coordinator fails as well 
(e.g. all queries die, and the executors eventually time out all fragments and 
cancel them).

> Impala Doc: Add docs for cluster membership statestore heartbeats
> -
>
> Key: IMPALA-10240
> URL: https://issues.apache.org/jira/browse/IMPALA-10240
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> I don't see many docs explaining how the current cluster membership logic 
> works (e.g. via the statestored heartbeats). Would be nice to include a high 
> level explanation along with how to configure the heartbeat threshold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-10239) Impala Doc: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10239:
--
Summary: Impala Doc: Add docs for node blacklisting  (was: Docs: Add docs 
for node blacklisting)

> Impala Doc: Add docs for node blacklisting
> --
>
> Key: IMPALA-10239
> URL: https://issues.apache.org/jira/browse/IMPALA-10239
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> We should add some docs for node blacklisting explaining what it is, how it 
> works at a high level, what errors it captures, how to debug it, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10240:
-

 Summary: Impala Doc: Add docs for cluster membership statestore 
heartbeats
 Key: IMPALA-10240
 URL: https://issues.apache.org/jira/browse/IMPALA-10240
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


I don't see many docs explaining how the current cluster membership logic works 
(e.g. via the statestored heartbeats). Would be nice to include a high level 
explanation along with how to configure the heartbeat threshold.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10239) Docs: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10239:
-

 Summary: Docs: Add docs for node blacklisting
 Key: IMPALA-10239
 URL: https://issues.apache.org/jira/browse/IMPALA-10239
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should add some docs for node blacklisting explaining what it is, how it 
works at a high level, what errors it captures, how to debug it, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10238:
-

 Summary: Add fault tolerance docs
 Key: IMPALA-10238
 URL: https://issues.apache.org/jira/browse/IMPALA-10238
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala docs currently don't have much information about any of our fault 
tolerance features. We should add a dedicated section with several sub-topics 
to address this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Updated] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10235:
--
Description: 
Steps to reproduce on master:
{code:java}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}
Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:
{code:java}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}
For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. The coordinator fragment instance showed normal timer values:
{code:java}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}

  was:
Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}
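
(Editorial aside, not from the JIRA.) One observation that might help here: each negative averaged value above is exactly the corresponding ~2.6s nanosecond value minus 2^32, i.e. it looks like a 64-bit nanosecond counter being squeezed through 32 bits somewhere in the averaging path. A quick check of the arithmetic (the actual cause is not confirmed):
{code:cpp}
#include <cstdint>
#include <iostream>

int main() {
  // Nanosecond values back-computed from the negative averaged counters above;
  // note they round to the 2s603ms / 2s629ms / 2s608ms reported for the single
  // coordinator fragment instance.
  const int64_t values_ns[] = {2603442811, 2629748868, 2608961781};
  for (int64_t v : values_ns) {
    // Subtracting 2^32 models 32-bit wraparound; this prints -1691524485,
    // -1665218428 and -1686005515, matching the averaged TotalTime,
    // CompletionTime and TotalThreadsTotalWallClockTime shown above.
    std::cout << v << " ns -> " << (v - (int64_t{1} << 32)) << " ns\n";
  }
  return 0;
}
{code}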


> Averaged timer profile counters can be negative for trivial queries
> ---
>
> Key: IMPALA-10235
> URL: https://issues.apache.org/jira/browse/IMPALA-10235
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: profile-output.txt
>
>
> Steps to reproduce on master:
> {code:java}
> stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
>  [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
> limit 25" -p > profile-output.txt
> ...
> Query: select sleep(100) from functional.alltypes limit 25
> Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
> http://stakiar-desktop:25000)
> Query progress can be monitored at: 
> http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
> Fetched 25 row(s) in 2.64s
> {code}
> Attached the contents of {{profile-output.txt}}
> Relevant portion of the profile:
> {code:java}
> Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
> 0.01%)
> ...
>- CompletionTime: -1665218428.000ns
> ...
>- TotalThreadsTotalWallClockTime: -1686005515.000ns
>  - TotalThreadsSysTime: 0.000ns
>  - TotalThreadsUserTime: 2.151ms
> ...
>- TotalTime: -1691524485.000ns
> {code}
> For whatever reason, this only affects the averaged fragment profile. For 
> this query, there was only one coordinator fragment and thus only one 
> fragment instance. The coordinator fragment instance showed normal timer 
> values:
> {code:java}
> Coordinator Fragment F00:
> ...
>  - CompletionTime: 2s629ms
> ...
>  - TotalThreadsTotalWallClockTime: 2s608ms
>- TotalThreadsSysTime: 0.000ns
>- TotalThreadsUserTime: 2.151ms
> ...
>  - 

[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10235:
-

 Summary: Averaged timer profile counters can be negative for 
trivial queries
 Key: IMPALA-10235
 URL: https://issues.apache.org/jira/browse/IMPALA-10235
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
 Attachments: profile-output.txt

Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Resolved] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2020-10-12 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8925.
--
Resolution: Later

This would be nice to have, but I'm not seeing a strong reason to do it at the 
moment, so I'm closing this as "Later".

> Consider replacing ClientRequestState ResultCache with result spooling
> --
>
> Key: IMPALA-8925
> URL: https://issues.apache.org/jira/browse/IMPALA-8925
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Clients
>Reporter: Sahil Takiar
>Priority: Minor
>
> The {{ClientRequestState}} maintains an internal results cache (which is 
> really just a {{QueryResultSet}}) in order to provide support for the 
> {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
> [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission 
> control integration
>  * It has a max size; if the size is exceeded, the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide 
> a few benefits; it should be able to fit more rows since it can spill to 
> disk. The memory is better tracked as well since it integrates with both 
> admitted and reserved memory. Hue currently sets the max result set fetch 
> size to the value defined at 
> [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61];
> it would be good to check how well that value works for Hue users so we can 
> decide whether replacing the current result cache with result spooling makes sense.
> This would require some changes to result spooling as well, currently it 
> discards rows whenever it reads them from the underlying 
> {{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
> which would require some changes to the {{PlanRootSink}} interface as well.
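
(Illustrative only - a hypothetical interface, not the actual PlanRootSink or BufferedTupleStream API.) The "reset the read cursor" idea from the description could look roughly like:
{code:cpp}
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a sink whose read cursor can be reset, so that a
// FETCH_FIRST-style restart can re-read spooled results instead of relying on
// a separate in-memory result cache.
class SpooledResultSinkSketch {
 public:
  void Append(int64_t row) { rows_.push_back(row); }  // spooled, not discarded

  // Returns up to max_rows starting at the current read cursor.
  std::vector<int64_t> GetNext(std::size_t max_rows) {
    std::vector<int64_t> batch;
    while (read_pos_ < rows_.size() && batch.size() < max_rows) {
      batch.push_back(rows_[read_pos_++]);
    }
    return batch;
  }

  // The piece described above: rewind the read cursor to support FETCH_FIRST.
  void ResetReadCursor() { read_pos_ = 0; }

 private:
  std::vector<int64_t> rows_;
  std::size_t read_pos_ = 0;
};
{code}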



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-10055) DCHECK was hit while executing e2e test TestQueries::test_subquery

2020-10-12 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212499#comment-17212499
 ] 

Sahil Takiar commented on IMPALA-10055:
---

Saw this again recently; any plans for a fix?

> DCHECK was hit while executing e2e test TestQueries::test_subquery
> --
>
> Key: IMPALA-10055
> URL: https://issues.apache.org/jira/browse/IMPALA-10055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0
>
>
> A DCHECK was hit while executing an e2e test. The time frame suggests that it 
> possibly happened while executing TestQueries::test_subquery:
> {code}
> query_test/test_queries.py:149: in test_subquery
> self.run_test_case('QueryTest/subquery', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:909: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:334: in execute
> r = self.__fetch_results(handle, profile_format=profile_format)
> common/impala_connection.py:436: in __fetch_results
> result_tuples = cursor.fetchall()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:532:
>  in fetchall
> self._wait_to_finish()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:405:
>  in _wait_to_finish
> resp = self._last_operation._rpc('GetOperationStatus', req)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:992:
>  in _rpc
> response = self._execute(func_name, request)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:1023:
>  in _execute
> .format(self.retries))
> E   HiveServer2Error: Failed after retrying 3 times
> {code}
> impalad log:
> {code}
> Log file created at: 2020/08/05 17:34:30
> Running on machine: 
> impala-ec2-centos74-m5-4xlarge-ondemand-18a5.vpc.cloudera.com
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F0805 17:34:30.003247 10887 orc-column-readers.cc:423] 
> c34e87376f496a53:7ba6a2e40002] Check failed: 
> (scanner_->row_batches_need_validation_ && 
> scanner_->scan_node_->IsZeroSlotTableScan()) || scanner_->acid_original_file
> {code}
> Stack trace:
> {code}
> CORE: ./fe/core.1596674070.14179.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by 
> `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> To enable execution of this file add
>   add-auto-load-safe-path 
> /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib64/libstdc++.so.6.0.24-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
>   set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the
> "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
>   info "(gdb)Auto-loading safe path"
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> #1  0x7efd6ec6f8e8 in abort () from /lib64/libc.so.6
> #2  0x086b8ea4 in google::DumpStackTraceAndExit() ()
> #3  0x086ae25d in google::LogMessage::Fail() ()
> #4  0x086afb4d in google::LogMessage::SendToLog() ()
> #5  0x086adbbb in google::LogMessage::Flush() ()
> #6  0x086b17b9 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x0388e10a in impala::OrcStructReader::TopLevelReadValueBatch 
> (this=0x61162630, scratch_batch=0x824831e0, pool=0x82483258) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/orc-column-readers.cc:421
> #8  0x03810c92 in impala::HdfsOrcScanner::TransferTuples 
> (this=0x27143c00, dst_batch=0x2e5ca820) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:808
> #9  0x03814e2a in impala::HdfsOrcScanner::AssembleRows 
> 

[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9485.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9485:


Assignee: Sahil Takiar

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org






[jira] [Commented] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210512#comment-17210512
 ] 

Sahil Takiar commented on IMPALA-10028:
---

For reference, here is what the new Docker image sizes look like:
{code}
impalad_coordinator   latest   a54e3f5b73b2   2 days ago   770MB
impalad_coord_exec    latest   6eedba64cb42   2 days ago   770MB
impalad_executor      latest   65998abf9cac   2 days ago   685MB
{code}

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10028.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210510#comment-17210510
 ] 

Sahil Takiar commented on IMPALA-10028:
---

Not planning to tackle IMPALA-10068 anytime soon, so moving it out and closing 
this JIRA as all other subtasks have been completed.

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10068) Split out jars for catalog Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10068:
--
Parent: (was: IMPALA-10028)
Issue Type: Task  (was: Sub-task)

> Split out jars for catalog Docker images
> 
>
> Key: IMPALA-10068
> URL: https://issues.apache.org/jira/browse/IMPALA-10068
> Project: IMPALA
>  Issue Type: Task
>Reporter: Sahil Takiar
>Priority: Major
>
> One way to decrease the size of the catalogd images is to only include jar 
> files necessary to run the catalogd. Currently, all Impala coordinator / 
> executor jars are included in the catalogd images, which is not necessary.
> This can be fixed by splitting the fe/ Java code into fe/ and catalogd/ 
> folders (and perhaps a java-common/ folder). This is probably a nice 
> improvement to make regardless because the fe and catalogd code should really 
> be in separate Maven modules. By separating all catalogd code into a separate 
> Maven module it should be easy to modify the Docker built scripts to only 
> copy in the catalogd jars for the catalogd Impala image.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-10016) Split jars for Impala executor and coordinator Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10016.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Split jars for Impala executor and coordinator Docker images
> 
>
> Key: IMPALA-10016
> URL: https://issues.apache.org/jira/browse/IMPALA-10016
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Impala executors and coordinators currently share a common base image. The 
> base image defines the set of jar files needed by either the coordinator or the 
> executor. To reduce the image size, we should split the jars into two 
> categories: those necessary for the coordinator and those necessary for the 
> executor. This should help reduce the overall image size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10217:
--
Attachment: profile.txt

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile.txt
>
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208176#comment-17208176
 ] 

Sahil Takiar commented on IMPALA-10217:
---

Attached the full runtime profile dumped by the test failure.

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile.txt
>
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208175#comment-17208175
 ] 

Sahil Takiar commented on IMPALA-10217:
---

Might be a recurrence of IMPALA-8064.

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10217:
-

 Summary: 
test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
 Key: IMPALA-10217
 URL: https://issues.apache.org/jira/browse/IMPALA-10217
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Seen this a few times in exhaustive builds:
{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from 
pytest)

query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
% (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   ACTUAL VALUE:
E   38
E   
{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10216:
-

 Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on 
UBSAN builds
 Key: IMPALA-10216
 URL: https://issues.apache.org/jira/browse/IMPALA-10216
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace

Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208155#comment-17208155
 ] 

Sahil Takiar commented on IMPALA-9355:
--

Saw this again in an UBSAN build.

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-9355
> URL: https://issues.apache.org/jira/browse/IMPALA-9355
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The EE test {{test_exchange_mem_usage_scaling}} failed because the query at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15]
>  does not hit the specified memory limit (170m) at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7].
>  We may need to further reduce the specified limit. The error message is given 
> below. Recall that the same issue occurred at 
> https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved.
> {code:java}
> FAIL 
> query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> === FAILURES 
> ===
>  TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> [gw3] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:674: in run_test_case
> expected_str, query)
> E   AssertionError: Expected exception: Memory limit exceeded
> E   
> E   when running:
> E   
> E   set mem_limit=170m;
> E   set num_scanner_threads=1;
> E   select *
> E   from tpch_parquet.lineitem l1
> E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
> E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
> E and l1.l_linenumber = l2.l_linenumber
> E   order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber
> E   limit 5
> {code}
> [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at 
> [https://gerrit.cloudera.org/c/11965/]. Assign this JIRA to [~joemcdonnell] 
> for now. Please re-assign the JIRA to others as appropriate. Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10214) Ozone support for file handle cache

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10214:
-

 Summary: Ozone support for file handle cache
 Key: IMPALA-10214
 URL: https://issues.apache.org/jira/browse/IMPALA-10214
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} 
interface first (last I checked, the input streams don't implement the 
interface).
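
For context, the {{CanUnbuffer}} contract mentioned above is the Hadoop-side API that lets a cached handle drop its buffers and sockets while remaining open. The following is only a rough Java sketch of that contract (the Ozone path is hypothetical, and this is not Impala's actual file handle cache code, which goes through libhdfs):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class UnbufferCheck {
  public static void main(String[] args) throws IOException {
    // Hypothetical Ozone path; the same check works for any FileSystem scheme.
    Path path = new Path("ofs://ozone-service/volume/bucket/key");
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      // unbuffer() only releases resources if the wrapped stream implements
      // org.apache.hadoop.fs.CanUnbuffer; otherwise a cached handle would keep
      // buffers and sockets pinned for as long as it sits in the cache.
      if (in.hasCapability(StreamCapabilities.UNBUFFER)) {
        in.unbuffer();
        System.out.println("unbuffer supported; safe to cache this handle");
      } else {
        System.out.println("unbuffer not supported; handle caching is risky");
      }
    }
  }
}
{code}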



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10202.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for ABFS files
> ---
>
> Key: IMPALA-10202
> URL: https://issues.apache.org/jira/browse/IMPALA-10202
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We should enable the file handle cache for ABFS; we have already seen it 
> benefit jobs that read data from S3A.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9606.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.
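
To make the last point concrete: on the Java side the distinction is between the positional {{read}}, which may return fewer bytes than requested, and {{readFully}}, which either fills the requested range or throws. A minimal sketch follows; the path and buffer size are made up, and this is not Impala's reading code, which reaches these APIs through libhdfs:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadFullyExample {
  public static void main(String[] args) throws IOException {
    // Hypothetical ABFS path; any Hadoop-compatible filesystem behaves the same.
    Path path = new Path("abfs://container@account.dfs.core.windows.net/tmp/data.orc");
    FileSystem fs = path.getFileSystem(new Configuration());
    byte[] buf = new byte[8 * 1024 * 1024];
    try (FSDataInputStream in = fs.open(path)) {
      // Plain positional read: the contract only promises "some" bytes, so a
      // caller that needs the whole range has to loop.
      int n = in.read(0L, buf, 0, buf.length);
      System.out.println("positional read returned " + n + " bytes");

      // readFully: returns only after the entire range has been read, or
      // throws EOFException if the file is shorter than requested. This is
      // the guarantee the description above relies on.
      try {
        in.readFully(0L, buf, 0, buf.length);
      } catch (EOFException e) {
        System.out.println("file shorter than the requested range");
      }
    }
  }
}
{code}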



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Assigned] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-3335:


Assignee: Sahil Takiar

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3335.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0
>
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that 
> disables our single-node optimization for any plan with joins. See 
> MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10202:
-

 Summary: Enable file handle cache for ABFS files
 Key: IMPALA-10202
 URL: https://issues.apache.org/jira/browse/IMPALA-10202
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should enable the file handle cache for ABFS; we have already seen it 
benefit jobs that read data from S3A.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-8577) Crash during OpenSSLSocket.read

2020-09-28 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8577.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed a while ago. Impala has been using wildfly for communication 
with S3 for a while now and everything seems stable.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10191) Test impalad_coordinator and impalad_executor in Dockerized tests

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10191:
-

 Summary: Test impalad_coordinator and impalad_executor in 
Dockerized tests
 Key: IMPALA-10191
 URL: https://issues.apache.org/jira/browse/IMPALA-10191
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


Currently only the impalad_coord_exec images are tested in the Dockerized 
tests; it would be nice to get test coverage for the other images as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10190) Remove impalad_coord_exec Dockerfile

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10190:
-

 Summary: Remove impalad_coord_exec Dockerfile
 Key: IMPALA-10190
 URL: https://issues.apache.org/jira/browse/IMPALA-10190
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The impalad_coord_exec Dockerfile is a bit redundant because it basically 
contains all the same dependencies as the impalad_coordinator Dockerfile. The 
only difference between the two files is that the startup flags for 
impalad_coordinator contain {{is_executor=false}}. We should find a way to 
remove the {{impalad_coord_exec}} image altogether.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-24 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10170.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Data race on Webserver::UrlHandler::is_on_nav_bar_
> --
>
> Key: IMPALA-10170
> URL: https://issues.apache.org/jira/browse/IMPALA-10170
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=31102)
>   Read of size 1 at 0x7b2c0006e3b0 by thread T42:
> #0 impala::Webserver::UrlHandler::is_on_nav_bar() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41
>  (impalad+0x256ff39)
> #1 
> impala::Webserver::GetCommonJson(rapidjson::GenericDocument,
>  rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*, sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24
>  (impalad+0x256be13)
> #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler 
> const&, std::__cxx11::basic_stringstream, 
> std::allocator >*, impala::ContentType*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3
>  (impalad+0x256e882)
> #3 impala::Webserver::BeginRequestCallback(sq_connection*, 
> sq_request_info*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5
>  (impalad+0x256cfbb)
> #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20
>  (impalad+0x256ba98)
> #5 handle_request  (impalad+0x2582d59)
>   Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
> #0 
> impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9
>  (impalad+0x2570dbc)
> #1 std::pair, 
> std::allocator > const, 
> impala::Webserver::UrlHandler>::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler, 
> true>(std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4
>  (impalad+0x25738b3)
> #2 void 
> __gnu_cxx::new_allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> >(std::pair std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23
>  (impalad+0x2573848)
> #3 void 
> std::allocator_traits  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > >&, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>*, 
> std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
>  (impalad+0x25737f1)
> #4 void std::_Rb_tree std::char_traits, std::allocator >, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>, 
> std::_Select1st std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >, std::less std::char_traits, std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::_M_construct_node std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::_Rb_tree_node std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> 


[jira] [Commented] (IMPALA-10183) Hit promise DCHECK while looping result spooling tests

2020-09-23 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201044#comment-17201044
 ] 

Sahil Takiar commented on IMPALA-10183:
---

Thanks for reporting and fixing this!

> Hit promise DCHECK while looping result spooling tests
> --
>
> Key: IMPALA-10183
> URL: https://issues.apache.org/jira/browse/IMPALA-10183
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: impalad.ERROR.gz, impalad.FATAL.gz, impalad.INFO.gz
>
>
> {noformat}
> while impala-py.test tests/query_test/test_result_spooling.py -n4 ; do date; 
> done
> {noformat}
> {noformat}
> F0921 10:14:35.355281  5842 promise.h:61] Check failed: mode == 
> PromiseMode::MULTIPLE_PRODUCER [ mode = 0 , PromiseM
> ode::MULTIPLE_PRODUCER = 1 ]Called Set(..) twice on the same Promise in 
> SINGLE_PRODUCER mode
> *** Check failure stack trace: ***
> @  0x52087fc  google::LogMessage::Fail()
> @  0x520a0ec  google::LogMessage::SendToLog()
> @  0x520815a  google::LogMessage::Flush()
> @  0x520bd58  google::LogMessageFatal::~LogMessageFatal()
> @  0x223cc50  impala::Promise<>::Set()
> @  0x293f21d  impala::BufferedPlanRootSink::Cancel()
> @  0x2317856  impala::FragmentInstanceState::Cancel()
> @  0x2284c62  impala::QueryState::Cancel()
> @  0x2464728  impala::ControlService::CancelQueryFInstances()
> @  0x253df37  
> _ZZN6impala16ControlServiceIfC4ERK13scoped_refptrIN4kudu12MetricEntityEERKS1_INS2_3rpc13Re
> sultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
> @  0x253fb65  
> _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZN6imp
> ala16ControlServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeE
> RKSt9_Any_dataOS4_OS5_OS9_
> @  0x2c9612f  std::function<>::operator()()
> @  0x2c95ade  kudu::rpc::GeneratedServiceIf::Handle()
> @  0x21d8c55  impala::ImpalaServicePool::RunThread()
> @  0x21de836  boost::_mfi::mf0<>::operator()()
> @  0x21de468  boost::_bi::list1<>::operator()<>()
> @  0x21de02e  boost::_bi::bind_t<>::operator()()
> @  0x21ddaa5  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x2140b55  boost::function0<>::operator()()
> @  0x271e1a9  impala::Thread::SuperviseThread()
> @  0x2726146  boost::_bi::list5<>::operator()<>()
> @  0x272606a  boost::_bi::bind_t<>::operator()()
> @  0x272602b  boost::detail::thread_data<>::run()
> @  0x3f0f621  thread_proxy
> @ 0x7f4db3f356da  start_thread
> @ 0x7f4db092ca3e  clone
> Wrote minidump to 
> /home/tarmstrong/Impala/impala/logs/cluster/minidumps/impalad/3204ffe5-6905-4842-d702c395-21c4eca5
> .dmp
> (END)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9046.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We currently log a message if a process or JVM pause is detected, but there is 
> no indication in the query profile of whether the query was affected. I suggest 
> that we should:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics
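
For background on the pause detection being referenced: it is typically a watchdog thread that sleeps for a fixed interval and treats any extra elapsed time as a GC or process stall (Hadoop's JvmPauseMonitor works this way). Below is a minimal, hypothetical Java sketch of the kind of counters the bullets above describe; the thresholds and names are made up and this is not Impala's actual implementation:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sleep-drift pause detection: any wakeup that arrives much later than the
// intended sleep interval is counted as a pause.
public class PauseMonitor implements Runnable {
  static final long SLEEP_MS = 500;   // how long we intend to sleep
  static final long WARN_MS = 1000;   // extra delay we count as a "pause"

  final AtomicLong pauseCount = new AtomicLong();
  final AtomicLong pausedTimeMs = new AtomicLong();

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.nanoTime();
      try {
        Thread.sleep(SLEEP_MS);
      } catch (InterruptedException e) {
        return;
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      long extraMs = elapsedMs - SLEEP_MS;
      if (extraMs > WARN_MS) {
        // A GC or process-level stall delayed the wakeup; record it so a
        // profile counter could later report the delta seen during a query.
        pauseCount.incrementAndGet();
        pausedTimeMs.addAndGet(extraMs);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PauseMonitor m = new PauseMonitor();
    Thread t = new Thread(m, "pause-monitor");
    t.setDaemon(true);
    t.start();
    Thread.sleep(5000);
    System.out.println("pauses=" + m.pauseCount + " pausedMs=" + m.pausedTimeMs);
  }
}
{code}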



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9046:


Assignee: Sahil Takiar  (was: Tamas Mate)

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
>
> We currently log a message if a process or JVM pause is detected, but there is 
> no indication in the query profile of whether the query was affected. I suggest 
> that we should:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-18 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198604#comment-17198604
 ] 

Sahil Takiar commented on IMPALA-9870:
--

The 'profile' part of this was done in IMPALA-9229; we still need support for 
the 'summary' command.

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle 
> containing the original query id. Subsequent "summary" and "profile" commands 
> will return results of the original query. We should consider returning both 
> the original and retried information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9229) Link failed and retried runtime profiles

2020-09-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9229.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Marking as resolved. The Web UI improvements are tracked in a separate JIRA.

> Link failed and retried runtime profiles
> 
>
> Key: IMPALA-9229
> URL: https://issues.apache.org/jira/browse/IMPALA-9229
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 4.0
>
>
> There should be a way for clients to link the runtime profiles from failed 
> queries to all retry attempts (whether successful or not), and vice versa.
> There are a few ways to do this:
>  * The simplest way would be to include the query id of the retried query in 
> the runtime profile of the failed query, and vice versa; users could then 
> manually create a chain of runtime profiles in order to fetch all failed / 
> successful attempts
>  * Extend TGetRuntimeProfileReq to include an option to fetch all runtime 
> profiles for the given query id + all retry attempts (or add a new Thrift 
> call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a 
> given query id)
>  * The Impala debug UI should include a simple way to view all the runtime 
> profiles of a query (the failed attempts + all retry attempts) side by side 
> (perhaps the query_profile?query_id profile should include tabs to easily 
> switch between the runtime profiles of each attempt)
> These are not mutually exclusive, and it might be good to stage these changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10180) Add average size of fetch requests in runtime profile

2020-09-18 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10180:
-

 Summary: Add average size of fetch requests in runtime profile
 Key: IMPALA-10180
 URL: https://issues.apache.org/jira/browse/IMPALA-10180
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Reporter: Sahil Takiar


For queries with a high {{ClientFetchWaitTimer}}, it would be useful to know the 
average number of rows requested by the client per fetch request. This can help 
determine whether a higher fetch size would improve fetch performance when the 
network RTT between the client and Impala is high.
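
As a rough sketch of the proposed metric (the counter names here are hypothetical, not 
Impala's actual profile counters), the average could be derived from a total row count 
and a fetch-RPC count:
{code:cpp}
#include <cstdint>
#include <iostream>

// Hypothetical counter pair; not Impala's actual profile counter names.
struct ClientFetchStats {
  int64_t rows_fetched = 0;
  int64_t fetch_rpcs = 0;

  void RecordFetch(int64_t num_rows) {
    rows_fetched += num_rows;
    ++fetch_rpcs;
  }

  // Average rows returned per Fetch RPC; the number this JIRA asks to surface.
  double AvgRowsPerFetch() const {
    return fetch_rpcs == 0 ? 0.0 : static_cast<double>(rows_fetched) / fetch_rpcs;
  }
};

int main() {
  ClientFetchStats stats;
  stats.RecordFetch(1024);
  stats.RecordFetch(1024);
  stats.RecordFetch(512);
  // A low average relative to the configured fetch size suggests the client asks
  // for small batches; raising the fetch size may help on high-RTT links.
  std::cout << "avg rows/fetch: " << stats.AvgRowsPerFetch() << std::endl;
  return 0;
}
{code}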



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (IMPALA-9923) Data loading of TPC-DS ORC fails with "Fail to get checksum"

2020-09-17 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197766#comment-17197766
 ] 

Sahil Takiar commented on IMPALA-9923:
--

+1 on fixing this soon. Hit this twice in a row on the dryrun job.

> Data loading of TPC-DS ORC fails with "Fail to get checksum"
> 
>
> Key: IMPALA-9923
> URL: https://issues.apache.org/jira/browse/IMPALA-9923
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: load-tpcds-core-hive-generated-orc-def-block.sql, 
> load-tpcds-core-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Loading data to table tpcds_orc_def.store_sales partition 
> (ss_sold_date_sk=null) from 
> hdfs://localhost:20500/test-warehouse/managed/tpcds.store_sales_orc_def
> INFO  : 
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get 
> checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
> INFO  : Completed executing 
> command(queryId=ubuntu_20200707055650_a1958916-1e85-4db5-b1bc-cc63d80b3537); 
> Time taken: 14.512 seconds
> INFO  : OK
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to 
> get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction. (state=08S01,code=1)
> java.sql.SQLException: Error while compiling statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.io.IOException: Fail to get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
>   at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
>   at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1007)
>   at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>   at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1497)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1355)
>   at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1329)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1127)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11223/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9923) Data loading of TPC-DS ORC fails with "Fail to get checksum"

2020-09-16 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197281#comment-17197281
 ] 

Sahil Takiar commented on IMPALA-9923:
--

Hit this again: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12060/console

> Data loading of TPC-DS ORC fails with "Fail to get checksum"
> 
>
> Key: IMPALA-9923
> URL: https://issues.apache.org/jira/browse/IMPALA-9923
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: load-tpcds-core-hive-generated-orc-def-block.sql, 
> load-tpcds-core-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Loading data to table tpcds_orc_def.store_sales partition 
> (ss_sold_date_sk=null) from 
> hdfs://localhost:20500/test-warehouse/managed/tpcds.store_sales_orc_def
> INFO  : 
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get 
> checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
> INFO  : Completed executing 
> command(queryId=ubuntu_20200707055650_a1958916-1e85-4db5-b1bc-cc63d80b3537); 
> Time taken: 14.512 seconds
> INFO  : OK
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to 
> get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction. (state=08S01,code=1)
> java.sql.SQLException: Error while compiling statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.io.IOException: Fail to get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
>   at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
>   at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1007)
>   at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>   at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1497)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1355)
>   at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1329)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1127)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11223/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-16 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10170:
-

 Summary: Data race on Webserver::UrlHandler::is_on_nav_bar_
 Key: IMPALA-10170
 URL: https://issues.apache.org/jira/browse/IMPALA-10170
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{code}
WARNING: ThreadSanitizer: data race (pid=31102)
  Read of size 1 at 0x7b2c0006e3b0 by thread T42:
#0 impala::Webserver::UrlHandler::is_on_nav_bar() const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41
 (impalad+0x256ff39)
#1 
impala::Webserver::GetCommonJson(rapidjson::GenericDocument,
 rapidjson::MemoryPoolAllocator, 
rapidjson::CrtAllocator>*, sq_connection const*, 
kudu::WebCallbackRegistry::WebRequest const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24
 (impalad+0x256be13)
#2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, 
kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler 
const&, std::__cxx11::basic_stringstream, 
std::allocator >*, impala::ContentType*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3
 (impalad+0x256e882)
#3 impala::Webserver::BeginRequestCallback(sq_connection*, 
sq_request_info*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5
 (impalad+0x256cfbb)
#4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20
 (impalad+0x256ba98)
#5 handle_request  (impalad+0x2582d59)

  Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
#0 
impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9
 (impalad+0x2570dbc)
#1 std::pair, 
std::allocator > const, 
impala::Webserver::UrlHandler>::pair, std::allocator >, impala::Webserver::UrlHandler, 
true>(std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4
 (impalad+0x25738b3)
#2 void 
__gnu_cxx::new_allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::construct, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler> 
>(std::pair, 
std::allocator > const, impala::Webserver::UrlHandler>*, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23
 (impalad+0x2573848)
#3 void 
std::allocator_traits, std::allocator > const, 
impala::Webserver::UrlHandler> > > 
>::construct, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler> 
>(std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > >&, std::pair, std::allocator > const, 
impala::Webserver::UrlHandler>*, std::pair, std::allocator >, 
impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
 (impalad+0x25737f1)
#4 void std::_Rb_tree, std::allocator >, 
std::pair, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_construct_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >*, std::pair, std::allocator >, 
impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:626:8
 (impalad+0x257369b)
#5 std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >* 
std::_Rb_tree, 
std::allocator >, std::pair, std::allocator > const, 
impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_create_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
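
For reference, a minimal sketch of one conventional remedy for this kind of 
registration-vs-lookup race (hypothetical code, not the fix that was actually 
committed): guard the handler map with a reader/writer lock so a request thread 
cannot read a handler while it is being move-constructed into the map.
{code:cpp}
#include <iostream>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical, simplified webserver; not the actual Impala classes.
class UrlHandler {
 public:
  explicit UrlHandler(bool is_on_nav_bar) : is_on_nav_bar_(is_on_nav_bar) {}
  bool is_on_nav_bar() const { return is_on_nav_bar_; }
 private:
  bool is_on_nav_bar_;
};

class Webserver {
 public:
  void RegisterUrlCallback(const std::string& path, bool on_nav_bar) {
    std::unique_lock<std::shared_mutex> l(handlers_lock_);  // exclusive for writes
    handlers_.emplace(path, UrlHandler(on_nav_bar));
  }

  bool IsOnNavBar(const std::string& path) const {
    std::shared_lock<std::shared_mutex> l(handlers_lock_);  // shared for reads
    auto it = handlers_.find(path);
    return it != handlers_.end() && it->second.is_on_nav_bar();
  }

 private:
  mutable std::shared_mutex handlers_lock_;
  std::map<std::string, UrlHandler> handlers_;
};

int main() {
  Webserver ws;
  ws.RegisterUrlCallback("/queries", /*on_nav_bar=*/true);
  std::cout << ws.IsOnNavBar("/queries") << std::endl;  // prints 1
  return 0;
}
{code}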

[jira] [Resolved] (IMPALA-9740) TSAN data race in hdfs-bulk-ops

2020-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9740.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TSAN data race in hdfs-bulk-ops
> ---
>
> Key: IMPALA-9740
> URL: https://issues.apache.org/jira/browse/IMPALA-9740
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has 
> a data race:
> {code:java}
>  WARNING: ThreadSanitizer: data race (pid=23205)
>   Write of size 8 at 0x7b24005642d8 by thread T47:
> #0 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::add_node(boost::unordered::detail::node_constructor  const, hdfs_internal*> > > >&, unsigned long) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26
>  (impalad+0x1f93832)
> #1 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace_impl >(std::string 
> const&, std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41
>  (impalad+0x1f933ed)
> #2 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20
>  (impalad+0x1f932d1)
> #3 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27
>  (impalad+0x1f93238)
> #4 boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::insert(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26
>  (impalad+0x1f92038)
> #5 impala::HdfsFsCache::GetConnection(std::string const&, 
> hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18
>  (impalad+0x1f916b3)
> #6 impala::HdfsOp::Execute() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55
>  (impalad+0x23444d5)
> #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6
>  (impalad+0x2344ea9)
> #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11
>  (impalad+0x2345e80)
> #9 boost::function2::operator()(int, 
> impala::HdfsOp const&) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1f883be)
> #10 impala::ThreadPool::WorkerThread(int) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9
>  (impalad+0x1f874e5)
> #11 boost::_mfi::mf1, 
> int>::operator()(impala::ThreadPool*, int) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29
>  (impalad+0x1f87b7d)
> #12 void 
> boost::_bi::list2*>, 
> boost::_bi::value >::operator() impala::ThreadPool, int>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9
>  (impalad+0x1f87abc)
> #13 boost::_bi::bind_t impala::ThreadPool, int>, 
> boost::_bi::list2*>, 
> boost::_bi::value > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x1f87a23)
> #14 
> 
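
For reference, a minimal sketch of one way to make such a connection cache thread-safe 
(hypothetical types, not necessarily the fix that landed for this JIRA): serialize map 
lookups and insertions behind a mutex so concurrent operations from the hdfs-bulk-ops 
thread pool cannot race.
{code:cpp}
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>

struct hdfs_internal {};          // stand-in for libhdfs' opaque connection handle
using hdfsFS = hdfs_internal*;

class HdfsFsCache {
 public:
  hdfsFS GetConnection(const std::string& namenode) {
    std::lock_guard<std::mutex> l(lock_);   // serialize lookups and insertions
    auto it = map_.find(namenode);
    if (it != map_.end()) return it->second;
    hdfsFS fs = new hdfs_internal();        // stand-in for hdfsConnect(); the cache
    map_.emplace(namenode, fs);             // owns the handle for the process lifetime
    return fs;
  }

 private:
  std::mutex lock_;
  std::unordered_map<std::string, hdfsFS> map_;
};

int main() {
  HdfsFsCache cache;
  hdfsFS a = cache.GetConnection("hdfs://localhost:20500");
  hdfsFS b = cache.GetConnection("hdfs://localhost:20500");
  std::cout << (a == b ? "cached" : "new") << std::endl;
  return 0;
}
{code}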


[jira] [Assigned] (IMPALA-9740) TSAN data race in hdfs-bulk-ops

2020-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9740:


Assignee: Sahil Takiar

> TSAN data race in hdfs-bulk-ops
> ---
>
> Key: IMPALA-9740
> URL: https://issues.apache.org/jira/browse/IMPALA-9740
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has 
> a data race:
> {code:java}
>  WARNING: ThreadSanitizer: data race (pid=23205)
>   Write of size 8 at 0x7b24005642d8 by thread T47:
> #0 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::add_node(boost::unordered::detail::node_constructor  const, hdfs_internal*> > > >&, unsigned long) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26
>  (impalad+0x1f93832)
> #1 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace_impl >(std::string 
> const&, std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41
>  (impalad+0x1f933ed)
> #2 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20
>  (impalad+0x1f932d1)
> #3 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27
>  (impalad+0x1f93238)
> #4 boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::insert(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26
>  (impalad+0x1f92038)
> #5 impala::HdfsFsCache::GetConnection(std::string const&, 
> hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18
>  (impalad+0x1f916b3)
> #6 impala::HdfsOp::Execute() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55
>  (impalad+0x23444d5)
> #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6
>  (impalad+0x2344ea9)
> #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11
>  (impalad+0x2345e80)
> #9 boost::function2::operator()(int, 
> impala::HdfsOp const&) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1f883be)
> #10 impala::ThreadPool::WorkerThread(int) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9
>  (impalad+0x1f874e5)
> #11 boost::_mfi::mf1, 
> int>::operator()(impala::ThreadPool*, int) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29
>  (impalad+0x1f87b7d)
> #12 void 
> boost::_bi::list2*>, 
> boost::_bi::value >::operator() impala::ThreadPool, int>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9
>  (impalad+0x1f87abc)
> #13 boost::_bi::bind_t impala::ThreadPool, int>, 
> boost::_bi::list2*>, 
> boost::_bi::value > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x1f87a23)
> #14 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf1, int>, 
> 

[jira] [Created] (IMPALA-10160) kernel_stack_watchdog cannot print user stack

2020-09-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10160:
-

 Summary: kernel_stack_watchdog cannot print user stack
 Key: IMPALA-10160
 URL: https://issues.apache.org/jira/browse/IMPALA-10160
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


I've seen this a few times now. The kernel_stack_watchdog is used in a few places 
in the KRPC code; it prints the kernel and user stacks whenever a thread is stuck 
in a method call for too long. The issue is that the user stack does not get 
printed:

{code}
W0908 17:15:00.365721  6605 kernel_stack_watchdog.cc:198] Thread 6612 stuck at 
outbound_call.cc:273 for 120ms:
Kernel stack:
[] futex_wait_queue_me+0xc6/0x130
[] futex_wait+0x17b/0x280
[] do_futex+0x106/0x5a0
[] SyS_futex+0x80/0x180
[] system_call_fastpath+0x16/0x1b
[] 0x

User stack:

{code}

It appears that the signal handler responsible for capturing the thread's user stack is unavailable.
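
For context, a minimal, hypothetical sketch of how a user-mode stack is typically 
captured on demand: a signal handler must be installed in the target process, and the 
watchdog thread sends it a signal. This is not the Kudu/Impala watchdog code; glibc's 
backtrace() is used for brevity even though it is not strictly async-signal-safe.
{code:cpp}
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler: dump the current thread's user-mode stack to stderr.
static void DumpUserStack(int /*signum*/) {
  void* frames[64];
  int n = backtrace(frames, 64);
  backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

int main() {
  struct sigaction sa = {};
  sa.sa_handler = DumpUserStack;
  sigaction(SIGUSR2, &sa, nullptr);  // a watchdog thread would send this signal
  raise(SIGUSR2);                    // demo: trigger the dump on ourselves
  return 0;
}
{code}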



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-9351) AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to non-existing path

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192522#comment-17192522
 ] 

Sahil Takiar commented on IMPALA-9351:
--

Another instance: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11985/testReport/junit/org.apache.impala.analysis/AnalyzeDDLTest/TestCreateTableLikeFileOrc/

> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to non-existing path
> -
>
> Key: IMPALA-9351
> URL: https://issues.apache.org/jira/browse/IMPALA-9351
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Norbert Luksa
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.4.0
>
>
> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to a non-existing path. 
> Specifically, we see the following error message.
> {code:java}
> Error Message
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0'
> {code}
> The stack trace is provided in the following.
> {code:java}
> Stacktrace
> java.lang.AssertionError: 
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0'
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.impala.common.FrontendFixture.analyzeStmt(FrontendFixture.java:397)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:244)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:185)
>   at 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc(AnalyzeDDLTest.java:2045)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {code}
> This test was recently added by [~norbertluksa], and [~boroknagyz] gave a +2, 
> maybe [~boroknagyz] could provide some insight into this? Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IMPALA-6984) Coordinator should cancel backends when returning EOS

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-6984:


Assignee: Wenzhe Zhou

> Coordinator should cancel backends when returning EOS
> -
>
> Key: IMPALA-6984
> URL: https://issues.apache.org/jira/browse/IMPALA-6984
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Daniel Hecht
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: query-lifecycle
> Fix For: Impala 4.0
>
>
> Currently, the Coordinator waits for backends rather than proactively 
> cancelling them once it hits EOS. A tangle related to how 
> {{Coordinator::ComputeQuerySummary()}} works makes it tricky to proactively 
> cancel the backends: we can't update the summary until the profiles are no 
> longer changing (which also makes sense, given that we want the exec summary 
> to be consistent with the final profile). But we currently tie together the 
> FIS status and the profile, and cancellation of backends causes the FIS to 
> return CANCELLED, which means the remaining FIS on that backend won't 
> produce a final profile.
> With the rework of the protocol for IMPALA-2990 we should make it possible to 
> sort this out such that a final profile can be requested regardless of how a 
> FIS ends execution.
> This also relates to IMPALA-5783.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5119) Don't make RPCs from Coordinator::UpdateBackendExecStatus()

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-5119:


Assignee: Wenzhe Zhou

> Don't make RPCs from Coordinator::UpdateBackendExecStatus()
> ---
>
> Key: IMPALA-5119
> URL: https://issues.apache.org/jira/browse/IMPALA-5119
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Assignee: Wenzhe Zhou
>Priority: Major
>
> If it reports a bad status, {{UpdateFragmentExecStatus()}} will call 
> {{UpdateStatus()}}, which takes {{Coordinator::lock_}} and then calls 
> {{Cancel()}}. That method issues one RPC per fragment instance.
> In KRPC, doing so much work from {{UpdateFragmentExecStatus()}} - which is an 
> RPC handler - is a bad idea, even if the RPCs are issued asynchronously. 
> There's still some serialization cost.
> It's also a bad idea to do all this work while holding {{lock_}}. We should 
> address both of these to ensure scalability of the cancellation path.
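
A minimal sketch of the general pattern being suggested (hypothetical code, not 
Impala's coordinator): the RPC handler only enqueues the cancellation work, and a 
dedicated worker thread issues the per-fragment-instance cancel RPCs outside the 
handler and without holding {{lock_}}.
{code:cpp}
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

class CancellationQueue {
 public:
  // Called from the RPC handler: cheap, no RPCs, no long-held locks.
  void Enqueue(std::function<void()> cancel_fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      work_.push(std::move(cancel_fn));
    }
    cv_.notify_one();
  }

  // Run by a dedicated worker thread; the cancel RPCs happen here.
  void Drain() {
    std::unique_lock<std::mutex> l(lock_);
    while (!done_ || !work_.empty()) {
      cv_.wait(l, [this] { return done_ || !work_.empty(); });
      while (!work_.empty()) {
        auto fn = std::move(work_.front());
        work_.pop();
        l.unlock();
        fn();          // issue the (possibly slow) cancel RPC without the lock
        l.lock();
      }
    }
  }

  void Shutdown() {
    { std::lock_guard<std::mutex> l(lock_); done_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> work_;
  bool done_ = false;
};

int main() {
  CancellationQueue queue;
  std::thread worker(&CancellationQueue::Drain, &queue);
  queue.Enqueue([] { std::cout << "cancel RPC to backend 1\n"; });
  queue.Enqueue([] { std::cout << "cancel RPC to backend 2\n"; });
  queue.Shutdown();
  worker.join();
  return 0;
}
{code}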



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9227) Test coverage for query retries when there is a network partition

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9227:


Assignee: Wenzhe Zhou

> Test coverage for query retries when there is a network partition
> -
>
> Key: IMPALA-9227
> URL: https://issues.apache.org/jira/browse/IMPALA-9227
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Wenzhe Zhou
>Priority: Major
>
> The initial version of transparent query retries only adds coverage for 
> retrying a query if an impalad crashes. Now that Impala has an RPC fault 
> injection framework (IMPALA-8138) based on debug actions, integration tests 
> can introduce network partitions between two impalad processes.
> Node blacklisting should cause the Impala Coordinator to blacklist the nodes 
> with the network partitions (IMPALA-9137), and then transparent query retries 
> should cause the query to be successfully retried.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10154) Data race on coord_backend_id

2020-09-08 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10154:
-

 Summary: Data race on coord_backend_id
 Key: IMPALA-10154
 URL: https://issues.apache.org/jira/browse/IMPALA-10154
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Wenzhe Zhou


TSAN is reporting a data race on 
{{ExecQueryFInstancesRequestPB#coord_backend_id}}
{code:java}
WARNING: ThreadSanitizer: data race (pid=15392)
  Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write 
M871582266043729400):
#0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23
 (impalad+0x20c03ed)
#1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21
 (impalad+0x20b8b29)
#2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB 
const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23
 (impalad+0x20acb59)
#3 
impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB
 const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66
 (impalad+0x22a621d)
#4 
impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
const&, scoped_refptr 
const&)::$_1::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13
 (impalad+0x23622a4)
#5 std::_Function_handler 
const&, scoped_refptr 
const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
 (impalad+0x23620ed)
#6 std::function::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
 (impalad+0x2a4a453)
#7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3
 (impalad+0x2a49efe)
#8 impala::ImpalaServicePool::RunThread() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15
 (impalad+0x2011a12)
#9 boost::_mfi::mf0::operator()(impala::ImpalaServicePool*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29
 (impalad+0x2017a16)
#10 void boost::_bi::list1 
>::operator(), 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9
 (impalad+0x201796a)
#11 boost::_bi::bind_t, 
boost::_bi::list1 > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x20178f3)
#12 
boost::detail::function::void_function_obj_invoker0, 
boost::_bi::list1 > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x20176e9)
#13 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1f666f1)
#14 impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x252644b)
#15 void 
boost::_bi::list5, std::allocator > >, 
boost::_bi::value, 
std::allocator > >, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), 
std::allocator > const&, 
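
As a rough illustration only (hypothetical types, not the fix that landed): the trace 
shows a {{mutable_*()}} write to a request object shared across threads; copying the 
needed field into query-owned state and taking the request by const reference removes 
the shared write.
{code:cpp}
#include <cstdint>
#include <iostream>

struct UniqueIdPB { int64_t hi; int64_t lo; };        // stand-in protobuf message
struct ExecRequest { UniqueIdPB coord_backend_id; };  // stand-in for the request PB

class QueryState {
 public:
  // Race-free variant: take the request by const reference and keep a private copy,
  // instead of writing through a mutable_*() accessor on the shared request.
  void Init(const ExecRequest& request) {
    coord_backend_id_ = request.coord_backend_id;     // private copy, no shared write
  }
  const UniqueIdPB& coord_backend_id() const { return coord_backend_id_; }

 private:
  UniqueIdPB coord_backend_id_{};
};

int main() {
  ExecRequest request;
  request.coord_backend_id = {42, 7};
  QueryState qs;
  qs.Init(request);
  std::cout << qs.coord_backend_id().hi << ":" << qs.coord_backend_id().lo << std::endl;
  return 0;
}
{code}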

[jira] [Commented] (IMPALA-10154) Data race on coord_backend_id

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192308#comment-17192308
 ] 

Sahil Takiar commented on IMPALA-10154:
---

This looks related to IMPALA-5746, so assigning to [~wzhou].

> Data race on coord_backend_id
> -
>
> Key: IMPALA-10154
> URL: https://issues.apache.org/jira/browse/IMPALA-10154
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Wenzhe Zhou
>Priority: Major
>
> TSAN is reporting a data race on 
> {{ExecQueryFInstancesRequestPB#coord_backend_id}}
> {code:java}
> WARNING: ThreadSanitizer: data race (pid=15392)
>   Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write 
> M871582266043729400):
> #0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23
>  (impalad+0x20c03ed)
> #1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
> impala::TExecPlanFragmentInfo const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21
>  (impalad+0x20b8b29)
> #2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB 
> const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23
>  (impalad+0x20acb59)
> #3 
> impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB
>  const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66
>  (impalad+0x22a621d)
> #4 
> impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
> const&, scoped_refptr 
> const&)::$_1::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13
>  (impalad+0x23622a4)
> #5 std::_Function_handler google::protobuf::Message*, kudu::rpc::RpcContext*), 
> impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
> const&, scoped_refptr 
> const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
> const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
>  (impalad+0x23620ed)
> #6 std::function google::protobuf::Message*, 
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
>  (impalad+0x2a4a453)
> #7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3
>  (impalad+0x2a49efe)
> #8 impala::ImpalaServicePool::RunThread() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15
>  (impalad+0x2011a12)
> #9 boost::_mfi::mf0 impala::ImpalaServicePool>::operator()(impala::ImpalaServicePool*) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29
>  (impalad+0x2017a16)
> #10 void boost::_bi::list1 
> >::operator(), 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0 impala::ImpalaServicePool>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9
>  (impalad+0x201796a)
> #11 boost::_bi::bind_t impala::ImpalaServicePool>, 
> boost::_bi::list1 > 
> >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x20178f3)
> #12 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf0, 
> boost::_bi::list1 > >, 
> void>::invoke(boost::detail::function::function_buffer&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x20176e9)
> #13 boost::function0::operator()() const 
> 

[jira] [Commented] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192259#comment-17192259
 ] 

Sahil Takiar commented on IMPALA-10073:
---

[https://github.com/apache/impala/blob/master/shaded-deps/s3a-aws-sdk/pom.xml#L58]
 contains the full list

> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a similar technique used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190302#comment-17190302
 ] 

Sahil Takiar edited comment on IMPALA-10142 at 9/3/20, 5:03 PM:


Actually, this would only really be useful if the RPC response includes some 
trace information as well; otherwise it is hard to capture the time actually 
spent on the network. Currently, the {{TransmitDataResponsePB}} just includes 
the {{receiver_latency_ns}}. Adding that into the trace would be useful, and 
other things such as the timestamp when the RPC was received by the receiver, 
time in queue, etc. would be useful as well.

The timestamp of when the RPC was received by the sender would be particularly 
useful in debugging RPCs where the network is slow.
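
For illustration, a minimal C++ sketch of that idea; {{recv_queue_time_ns}} is a 
hypothetical field, only {{receiver_latency_ns}} exists in the response today:
{code:cpp}
// Minimal sketch (not Impala code): how extra receiver-side fields in the RPC
// response could be folded into a sender-side trace. Only receiver_latency_ns
// exists in TransmitDataResponsePB today; recv_queue_time_ns is hypothetical.
#include <cstdint>
#include <cstdio>

struct ResponseTimingSketch {
  int64_t receiver_latency_ns;  // time the receiver spent processing the RPC
  int64_t recv_queue_time_ns;   // hypothetical: time the RPC waited in the service queue
};

// total_time_ns is measured on the sender: request sent -> response received.
void TraceRpcTiming(int64_t total_time_ns, const ResponseTimingSketch& resp) {
  // Whatever the receiver cannot account for is roughly network + sender overhead.
  int64_t network_time_ns =
      total_time_ns - resp.receiver_latency_ns - resp.recv_queue_time_ns;
  std::printf("total=%lldns receiver=%lldns queue=%lldns network(approx)=%lldns\n",
              (long long)total_time_ns, (long long)resp.receiver_latency_ns,
              (long long)resp.recv_queue_time_ns, (long long)network_time_ns);
}

int main() {
  TraceRpcTiming(/*total_time_ns=*/120'000'000,
                 ResponseTimingSketch{80'000'000, 25'000'000});
  return 0;
}
{code}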


was (Author: stakiar):
Actually, this would only really be useful if the RPC response includes some 
trace information as well. Currently, the {{TransmitDataResponsePB}} just 
includes the {{receiver_latency_ns}}. Adding that into the trace would be 
useful, and other things such as the timestamp when the RPC was received by the 
receiver, time in queue, etc. would be useful as well.

> Add RPC sender tracing
> --
>
> Key: IMPALA-10142
> URL: https://issues.apache.org/jira/browse/IMPALA-10142
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently have RPC tracing on the receiver side, but not on the sender 
> side. For slow RPCs, the logs print out the total amount of time spent 
> sending the RPC + the network time. Adding tracing will basically make this 
> more granular. It will help determine where exactly in the stack the time was 
> spent when sending RPCs.
> Combined with the trace logs in the receiver, it should be much easier to 
> determine the timeline of a given slow RPC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190302#comment-17190302
 ] 

Sahil Takiar commented on IMPALA-10142:
---

Actually, this would only really be useful if the RPC response includes some 
trace information as well. Currently, the {{TransmitDataResponsePB}} just 
includes the {{receiver_latency_ns}}. Adding that into the trace would be 
useful, and other things such as the timestamp when the RPC was received by the 
receiver, time in queue, etc. would be useful as well.

> Add RPC sender tracing
> --
>
> Key: IMPALA-10142
> URL: https://issues.apache.org/jira/browse/IMPALA-10142
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently have RPC tracing on the receiver side, but not on the sender 
> side. For slow RPCs, the logs print out the total amount of time spent 
> sending the RPC + the network time. Adding tracing will basically make this 
> more granular. It will help determine where exactly in the stack the time was 
> spent when sending RPCs.
> Combined with the trace logs in the receiver, it should be much easier to 
> determine the timeline of a given slow RPC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10142:
-

 Summary: Add RPC sender tracing
 Key: IMPALA-10142
 URL: https://issues.apache.org/jira/browse/IMPALA-10142
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


We currently have RPC tracing on the receiver side, but not on the sender 
side. For slow RPCs, the logs print out the total amount of time spent sending 
the RPC + the network time. Adding tracing will basically make this more 
granular. It will help determine where exactly in the stack the time was spent 
when sending RPCs.

Combined with the trace logs in the receiver, it should be much easier to 
determine the timeline of a given slow RPC.
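
As a rough illustration of the extra granularity (names are illustrative, not 
the actual kRPC sender code), a sketch that timestamps each sender-side stage 
and logs a per-stage breakdown for slow RPCs:
{code:cpp}
// Rough sketch (illustrative names, not Impala's kRPC sender code): record a
// timestamp at each sender-side stage so a slow RPC can be broken down into
// serialize / queue / send / wait segments instead of one total duration.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

struct SenderTrace {
  Clock::time_point created, serialized, queued, sent, response_received;
};

void LogIfSlow(const SenderTrace& t, std::chrono::milliseconds threshold) {
  auto total = t.response_received - t.created;
  if (total < threshold) return;
  auto ms = [](Clock::duration d) {
    return (long long)std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
  };
  std::printf("slow RPC: total=%lldms serialize=%lldms queue=%lldms send=%lldms wait=%lldms\n",
              ms(total), ms(t.serialized - t.created), ms(t.queued - t.serialized),
              ms(t.sent - t.queued), ms(t.response_received - t.sent));
}

int main() {
  SenderTrace t;
  auto now = Clock::now();
  t.created = now;
  t.serialized = now + std::chrono::milliseconds(5);
  t.queued = now + std::chrono::milliseconds(10);
  t.sent = now + std::chrono::milliseconds(15);
  t.response_received = now + std::chrono::milliseconds(500);
  LogIfSlow(t, std::chrono::milliseconds(100));
  return 0;
}
{code}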



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190296#comment-17190296
 ] 

Sahil Takiar commented on IMPALA-10139:
---

One other thought is that with result spooling enabled, the back-pressure 
mechanism won't be such a big issue anymore because results will all get 
spooled, regardless of whether clients fetch results slowly or not.

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.
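
To make the back-pressure point concrete, here is a toy sketch (not the actual 
{{KrpcDataStreamRecvr}} code) of a sender queue where an inbound batch is only 
admitted once the consumer drains one, so the measured RPC time includes 
however long the consumer takes to call GetBatch():
{code:cpp}
// Toy sketch of the back-pressure pattern described above (not the actual
// KrpcDataStreamRecvr implementation): an inbound row batch is only admitted
// once the consumer has made room by calling GetBatch(), so the time to
// "complete" the RPC includes however long the consumer takes to come back.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

class SenderQueueSketch {
 public:
  explicit SenderQueueSketch(std::size_t capacity) : capacity_(capacity) {}

  // Called from the RPC handler thread. Blocks until the batch is admitted;
  // the real code parks ("defers") the RPC instead of blocking a thread.
  void AddBatch(std::string batch) {
    std::unique_lock<std::mutex> l(lock_);
    space_available_.wait(l, [&] { return batches_.size() < capacity_; });
    batches_.push_back(std::move(batch));
    batch_available_.notify_one();
  }

  // Called (indirectly) by the exchange node on the query's fragment thread.
  std::string GetBatch() {
    std::unique_lock<std::mutex> l(lock_);
    batch_available_.wait(l, [&] { return !batches_.empty(); });
    std::string batch = std::move(batches_.front());
    batches_.pop_front();
    space_available_.notify_one();
    return batch;
  }

 private:
  const std::size_t capacity_;
  std::mutex lock_;
  std::condition_variable space_available_, batch_available_;
  std::deque<std::string> batches_;
};

int main() {
  SenderQueueSketch queue(/*capacity=*/1);
  std::thread sender([&] {
    queue.AddBatch("batch-1");
    queue.AddBatch("batch-2");  // waits here until the slow consumer drains batch-1
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));  // slow consumer
  std::cout << queue.GetBatch() << "\n";
  std::cout << queue.GetBatch() << "\n";
  sender.join();
  return 0;
}
{code}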



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190295#comment-17190295
 ] 

Sahil Takiar commented on IMPALA-10139:
---

I think there is a similar issue with the TRACE logs. Taking the example TRACE 
above, the majority of the time the RPC was just in the deferred state, e.g. 
there were not enough resources to process the RPC. Again, this just means that 
the back-pressure mechanism was kicking in, not necessarily that the network 
was slow.

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189775#comment-17189775
 ] 

Sahil Takiar commented on IMPALA-9870:
--

WIP Patch: http://gerrit.cloudera.org:8080/16406

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle 
> containing the original query id. Subsequent "summary" and "profile" commands 
> will return results of the original query. We should consider returning both 
> the original and retried information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189724#comment-17189724
 ] 

Sahil Takiar commented on IMPALA-10141:
---

Adding some additional fields from {{/proc/net/dev}} to the per-node stats from 
system-info.cc might be useful as well. Fields like NET_RX_ERRS, NET_RX_DROP, 
NET_TX_ERRS, NET_TX_DROP might be useful to track transmit / receive errors or 
dropped packets. These stats are probably more generic, though, as they are not 
specific to the kRPC TCP connections and are truly at the host level. I'm also 
not sure what exactly they capture compared to the TCP stats. They seem more 
hardware-specific; maybe they would capture host NIC issues.
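
For reference, a small Linux-only sketch of reading those counters out of 
{{/proc/net/dev}} (using the standard column layout of that file):
{code:cpp}
// Minimal sketch of pulling the receive/transmit error and drop counters out
// of /proc/net/dev (Linux only). Field layout per interface line:
//   rx: bytes packets errs drop fifo frame compressed multicast
//   tx: bytes packets errs drop fifo colls carrier compressed
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
  std::ifstream dev("/proc/net/dev");
  std::string line;
  std::getline(dev, line);  // skip the two header lines
  std::getline(dev, line);
  while (std::getline(dev, line)) {
    auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::string iface = line.substr(0, colon);
    iface.erase(0, iface.find_first_not_of(" \t"));  // trim leading whitespace
    std::istringstream fields(line.substr(colon + 1));
    std::vector<unsigned long long> v;
    unsigned long long x;
    while (fields >> x) v.push_back(x);
    if (v.size() < 12) continue;
    std::cout << iface << " rx_errs=" << v[2] << " rx_drop=" << v[3]
              << " tx_errs=" << v[10] << " tx_drop=" << v[11] << "\n";
  }
  return 0;
}
{code}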

> Include aggregate TCP metrics in per-node profiles
> --
>
> Key: IMPALA-10141
> URL: https://issues.apache.org/jira/browse/IMPALA-10141
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
> metrics per kRPC connection for all inbound / outbound connections. It would 
> be useful to aggregate some of these metrics and put them in the per-node 
> profiles. Since it is not possible to currently split these metrics out per 
> query, they should be added at the per-host level. Furthermore, only metrics 
> that can be sanely aggregated across all connections should be included. For 
> example, tracking the number of Retransmitted TCP Packets across all 
> connections for the duration of the query would be useful. TCP 
> retransmissions should be rare and are typically indicative of network hardware 
> issues or network congestion; having at least some high-level idea of the 
> number of TCP retransmissions that occur during a query can drastically help 
> determine if the network is to blame for query slowness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189718#comment-17189718
 ] 

Sahil Takiar commented on IMPALA-10139:
---

The "network" time (calculated as {{int64_t network_time_ns = total_time_ns - 
resp_.receiver_latency_ns()}}) might be a more useful threshold value to use.
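
A minimal sketch of that idea (illustrative names and threshold, not the actual 
slow-RPC logging code), thresholding on the derived network time rather than 
the total:
{code:cpp}
// Minimal sketch of the suggestion above: treat (total RPC time - receiver
// latency reported in the response) as the "network" time and only warn when
// that, rather than the total, crosses the slow-RPC threshold. Names are
// illustrative, not the actual Impala logging code.
#include <cstdint>
#include <cstdio>

constexpr int64_t kSlowNetworkThresholdNs = 2'000'000'000;  // 2s, for example

void MaybeLogSlowRpc(int64_t total_time_ns, int64_t receiver_latency_ns) {
  int64_t network_time_ns = total_time_ns - receiver_latency_ns;
  if (network_time_ns > kSlowNetworkThresholdNs) {
    std::printf("Slow RPC: network time %lldns (total %lldns, receiver %lldns)\n",
                (long long)network_time_ns, (long long)total_time_ns,
                (long long)receiver_latency_ns);
  }
}

int main() {
  MaybeLogSlowRpc(/*total_time_ns=*/5'000'000'000, /*receiver_latency_ns=*/500'000'000);
  return 0;
}
{code}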

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10141:
-

 Summary: Include aggregate TCP metrics in per-node profiles
 Key: IMPALA-10141
 URL: https://issues.apache.org/jira/browse/IMPALA-10141
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
metrics per kRPC connection for all inbound / outbound connections. It would be 
useful to aggregate some of these metrics and put them in the per-node 
profiles. Since it is not possible to currently split these metrics out per 
query, they should be added at the per-host level. Furthermore, only metrics 
that can be sanely aggregated across all connections should be included. For 
example, tracking the number of Retransmitted TCP Packets across all 
connections for the duration of the query would be useful. TCP retransmissions 
should be rare and are typically indicative of network hardware issues or 
network congestion; having at least some high-level idea of the number of TCP 
retransmissions that occur during a query can drastically help determine if the 
network is to blame for query slowness.
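
As a hedged, Linux-only illustration (the real per-connection numbers come from 
/rpcz, and the connection fds here are assumed), one way such an aggregate 
could be computed from {{TCP_INFO}}:
{code:cpp}
// Hedged sketch (Linux only, illustrative aggregation): read the kernel's
// per-connection TCP_INFO counters and sum tcpi_total_retrans over a set of
// connection fds. This just shows how a per-host aggregate could be produced
// from raw sockets; it is not the /rpcz implementation.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <cstdint>
#include <cstdio>
#include <vector>

uint64_t TotalTcpRetransmits(const std::vector<int>& connection_fds) {
  uint64_t total = 0;
  for (int fd : connection_fds) {
    tcp_info info{};
    socklen_t len = sizeof(info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
      total += info.tcpi_total_retrans;
    }
  }
  return total;
}

int main() {
  // In a real daemon the fds would come from the open kRPC connections.
  std::vector<int> fds;  // empty here; nothing to aggregate
  std::printf("total TCP retransmissions: %llu\n",
              (unsigned long long)TotalTcpRetransmits(fds));
  return 0;
}
{code}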



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


