[jira] [Commented] (IMPALA-10120) Beeline hangs when connecting to coordinators

2020-10-23 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219768#comment-17219768
 ] 

Sahil Takiar commented on IMPALA-10120:
---

I'm not particularly familiar with that config, but this is what the Hive code has 
to say about it:
{code:java}
HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE("hive.server2.thrift.resultset.default.fetch.size",
    1000,
    "The number of rows sent in one Fetch RPC call by the server to the client, if not\n" +
    "specified by the client."), {code}

> Beeline hangs when connecting to coordinators
> -
>
> Key: IMPALA-10120
> URL: https://issues.apache.org/jira/browse/IMPALA-10120
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Critical
>
> Beeline always hangs when connecting to a coordinator:
> {code:java}
> $ beeline -u "jdbc:hive2://localhost:21050/default;auth=noSasl"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/apache-hive-3.1.3000.7.2.1.0-287-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/hadoop-3.1.1.7.2.1.0-287/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'log4j2.debug' to show Log4j2 internal initialization logging.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/apache-hive-3.1.3000.7.2.1.0-287-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/quanlong/workspace/Impala/toolchain/cdp_components-4493826/hadoop-3.1.1.7.2.1.0-287/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://localhost:21050/default;auth=noSasl
> Connected to: Impala (version 4.0.0-SNAPSHOT)
> Driver: Hive JDBC (version 3.1.3000.7.2.1.0-287)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> {code}
> Looking into the impalad log file, an invalid option is being set:
> {code:java}
> I0901 15:41:14.577576 25325 TAcceptQueueServer.cpp:340] New connection to 
> server hiveserver2-frontend from client 
> I0901 15:41:14.577911 25597 impala-hs2-server.cc:300] Opening session: 
> 204a3f33cc8e28ea:d6a915ab96b26aa7 request username: 
> I0901 15:41:14.577970 25597 status.cc:129] Invalid query option: 
> set:hiveconf:hive.server2.thrift.resultset.default.fetch.size
> @  0x1cdba3d  impala::Status::Status()
> @  0x24c673f  impala::SetQueryOption()
> @  0x250c1d1  impala::ImpalaServer::OpenSession()
> @  0x2b0dc45  
> apache::hive::service::cli::thrift::TCLIServiceProcessor::process_OpenSession()
> @  0x2b0d993  
> apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall()
> @  0x2acd15a  
> impala::ImpalaHiveServer2ServiceProcessor::dispatchCall()
> @  0x1c8a483  apache::thrift::TDispatchProcessor::process()
> @  0x218ab4a  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x218004a  impala::ThriftThread::RunRunnable()
> @  0x2181686  boost::_mfi::mf2<>::operator()()
> @  0x218151a  boost::_bi::list3<>::operator()<>()
> @  0x2181260  boost::_bi::bind_t<>::operator()()
> @  0x2181172  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x20fba57  boost::function0<>::operator()()
> @  0x26cb779  impala::Thread::SuperviseThread()
> @  0x26d3716  boost::_bi::list5<>::operator()<>()
> @  0x26d363a  boost::_bi::bind_t<>::operator()()
> @  0x26d35fb  boost::detail::thread_data<>::run()
> @  0x3eb7ae1  thread_proxy
> @ 0x7fc9443456b9  start_thread
> @ 0x7fc940e334dc  clone
> I0901 15:41:14.739985 25597 impala-hs2-server.cc:405] Opened session: 
> 204a3f33cc8e28ea:d6a915ab96b26aa7 effective username: 
> I0901 15:41:14.781677 25597 impala-hs2-server.cc:426] GetInfo(): 
> request=TGetInfoReq {
>   01: sessionHandle (struct) = TSessionHandle {
> 01: sessionId 

[jira] [Resolved] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9954.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}






[jira] [Assigned] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9954:


Assignee: Riza Suminto

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Riza Suminto
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}






[jira] [Commented] (IMPALA-9954) RpcRecvrTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217063#comment-17217063
 ] 

Sahil Takiar commented on IMPALA-9954:
--

[~rizaon] so if I understand correctly, the remaining work to be done here is to 
add proper locking of {{rpc_start_time_ns_}} in 
{{be/src/runtime/krpc-data-stream-sender.cc}}.
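
For illustration, a minimal sketch of what locking a start timestamp shared between the sender 
thread and the RPC completion callback could look like. The field name comes from the comment 
above; everything else (lock type, method names) is an assumption rather than the actual 
KrpcDataStreamSender code:
{code:cpp}
#include <cstdint>
#include <mutex>

// Minimal sketch, not the actual KrpcDataStreamSender::Channel code: the
// sender thread records rpc_start_time_ns_ before issuing an async RPC and
// the KRPC completion callback reads it afterwards, so both sides take the
// same lock (alternatively the field could be a std::atomic<int64_t>).
class Channel {
 public:
  void MarkRpcStart(int64_t now_ns) {
    std::lock_guard<std::mutex> l(lock_);
    rpc_start_time_ns_ = now_ns;
  }

  // Called from the RPC completion callback thread.
  int64_t NetworkTimeNs(int64_t completion_time_ns, int64_t receiver_latency_ns) {
    std::lock_guard<std::mutex> l(lock_);
    int64_t total_rpc_time_ns = completion_time_ns - rpc_start_time_ns_;
    int64_t network_time_ns = total_rpc_time_ns - receiver_latency_ns;
    // Clamp so that clock skew or races cannot surface as negative counters.
    return network_time_ns < 0 ? 0 : network_time_ns;
  }

 private:
  std::mutex lock_;
  int64_t rpc_start_time_ns_ = 0;
};

int main() {
  Channel channel;
  channel.MarkRpcStart(1000000);
  // RPC completed 5ms later; the receiver reported 1ms of processing time.
  return channel.NetworkTimeNs(6000000, 1000000) == 4000000 ? 0 : 1;
}
{code}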

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}






[jira] [Commented] (IMPALA-10220) Min value of RpcNetworkTime can be negative

2020-10-19 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217061#comment-17217061
 ] 

Sahil Takiar commented on IMPALA-10220:
---

[~rizaon] can this be closed?

> Min value of RpcNetworkTime can be negative
> ---
>
> Key: IMPALA-10220
> URL: https://issues.apache.org/jira/browse/IMPALA-10220
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> There is a bug in function 
> KrpcDataStreamSender::Channel::EndDataStreamCompleteCb(), particularly in 
> this line:
> [https://github.com/apache/impala/blob/d453d52/be/src/runtime/krpc-data-stream-sender.cc#L635]
> network_time_ns should be computed using eos_rsp_.receiver_latency_ns() 
> instead of resp_.receiver_latency_ns().
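
A small sketch of the fix described above: when the EndDataStream RPC completes, the receiver 
latency must come from the EOS response ({{eos_rsp_}}) rather than from the last row-batch 
response ({{resp_}}). The surrounding types and method names here are placeholders, not the 
real protobuf/KRPC classes:
{code:cpp}
#include <cstdint>
#include <iostream>

// Placeholder response types standing in for the KRPC protobuf responses; only
// receiver_latency_ns() matters for this sketch.
struct TransmitDataResponsePb { int64_t receiver_latency_ns() const { return 2000000; } };
struct EndDataStreamResponsePb { int64_t receiver_latency_ns() const { return 500000; } };

struct Channel {
  TransmitDataResponsePb resp_;      // response of the last row-batch (TransmitData) RPC
  EndDataStreamResponsePb eos_rsp_;  // response of the EndDataStream RPC

  // Sketch of the fix: subtract the EOS response's receiver latency, not the
  // row-batch response's, when computing network time for EndDataStream.
  int64_t EndDataStreamNetworkTimeNs(int64_t total_rpc_time_ns) const {
    return total_rpc_time_ns - eos_rsp_.receiver_latency_ns();
  }
};

int main() {
  Channel channel;
  // With a 3ms total RPC time this prints 2500000 ns of network time; the
  // buggy version would have subtracted 2ms and reported 1000000 ns instead.
  std::cout << channel.EndDataStreamNetworkTimeNs(3000000) << "\n";
}
{code}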






[jira] [Assigned] (IMPALA-9370) Re-factor ImpalaServer, ClientRequestState, Coordinator protocol

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9370:


Assignee: (was: Sahil Takiar)

> Re-factor ImpalaServer, ClientRequestState, Coordinator protocol
> 
>
> Key: IMPALA-9370
> URL: https://issues.apache.org/jira/browse/IMPALA-9370
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> All of these classes need to be updated to support transparent query retries, 
> and each one could do with some re-factoring so that query retries don't 
> make this code even more complex. For now, I'm going to list out some ideas / 
> suggestions:
>  * Rename ImpalaServer to ImpalaService. I think ImpalaServer is a bit of a 
> misnomer: Impala isn't implementing its own server (it uses Thrift for 
> that); instead, it is providing a "service" to end users. This name is 
> consistent with Thrift "service"s as well.
>  * Split up ClientRequestState - I'm not sure I fully understand what 
> ClientRequestState is supposed to encapsulate. Perhaps originally it captured 
> the state of the actual client request along with some helper code, but it 
> seems to have evolved over time; it doesn't really look like a purely 
> "stateful" object any more (e.g. it manages admission control submission).
> One possible end state could be:
> ImpalaService <–> QueryDriver (has a ClientRequestState that is not exposed 
> externally) <–> QueryInstance <–> Coordinator
> The QueryDriver is responsible for E2E execution of a query, including all 
> stages such as parsing / planning of a query, submission to admission 
> control, and backend execution. A QueryInstance is a single instance of a 
> query, this is necessary for query retry support since a single query can be 
> run multiple times. The Coordinator remains mostly the same - it is purely 
> responsible for *backend* coordination / execution of a query.
> This provides an opportunity to move a lot of the execution specific logic 
> out of ImpalaServer and into QueryDriver. Currently, ImpalaServer is 
> responsible for submitting the query to the fe/ and then passing the result 
> to the ClientRequestState which submits it for admission control (and 
> eventually the Coordinator for execution).
> QueryDriver encapsulates the E2E execution of a query (starting from a query 
> string, and then returning the results of a query) (inspired by Hive's 
> IDriver interface - 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/IDriver.java]).
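
A purely illustrative C++ sketch of the proposed layering (ImpalaService <-> QueryDriver <-> 
QueryInstance <-> Coordinator); only the class names come from the description above, the 
methods and ownership structure are assumptions:
{code:cpp}
#include <memory>
#include <string>
#include <vector>

// Backend coordination / execution of a single query attempt only.
class Coordinator {};

// One attempt at running the query; a retried query gets a new instance.
class QueryInstance {
 public:
  void Exec() { coord_ = std::make_unique<Coordinator>(); }
 private:
  std::unique_ptr<Coordinator> coord_;
};

// Owns the end-to-end lifecycle: parsing/planning, admission control,
// execution, and (transparent) retries. The ClientRequestState would live
// behind this class rather than being exposed to the service layer.
class QueryDriver {
 public:
  explicit QueryDriver(std::string sql) : sql_(std::move(sql)) {}
  void Run() {
    attempts_.push_back(std::make_unique<QueryInstance>());
    attempts_.back()->Exec();
    // On a retryable failure, a second QueryInstance would be created here.
  }
 private:
  std::string sql_;
  std::vector<std::unique_ptr<QueryInstance>> attempts_;
};

// The "ImpalaService" layer would hold one QueryDriver per client query.
int main() { QueryDriver("select 1").Run(); }
{code}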






[jira] [Commented] (IMPALA-9856) Enable result spooling by default

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214243#comment-17214243
 ] 

Sahil Takiar commented on IMPALA-9856:
--

I tried to do this, but hit a DCHECK while running exhaustive tests:
{code:java}
Log file created at: 2020/10/13 15:33:06
Running on machine: 
impala-ec2-centos74-m5-4xlarge-ondemand-012a.vpc.cloudera.com
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1013 15:33:06.568224 22912 query-state.cc:877] 
914777cab6a164b8:dce62b1d] Check failed: is_cancelled_.Load() == 1 (0 
vs. 1) {code}
Minidump Stack:
{code}
Operating system: Linux
  0.0.0 Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 
20:32:50 UTC 2017 x86_64
CPU: amd64
 family 6 model 85 stepping 7
 1 CPU

GPU: UNKNOWN

Crash reason:  SIGABRT
Crash address: 0x7d12913
Process uptime: not available

Thread 410 (crashed)
 0  libc-2.17.so + 0x351f7
rax = 0x   rdx = 0x0006
rcx = 0x   rbx = 0x0004
rsi = 0x5980   rdi = 0x2913
rbp = 0x7f53dff7acc0   rsp = 0x7f53dff7a948
 r8 = 0xr9 = 0x7f53dff7a7c0
r10 = 0x0008   r11 = 0x0202
r12 = 0x076bb400   r13 = 0x0086
r14 = 0x076bb404   r15 = 0x076b3a20
rip = 0x7f54b87021f7
Found by: given as instruction pointer in context
 1  impalad!google::LogMessage::Flush() + 0x1eb
rbp = 0x7f53dff7ae10   rsp = 0x7f53dff7acd0
rip = 0x051fec5b
Found by: previous frame's frame pointer
 2  impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9
rbx = 0x0001   rbp = 0x7f53dff7ae60
rsp = 0x7f53dff7ad70   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x05202859
Found by: call frame info
 3  impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 877 + 0xc]
rbx = 0x0001   rbp = 0x7f53dff7ae60
rsp = 0x7f53dff7ad80   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0227b5a0
Found by: call frame info
 4  impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*) 
[query-exec-mgr.cc : 162 + 0xf]
rbx = 0x13e76000   rbp = 0x7f53dff7b6b0
rsp = 0x7f53dff7ae70   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0226ad41
Found by: call frame info
 5  impalad!boost::_mfi::mf1::operator()(impala::QueryExecMgr*, impala::QueryState*) 
const [mem_fn_template.hpp : 165 + 0xc]
rbx = 0x13e76000   rbp = 0x7f53dff7b6e0
rsp = 0x7f53dff7b6c0   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02273655
Found by: call frame info
 6  impalad!void boost::_bi::list2, 
boost::_bi::value >::operator(), 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1&, boost::_bi::list0&, int) [bind.hpp 
: 319 + 0x52]
rbx = 0x13e76000   rbp = 0x7f53dff7b720
rsp = 0x7f53dff7b6f0   r12 = 0x0d2ad680
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02272f1e
Found by: call frame info
 7  impalad!boost::_bi::bind_t, 
boost::_bi::list2, 
boost::_bi::value > >::operator()() [bind.hpp : 1222 + 
0x22]
rbx = 0x5980   rbp = 0x7f53dff7b770
rsp = 0x7f53dff7b730   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02272525
Found by: call frame info
 8  
impalad!boost::detail::function::void_function_obj_invoker0, 
boost::_bi::list2, 
boost::_bi::value > >, 
void>::invoke(boost::detail::function::function_buffer&) [function_template.hpp 
: 159 + 0xc]
rbx = 0x5980   rbp = 0x7f53dff7b7a0
rsp = 0x7f53dff7b780   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x0227193f
Found by: call frame info
 9  impalad!boost::function0::operator()() const [function_template.hpp : 
770 + 0x1d]
rbx = 0x5980   rbp = 0x7f53dff7b7e0
rsp = 0x7f53dff7b7b0   r12 = 0x086e72c0
r13 = 0x7f5458a88690   r14 = 0x2074e8a0
r15 = 0x0034   rip = 0x02137600
Found by: call frame info
10  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) [thread.cc : 360 + 0xf]
rbx = 0x5980   rbp = 

[jira] [Assigned] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10238:
-

Assignee: (was: Sahil Takiar)

> Add fault tolerance docs
> 
>
> Key: IMPALA-10238
> URL: https://issues.apache.org/jira/browse/IMPALA-10238
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Sahil Takiar
>Priority: Major
>
> Impala docs currently don't have much information about any of our fault 
> tolerance features. We should add a dedicated section with several sub-topics 
> to address this.






[jira] [Commented] (IMPALA-10242) impala-shell client retry for failed Fetch RPC calls.

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214232#comment-17214232
 ] 

Sahil Takiar commented on IMPALA-10242:
---

I'm actually not sure if this will work. Result spooling wasn't really designed 
with this use case in mind, although it does make this much easier to implement. 
Result spooling is basically backed by a {{buffered-tuple-stream.h}}, which is 
the same object used for operator spill-to-disk. I'm not sure if result spooling 
currently uses a {{buffered-tuple-stream.h}} in a way that would support this 
(I think it should, by just pinning the whole stream in memory?), but Tim would 
know.

There are a few other considerations as well. I don't think we really support 
this from a client perspective. To make fetch operations idempotent, Impala 
would probably need to support a Fetch Orientation (e.g. TFetchOrientation) 
beyond FETCH_NEXT and FETCH_FIRST. Support for something like FETCH_ABSOLUTE 
might be necessary.

The issue is that fetch operations are done through a simple iterator 
interface (e.g. FETCH_NEXT). I don't think impala-shell even tracks how far 
into the result set it has fetched. It just calls fetch next in a loop until 
it returns hasMoreRows = false.
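
To make the idempotency concern concrete, here is a toy sketch (not the real HS2/Thrift client 
API) contrasting a forward-only FETCH_NEXT cursor with an offset-based fetch along the lines of 
the FETCH_ABSOLUTE orientation mentioned above:
{code:cpp}
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Toy fetch API, not the real HS2/Thrift client. FetchNext() is a forward-only
// cursor: if a response is lost after the server has advanced the cursor, a
// blind retry skips rows. An offset-based fetch (in the spirit of FETCH_ABSOLUTE)
// can be retried safely as long as the server keeps the spooled results around.
struct SpooledResults {
  std::vector<int> rows;
  std::size_t cursor = 0;  // server-side position, advances on every FetchNext

  std::vector<int> FetchNext(std::size_t n) {
    std::size_t end = std::min(cursor + n, rows.size());
    std::vector<int> batch(rows.begin() + cursor, rows.begin() + end);
    cursor = end;
    return batch;
  }
  std::vector<int> FetchAbsolute(std::size_t offset, std::size_t n) const {
    std::size_t end = std::min(offset + n, rows.size());
    return std::vector<int>(rows.begin() + offset, rows.begin() + end);
  }
};

int main() {
  SpooledResults rs;
  rs.rows = {1, 2, 3, 4, 5, 6};
  rs.FetchNext(3);  // suppose this response was lost: rows 1-3 are gone for good
  // A client that tracks how many rows it has actually received can re-read
  // them with an absolute fetch instead of resuming blindly at row 4:
  std::size_t rows_received = 0;
  std::vector<int> batch = rs.FetchAbsolute(rows_received, 3);  // 1, 2, 3 again
  std::cout << "refetched " << batch.size() << " rows\n";
}
{code}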

> impala-shell client retry for failed Fetch RPC calls.
> -
>
> Key: IMPALA-10242
> URL: https://issues.apache.org/jira/browse/IMPALA-10242
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Abhishek Rawat
>Priority: Major
>
> The impala-shell client can retry failed idempotent RPCs. This work was done 
> as part of IMPALA-9466.
> Since Impala also supports result spooling, the impala-shell client could 
> also retry failed fetch RPC calls in some scenarios when result spooling is 
> enabled.






[jira] [Created] (IMPALA-10241) Impala Doc: RPC troubleshooting guide

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10241:
-

 Summary: Impala Doc: RPC troubleshooting guide
 Key: IMPALA-10241
 URL: https://issues.apache.org/jira/browse/IMPALA-10241
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


There have been several diagnostic improvements that make RPCs easier to debug. 
We should document them along with the associated options for configuring them.






[jira] [Commented] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214064#comment-17214064
 ] 

Sahil Takiar commented on IMPALA-10240:
---

Would be nice to document what exactly happens when a node is removed from the 
cluster membership - e.g. all queries running on that node are either cancelled 
or retried (they are retried if transparent query retries are enabled). This 
should cover the scenario where a coordinator fails as well (e.g. all queries 
die, and the executors eventually time out all fragments and cancel them).

> Impala Doc: Add docs for cluster membership statestore heartbeats
> -
>
> Key: IMPALA-10240
> URL: https://issues.apache.org/jira/browse/IMPALA-10240
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> I don't see many docs explaining how the current cluster membership logic 
> works (e.g. via the statestored heartbeats). Would be nice to include a high 
> level explanation along with how to configure the heartbeat threshold.






[jira] [Updated] (IMPALA-10239) Impala Doc: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10239:
--
Summary: Impala Doc: Add docs for node blacklisting  (was: Docs: Add docs 
for node blacklisting)

> Impala Doc: Add docs for node blacklisting
> --
>
> Key: IMPALA-10239
> URL: https://issues.apache.org/jira/browse/IMPALA-10239
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> We should add some docs for node blacklisting explaining what it is, how it 
> works at a high level, what errors it captures, how to debug it, etc.






[jira] [Created] (IMPALA-10240) Impala Doc: Add docs for cluster membership statestore heartbeats

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10240:
-

 Summary: Impala Doc: Add docs for cluster membership statestore 
heartbeats
 Key: IMPALA-10240
 URL: https://issues.apache.org/jira/browse/IMPALA-10240
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


I don't see many docs explaining how the current cluster membership logic works 
(e.g. via the statestored heartbeats). Would be nice to include a high level 
explanation along with how to configure the heartbeat threshold.






[jira] [Created] (IMPALA-10239) Docs: Add docs for node blacklisting

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10239:
-

 Summary: Docs: Add docs for node blacklisting
 Key: IMPALA-10239
 URL: https://issues.apache.org/jira/browse/IMPALA-10239
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should add some docs for node blacklisting explaining what it is, how it 
works at a high level, what errors it captures, how to debug it, etc.






[jira] [Created] (IMPALA-10238) Add fault tolerance docs

2020-10-14 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10238:
-

 Summary: Add fault tolerance docs
 Key: IMPALA-10238
 URL: https://issues.apache.org/jira/browse/IMPALA-10238
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Impala docs currently don't have much information about any of our fault 
tolerance features. We should add a dedicated section with several sub-topics 
to address this.






[jira] [Updated] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10235:
--
Description: 
Steps to reproduce on master:
{code:java}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}
Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:
{code:java}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}
For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. The coordinator fragment instance showed normal timer values:
{code:java}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}

  was:
Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}


> Averaged timer profile counters can be negative for trivial queries
> ---
>
> Key: IMPALA-10235
> URL: https://issues.apache.org/jira/browse/IMPALA-10235
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: profile-output.txt
>
>
> Steps to reproduce on master:
> {code:java}
> stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
>  [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
> limit 25" -p > profile-output.txt
> ...
> Query: select sleep(100) from functional.alltypes limit 25
> Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
> http://stakiar-desktop:25000)
> Query progress can be monitored at: 
> http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
> Fetched 25 row(s) in 2.64s
> {code}
> Attached the contents of {{profile-output.txt}}
> Relevant portion of the profile:
> {code:java}
> Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
> 0.01%)
> ...
>- CompletionTime: -1665218428.000ns
> ...
>- TotalThreadsTotalWallClockTime: -1686005515.000ns
>  - TotalThreadsSysTime: 0.000ns
>  - TotalThreadsUserTime: 2.151ms
> ...
>- TotalTime: -1691524485.000ns
> {code}
> For whatever reason, this only affects the averaged fragment profile. For 
> this query, there was only one coordinator fragment and thus only one 
> fragment instance. The coordinator fragment instance showed normal timer 
> values:
> {code:java}
> Coordinator Fragment F00:
> ...
>  - CompletionTime: 2s629ms
> ...
>  - TotalThreadsTotalWallClockTime: 2s608ms
>- TotalThreadsSysTime: 0.000ns
>- TotalThreadsUserTime: 2.151ms
> ...
>  - 

[jira] [Created] (IMPALA-10235) Averaged timer profile counters can be negative for trivial queries

2020-10-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10235:
-

 Summary: Averaged timer profile counters can be negative for 
trivial queries
 Key: IMPALA-10235
 URL: https://issues.apache.org/jira/browse/IMPALA-10235
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
 Attachments: profile-output.txt

Steps to reproduce on master:
{code}
stakiar @ stakiar-desktop -bash ~/Impala 2020-10-13 11:13:02 master
 [74] → ./bin/impala-shell.sh -q "select sleep(100) from functional.alltypes 
limit 25" -p > profile-output.txt
...
Query: select sleep(100) from functional.alltypes limit 25
Query submitted at: 2020-10-13 11:13:07 (Coordinator: 
http://stakiar-desktop:25000)
Query progress can be monitored at: 
http://stakiar-desktop:25000/query_plan?query_id=694f94671571d4d1:cdec9db9
Fetched 25 row(s) in 2.64s
{code}

Attached the contents of {{profile-output.txt}}

Relevant portion of the profile:

{code}
Averaged Fragment F00:(Total: 2s603ms, non-child: 272.519us, % non-child: 
0.01%)
...
   - CompletionTime: -1665218428.000ns
...
   - TotalThreadsTotalWallClockTime: -1686005515.000ns
 - TotalThreadsSysTime: 0.000ns
 - TotalThreadsUserTime: 2.151ms
...
   - TotalTime: -1691524485.000ns
{code}

For whatever reason, this only affects the averaged fragment profile. For this 
query, there was only one coordinator fragment and thus only one fragment 
instance. It showed normal values:

{code}
Coordinator Fragment F00:
...
 - CompletionTime: 2s629ms
...
 - TotalThreadsTotalWallClockTime: 2s608ms
   - TotalThreadsSysTime: 0.000ns
   - TotalThreadsUserTime: 2.151ms
...
 - TotalTime: 2s603ms
{code}
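
One observation on the numbers above (a guess at the cause, not a confirmed diagnosis): each 
negative averaged timer is exactly 2^32 ns smaller than a plausible positive value, e.g. 
-1691524485 + 2^32 = 2603442811 ns, which is ~2s603ms and matches the fragment's TotalTime. 
That is the pattern you would get if a 64-bit nanosecond counter were narrowed through a signed 
32-bit intermediate somewhere in the averaging path. A tiny snippet demonstrating just the 
arithmetic:
{code:cpp}
#include <cstdint>
#include <iostream>

int main() {
  // The averaged fragment reported TotalTime: -1691524485 ns while the real
  // total was ~2s603ms. Narrowing the 64-bit nanosecond value through a signed
  // 32-bit integer produces exactly that number (two's-complement wraparound).
  int64_t total_time_ns = 2603442811LL;  // ~2s603ms
  int32_t narrowed = static_cast<int32_t>(total_time_ns);
  std::cout << narrowed << "\n";  // -1691524485
  // The two values differ by exactly 2^32:
  std::cout << static_cast<int64_t>(narrowed) + (int64_t{1} << 32) << "\n";  // 2603442811
}
{code}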






[jira] [Resolved] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2020-10-12 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8925.
--
Resolution: Later

This would be nice to have, but I'm not seeing a strong reason to do it at the 
moment, so closing as "Later".

> Consider replacing ClientRequestState ResultCache with result spooling
> --
>
> Key: IMPALA-8925
> URL: https://issues.apache.org/jira/browse/IMPALA-8925
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Clients
>Reporter: Sahil Takiar
>Priority: Minor
>
> The {{ClientRequestState}} maintains an internal results cache (which is 
> really just a {{QueryResultSet}}) in order to provide support for the 
> {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
> [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission 
> control integration
>  * It has a max size, if the size is exceeded the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide 
> a few benefits; it should be able to fit more rows since it can spill to 
> disk. The memory is better tracked as well since it integrates with both 
> admitted and reserved memory. Hue currently sets the max result set fetch 
> size to 
> [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61],
>  it would be good to check how well that value works for Hue users so we can 
> decide if replacing the current result cache with result spooling makes sense.
> This would require some changes to result spooling as well, currently it 
> discards rows whenever it reads them from the underlying 
> {{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
> which would require some changes to the {{PlanRootSink}} interface as well.
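
As a rough illustration of the last point, a toy spooling sink with a resettable read cursor; 
this is not the real PlanRootSink or BufferedTupleStream API, just a sketch of what supporting 
{{TFetchOrientation.FETCH_FIRST}} on top of result spooling would need:
{code:cpp}
#include <cstddef>
#include <deque>
#include <vector>

// Toy spooling sink: keep already-read rows spooled (pinned or spilled)
// instead of discarding them, and allow the read cursor to be rewound.
class SpoolingSink {
 public:
  void Append(int row) { spool_.push_back(row); }  // stands in for adding a row batch

  std::vector<int> Fetch(std::size_t n) {
    std::vector<int> out;
    while (out.size() < n && read_pos_ < spool_.size()) out.push_back(spool_[read_pos_++]);
    return out;
  }

  // FETCH_FIRST: restart reads from the beginning of the spooled results.
  void ResetReadCursor() { read_pos_ = 0; }

 private:
  std::deque<int> spool_;
  std::size_t read_pos_ = 0;
};

int main() {
  SpoolingSink sink;
  for (int i = 0; i < 5; ++i) sink.Append(i);
  sink.Fetch(3);            // client read the first three rows
  sink.ResetReadCursor();   // client asked for FETCH_FIRST
  return sink.Fetch(3).size() == 3 ? 0 : 1;
}
{code}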






[jira] [Commented] (IMPALA-10055) DCHECK was hit while executing e2e test TestQueries::test_subquery

2020-10-12 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212499#comment-17212499
 ] 

Sahil Takiar commented on IMPALA-10055:
---

Saw this again recently, any plans for a fix?

> DCHECK was hit while executing e2e test TestQueries::test_subquery
> --
>
> Key: IMPALA-10055
> URL: https://issues.apache.org/jira/browse/IMPALA-10055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: Attila Jeges
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build, crash, flaky
> Fix For: Impala 4.0
>
>
> A DCHECK was hit while executing e2e test. Time frame suggests that it 
> possibly happened while executing TestQueries::test_subquery:
> {code}
> query_test/test_queries.py:149: in test_subquery
> self.run_test_case('QueryTest/subquery', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:909: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:334: in execute
> r = self.__fetch_results(handle, profile_format=profile_format)
> common/impala_connection.py:436: in __fetch_results
> result_tuples = cursor.fetchall()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:532:
>  in fetchall
> self._wait_to_finish()
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:405:
>  in _wait_to_finish
> resp = self._last_operation._rpc('GetOperationStatus', req)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:992:
>  in _rpc
> response = self._execute(func_name, request)
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py:1023:
>  in _execute
> .format(self.retries))
> E   HiveServer2Error: Failed after retrying 3 times
> {code}
> impalad log:
> {code}
> Log file created at: 2020/08/05 17:34:30
> Running on machine: 
> impala-ec2-centos74-m5-4xlarge-ondemand-18a5.vpc.cloudera.com
> Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F0805 17:34:30.003247 10887 orc-column-readers.cc:423] 
> c34e87376f496a53:7ba6a2e40002] Check failed: 
> (scanner_->row_batches_need_validation_ && scanner_->scan_node_->IsZeroSlotTableScan()) || 
> scanner_->acid_original_file
> {code}
> Stack trace:
> {code}
> CORE: ./fe/core.1596674070.14179.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by 
> `/data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/build/lat'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> To enable execution of this file add
>   add-auto-load-safe-path 
> /data0/jenkins/workspace/impala-cdpd-master-core-ubsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib64/libstdc++.so.6.0.24-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
>   set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the
> "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
>   info "(gdb)Auto-loading safe path"
> #0  0x7efd6ec6e1f7 in raise () from /lib64/libc.so.6
> #1  0x7efd6ec6f8e8 in abort () from /lib64/libc.so.6
> #2  0x086b8ea4 in google::DumpStackTraceAndExit() ()
> #3  0x086ae25d in google::LogMessage::Fail() ()
> #4  0x086afb4d in google::LogMessage::SendToLog() ()
> #5  0x086adbbb in google::LogMessage::Flush() ()
> #6  0x086b17b9 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x0388e10a in impala::OrcStructReader::TopLevelReadValueBatch 
> (this=0x61162630, scratch_batch=0x824831e0, pool=0x82483258) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/orc-column-readers.cc:421
> #8  0x03810c92 in impala::HdfsOrcScanner::TransferTuples 
> (this=0x27143c00, dst_batch=0x2e5ca820) at 
> /data/jenkins/workspace/impala-cdpd-master-core-ubsan/repos/Impala/be/src/exec/hdfs-orc-scanner.cc:808
> #9  0x03814e2a in impala::HdfsOrcScanner::AssembleRows 
> 

[jira] [Assigned] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9485:


Assignee: Sahil Takiar

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.






[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9485.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.






[jira] [Commented] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210512#comment-17210512
 ] 

Sahil Takiar commented on IMPALA-10028:
---

For reference, here is what the new Docker image sizes look like:
{code}
impalad_coordinator   latest   a54e3f5b73b2   2 days ago   770MB
impalad_coord_exec    latest   6eedba64cb42   2 days ago   770MB
impalad_executor      latest   65998abf9cac   2 days ago   685MB
{code}

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.






[jira] [Resolved] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10028.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.






[jira] [Commented] (IMPALA-10028) Additional optimizations of Impala docker container sizes

2020-10-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210510#comment-17210510
 ] 

Sahil Takiar commented on IMPALA-10028:
---

Not planning to tackle IMPALA-10068 anytime soon, so moving it out and closing 
this JIRA as all other subtasks have been completed.

> Additional optimizations of Impala docker container sizes
> -
>
> Key: IMPALA-10028
> URL: https://issues.apache.org/jira/browse/IMPALA-10028
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> There are some more optimizations we can make to get the images to be even 
> smaller. It looks like we may have regressed with regards to image size as 
> well. IMPALA-8425 reports the images at ~700 MB. I just checked on a release 
> build and they are currently 1.01 GB.






[jira] [Updated] (IMPALA-10068) Split out jars for catalog Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10068:
--
Parent: (was: IMPALA-10028)
Issue Type: Task  (was: Sub-task)

> Split out jars for catalog Docker images
> 
>
> Key: IMPALA-10068
> URL: https://issues.apache.org/jira/browse/IMPALA-10068
> Project: IMPALA
>  Issue Type: Task
>Reporter: Sahil Takiar
>Priority: Major
>
> One way to decrease the size of the catalogd images is to only include jar 
> files necessary to run the catalogd. Currently, all Impala coordinator / 
> executor jars are included in the catalogd images, which is not necessary.
> This can be fixed by splitting the fe/ Java code into fe/ and catalogd/ 
> folders (and perhaps a  java-common/ folder). This is probably a nice 
> improvement to make regardless because the fe and catalogd code should really 
> be in separate Maven modules. By separating all catalogd code into a separate 
> Maven module it should be easy to modify the Docker built scripts to only 
> copy in the catalogd jars for the catalogd Impala image.






[jira] [Closed] (IMPALA-10016) Split jars for Impala executor and coordinator Docker images

2020-10-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10016.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Split jars for Impala executor and coordinator Docker images
> 
>
> Key: IMPALA-10016
> URL: https://issues.apache.org/jira/browse/IMPALA-10016
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Impala executors and coordinators currently have a common base image. The 
> base image defines a set of jar files needed by either the coordinator or the 
> executor. In order to reduce the image size, we should split out the jars 
> into two categories: those necessary for the coordinator and those necessary 
> for the executor. This should help reduce overall image size.






[jira] [Updated] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10217:
--
Attachment: profile.txt

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile.txt
>
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}






[jira] [Commented] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208176#comment-17208176
 ] 

Sahil Takiar commented on IMPALA-10217:
---

Attached the full runtime profile dumped by the test failure.

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile.txt
>
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}






[jira] [Commented] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208175#comment-17208175
 ] 

Sahil Takiar commented on IMPALA-10217:
---

Might be a recurrence of IMPALA-8064

> test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
> 
>
> Key: IMPALA-10217
> URL: https://issues.apache.org/jira/browse/IMPALA-10217
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Priority: Major
>
> Seen this a few times in exhaustive builds:
> {code}
> query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> kudu/none] (from pytest)
> query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
> % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
> results.
> E   EXPECTED VALUE:
> E   102
> E   
> E   ACTUAL VALUE:
> E   38
> E   
> {code}






[jira] [Created] (IMPALA-10217) test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10217:
-

 Summary: 
test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters is flaky
 Key: IMPALA-10217
 URL: https://issues.apache.org/jira/browse/IMPALA-10217
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Seen this a few times in exhaustive builds:
{code}
query_test.test_runtime_filters.TestMinMaxFilters.test_decimal_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 
1, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] (from 
pytest)

query_test/test_runtime_filters.py:231: in test_decimal_min_max_filters
test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
common/impala_test_suite.py:718: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:627: in verify_runtime_profile
% (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected 
results.
E   EXPECTED VALUE:
E   102
E   
E   ACTUAL VALUE:
E   38
E   
{code}








[jira] [Created] (IMPALA-10216) BufferPoolTest.WriteErrorBlacklistCompression is flaky on UBSAN builds

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10216:
-

 Summary: BufferPoolTest.WriteErrorBlacklistCompression is flaky on 
UBSAN builds
 Key: IMPALA-10216
 URL: https://issues.apache.org/jira/browse/IMPALA-10216
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar


Only seen this once so far:

{code}
BufferPoolTest.WriteErrorBlacklistCompression

Error Message
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true

Stacktrace

Impala/be/src/runtime/bufferpool/buffer-pool-test.cc:1764
Value of: FindPageInDir(pages[NO_ERROR_QUERY], error_dir) != NULL
  Actual: false
Expected: true
{code}






[jira] [Commented] (IMPALA-9355) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2020-10-05 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208155#comment-17208155
 ] 

Sahil Takiar commented on IMPALA-9355:
--

Saw this again in an UBSAN build.

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-9355
> URL: https://issues.apache.org/jira/browse/IMPALA-9355
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Fang-Yu Rao
>Assignee: Joe McDonnell
>Priority: Critical
>  Labels: broken-build, flaky
>
> The EE test {{test_exchange_mem_usage_scaling}} failed because the query at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7-L15]
>  does not hit the specified memory limit (170m) at 
> [https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test#L7].
>  We may need to further reduce the specified limit. The error message is given below. Recall that the same issue occurred at https://issues.apache.org/jira/browse/IMPALA-7873 but was resolved.
> {code:java}
> FAIL 
> query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::()::test_exchange_mem_usage_scaling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> === FAILURES 
> ===
>  TestExchangeMemUsage.test_exchange_mem_usage_scaling[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] 
> [gw3] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:674: in run_test_case
> expected_str, query)
> E   AssertionError: Expected exception: Memory limit exceeded
> E   
> E   when running:
> E   
> E   set mem_limit=170m;
> E   set num_scanner_threads=1;
> E   select *
> E   from tpch_parquet.lineitem l1
> E join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and
> E l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey
> E and l1.l_linenumber = l2.l_linenumber
> E   order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber
> E   limit 5
> {code}
> [~tarmstr...@cloudera.com] and [~joemcdonnell] reviewed the patch at [https://gerrit.cloudera.org/c/11965/]. Assigning this JIRA to [~joemcdonnell] for now. Please re-assign the JIRA to others as appropriate. Thanks!
>  






[jira] [Created] (IMPALA-10214) Ozone support for file handle cache

2020-10-05 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10214:
-

 Summary: Ozone support for file handle cache
 Key: IMPALA-10214
 URL: https://issues.apache.org/jira/browse/IMPALA-10214
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is dependent on the Ozone input streams supporting the {{CanUnbuffer}} 
interface first (last I checked, the input streams don't implement the 
interface).
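
For reference, a minimal sketch (plain Hadoop-client Java, not Impala code) of how a caller can probe a stream for the unbuffer capability before deciding to cache its handle; the path argument is only a placeholder:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class UnbufferCheck {
  public static void main(String[] args) throws Exception {
    // args[0] would be an Ozone path such as o3fs://..., used here only for illustration.
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.open(path)) {
      // A cached handle is only cheap to keep open if the stream can release
      // its buffers and sockets between reads.
      if (in.hasCapability(StreamCapabilities.UNBUFFER)) {
        in.unbuffer();  // drop buffers/connections but keep the handle usable
        System.out.println("unbuffer supported: handle caching is viable");
      } else {
        System.out.println("unbuffer not supported: caching would pin resources");
      }
    }
  }
}
{code}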






[jira] [Resolved] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10202.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for ABFS files
> ---
>
> Key: IMPALA-10202
> URL: https://issues.apache.org/jira/browse/IMPALA-10202
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We should enable the file handle cache for ABFS; we have already seen it benefit jobs that read data from S3A.






[jira] [Resolved] (IMPALA-9606) ABFS reads should use hdfsPreadFully

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9606.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> ABFS reads should use hdfsPreadFully
> 
>
> Key: IMPALA-9606
> URL: https://issues.apache.org/jira/browse/IMPALA-9606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> In IMPALA-8525, hdfs preads were enabled by default when reading data from 
> S3. IMPALA-8525 deferred enabling preads for ABFS because they didn't 
> significantly improve performance. After some more investigation into the 
> ABFS input streams, I think it is safe to use {{hdfsPreadFully}} for ABFS 
> reads.
> The ABFS client uses a different model for fetching data compared to S3A. 
> Details are beyond the scope of this JIRA, but it is related to a feature in 
> ABFS called "read-aheads". ABFS has logic to pre-fetch data it *thinks* will 
> be required by the client. By default, it pre-fetches # cores * 4 MB of data. 
> If the requested data exists in the client cache, it is read from the cache.
> However, there is no real drawback to using {{hdfsPreadFully}} for ABFS 
> reads. It's definitely safer, because while the current implementation of 
> ABFS always returns the amount of requested data, only the {{hdfsPreadFully}} 
> API makes that guarantee.
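
For context, the difference between a stateful read and a positioned read at the Java {{FileSystem}} level (which is roughly what the {{hdfsPreadFully}} path corresponds to) can be sketched as follows; the path and offsets are placeholders:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);  // e.g. an abfs:// URI; placeholder for illustration
    FileSystem fs = path.getFileSystem(new Configuration());
    byte[] buf = new byte[4 * 1024 * 1024];
    try (FSDataInputStream in = fs.open(path)) {
      // Stateful read: seek + read; a short read must be handled by the caller.
      in.seek(64L * 1024 * 1024);
      int n = in.read(buf, 0, buf.length);  // may return fewer bytes than requested
      System.out.println("stateful read returned " + n + " bytes");

      // Positioned read: a single call that either fills the buffer or throws.
      // This is the "always returns the requested amount" guarantee discussed above.
      in.readFully(128L * 1024 * 1024, buf, 0, buf.length);
    }
  }
}
{code}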






[jira] [Assigned] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-3335:


Assignee: Sahil Takiar

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that disables our single-node optimization for any plan with joins. See MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}
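
To illustrate the intended change (a hypothetical, simplified sketch, not the actual planner code), a visitor that keeps accumulating estimated rows across join nodes instead of aborting might look like this:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of the idea; not Impala's MaxRowsProcessedVisitor.
class PlanNodeSketch {
  long cardinality;                      // estimated rows produced by this node (-1 = unknown)
  List<PlanNodeSketch> children = new ArrayList<>();
}

class MaxRowsVisitorSketch {
  private long maxRowsProcessed = 0;
  private boolean abort = false;

  void visit(PlanNodeSketch node) {
    if (abort) return;
    if (node.cardinality < 0) {          // unknown stats: give up and keep the distributed plan
      abort = true;
      return;
    }
    // A join node no longer aborts the visitor; its output estimate simply
    // accumulates like any other node's.
    maxRowsProcessed += node.cardinality;
    for (PlanNodeSketch child : node.children) visit(child);
  }

  boolean qualifiesForSingleNode(long threshold) {
    return !abort && maxRowsProcessed <= threshold;
  }
}
{code}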






[jira] [Resolved] (IMPALA-3335) Allow single-node optimization with joins.

2020-10-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-3335.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Allow single-node optimization with joins.
> --
>
> Key: IMPALA-3335
> URL: https://issues.apache.org/jira/browse/IMPALA-3335
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Alexander Behm
>Assignee: Sahil Takiar
>Priority: Minor
>  Labels: ramp-up
> Fix For: Impala 4.0
>
>
> Now that IMPALA-561 has been fixed, we can remove the workaround that disables our single-node optimization for any plan with joins. See MaxRowsProcessedVisitor.java:
> {code}
> } else if (caller instanceof HashJoinNode || caller instanceof 
> NestedLoopJoinNode) {
>   // Revisit when multiple scan nodes can be executed in a single fragment, 
> IMPALA-561
>   abort_ = true;
>   return;
> }
> {code}






[jira] [Created] (IMPALA-10202) Enable file handle cache for ABFS files

2020-10-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10202:
-

 Summary: Enable file handle cache for ABFS files
 Key: IMPALA-10202
 URL: https://issues.apache.org/jira/browse/IMPALA-10202
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We should enable the file handle cache for ABFS; we have already seen it benefit jobs that read data from S3A.






[jira] [Resolved] (IMPALA-8577) Crash during OpenSSLSocket.read

2020-09-28 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8577.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

This was fixed a while ago. Impala has been using wildfly-openssl for communication with S3 for some time now, and everything seems stable.

> Crash during OpenSSLSocket.read
> ---
>
> Key: IMPALA-8577
> URL: https://issues.apache.org/jira/browse/IMPALA-8577
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: David Rorke
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
> Attachments: 5ca78771-ad78-4a29-31f88aa6-9bfac38c.dmp, 
> hs_err_pid6313.log, 
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.ERROR.20190521-103105.6313,
>  
> impalad.drorke-impala-r5d2xl2-30w-17.vpc.cloudera.com.impala.log.INFO.20190521-103105.6313
>
>
> Impalad crashed while running a TPC-DS 10 TB run against S3.   Excerpt from 
> the stack trace (hs_err log file attached with more complete stack):
> {noformat}
> Stack: [0x7f3d095bc000,0x7f3d09dbc000],  sp=0x7f3d09db9050,  free 
> space=8180k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [impalad+0x2528a33]  
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int)+0x133
> C  [impalad+0x2528e0f]  tcmalloc::ThreadCache::Scavenge()+0x3f
> C  [impalad+0x266468a]  operator delete(void*)+0x32a
> C  [libcrypto.so.10+0x6e70d]  CRYPTO_free+0x1d
> J 5709  org.wildfly.openssl.SSLImpl.freeBIO0(J)V (0 bytes) @ 
> 0x7f3d4dadf9f9 [0x7f3d4dadf940+0xb9]
> J 5708 C1 org.wildfly.openssl.SSLImpl.freeBIO(J)V (5 bytes) @ 
> 0x7f3d4dfd0dfc [0x7f3d4dfd0d80+0x7c]
> J 5158 C1 org.wildfly.openssl.OpenSSLEngine.shutdown()V (78 bytes) @ 
> 0x7f3d4de4fe2c [0x7f3d4de4f720+0x70c]
> J 5758 C1 org.wildfly.openssl.OpenSSLEngine.closeInbound()V (51 bytes) @ 
> 0x7f3d4de419cc [0x7f3d4de417c0+0x20c]
> J 2994 C2 
> org.wildfly.openssl.OpenSSLEngine.unwrap(Ljava/nio/ByteBuffer;[Ljava/nio/ByteBuffer;II)Ljavax/net/ssl/SSLEngineResult;
>  (892 bytes) @ 0x7f3d4db8da34 [0x7f3d4db8c900+0x1134]
> J 3161 C2 org.wildfly.openssl.OpenSSLSocket.read([BII)I (810 bytes) @ 
> 0x7f3d4dd64cb0 [0x7f3d4dd646c0+0x5f0]
> J 5090 C2 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer()I
>  (97 bytes) @ 0x7f3d4ddd9ee0 [0x7f3d4ddd9e40+0xa0]
> J 5846 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.fillInputBuffer(I)I
>  (48 bytes) @ 0x7f3d4d7acb24 [0x7f3d4d7ac7a0+0x384]
> J 5845 C1 
> com.amazonaws.thirdparty.apache.http.impl.BHttpConnectionBase.isStale()Z (31 
> bytes) @ 0x7f3d4d7ad49c [0x7f3d4d7ad220+0x27c]
> {noformat}
> The crash may not be easy to reproduce.  I've run this test multiple times 
> and only crashed once.   I have a core file if needed.






[jira] [Created] (IMPALA-10191) Test impalad_coordinator and impalad_executor in Dockerized tests

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10191:
-

 Summary: Test impalad_coordinator and impalad_executor in 
Dockerized tests
 Key: IMPALA-10191
 URL: https://issues.apache.org/jira/browse/IMPALA-10191
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


Currently only the impalad_coord_exec images are tested in the Dockerized tests; it would be nice to get test coverage for the other images as well.






[jira] [Created] (IMPALA-10190) Remove impalad_coord_exec Dockerfile

2020-09-24 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10190:
-

 Summary: Remove impalad_coord_exec Dockerfile
 Key: IMPALA-10190
 URL: https://issues.apache.org/jira/browse/IMPALA-10190
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The impalad_coord_exec Dockerfile is a bit redundant because it basically contains all the same dependencies as the impalad_coordinator Dockerfile. The only difference between the two files is that the startup flags for impalad_coordinator contain {{is_executor=false}}. We should find a way to remove the {{impalad_coord_exec}} Dockerfile altogether.






[jira] [Resolved] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-24 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10170.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Data race on Webserver::UrlHandler::is_on_nav_bar_
> --
>
> Key: IMPALA-10170
> URL: https://issues.apache.org/jira/browse/IMPALA-10170
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=31102)
>   Read of size 1 at 0x7b2c0006e3b0 by thread T42:
> #0 impala::Webserver::UrlHandler::is_on_nav_bar() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41
>  (impalad+0x256ff39)
> #1 
> impala::Webserver::GetCommonJson(rapidjson::GenericDocument,
>  rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*, sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24
>  (impalad+0x256be13)
> #2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, 
> kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler 
> const&, std::__cxx11::basic_stringstream, 
> std::allocator >*, impala::ContentType*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3
>  (impalad+0x256e882)
> #3 impala::Webserver::BeginRequestCallback(sq_connection*, 
> sq_request_info*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5
>  (impalad+0x256cfbb)
> #4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20
>  (impalad+0x256ba98)
> #5 handle_request  (impalad+0x2582d59)
>   Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
> #0 
> impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9
>  (impalad+0x2570dbc)
> #1 std::pair, 
> std::allocator > const, 
> impala::Webserver::UrlHandler>::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler, 
> true>(std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4
>  (impalad+0x25738b3)
> #2 void 
> __gnu_cxx::new_allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> >(std::pair std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23
>  (impalad+0x2573848)
> #3 void 
> std::allocator_traits  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > > 
> >::construct std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler>, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::allocator  std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > >&, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>*, 
> std::pair, 
> std::allocator >, impala::Webserver::UrlHandler>&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
>  (impalad+0x25737f1)
> #4 void std::_Rb_tree std::char_traits, std::allocator >, 
> std::pair, 
> std::allocator > const, impala::Webserver::UrlHandler>, 
> std::_Select1st std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >, std::less std::char_traits, std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> > 
> >::_M_construct_node std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler> 
> >(std::_Rb_tree_node std::char_traits, std::allocator > const, 
> impala::Webserver::UrlHandler> >*, std::pair std::char_traits, std::allocator >, 
> impala::Webserver::UrlHandler>&&) 
> 

[jira] [Commented] (IMPALA-10183) Hit promise DCHECK while looping result spooling tests

2020-09-23 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201044#comment-17201044
 ] 

Sahil Takiar commented on IMPALA-10183:
---

Thanks for reporting and fixing this!

> Hit promise DCHECK while looping result spooling tests
> --
>
> Key: IMPALA-10183
> URL: https://issues.apache.org/jira/browse/IMPALA-10183
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: impalad.ERROR.gz, impalad.FATAL.gz, impalad.INFO.gz
>
>
> {noformat}
> while impala-py.test tests/query_test/test_result_spooling.py -n4 ; do date; 
> done
> {noformat}
> {noformat}
> F0921 10:14:35.355281  5842 promise.h:61] Check failed: mode == 
> PromiseMode::MULTIPLE_PRODUCER [ mode = 0 , PromiseM
> ode::MULTIPLE_PRODUCER = 1 ]Called Set(..) twice on the same Promise in 
> SINGLE_PRODUCER mode
> *** Check failure stack trace: ***
> @  0x52087fc  google::LogMessage::Fail()
> @  0x520a0ec  google::LogMessage::SendToLog()
> @  0x520815a  google::LogMessage::Flush()
> @  0x520bd58  google::LogMessageFatal::~LogMessageFatal()
> @  0x223cc50  impala::Promise<>::Set()
> @  0x293f21d  impala::BufferedPlanRootSink::Cancel()
> @  0x2317856  impala::FragmentInstanceState::Cancel()
> @  0x2284c62  impala::QueryState::Cancel()
> @  0x2464728  impala::ControlService::CancelQueryFInstances()
> @  0x253df37  
> _ZZN6impala16ControlServiceIfC4ERK13scoped_refptrIN4kudu12MetricEntityEERKS1_INS2_3rpc13Re
> sultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
> @  0x253fb65  
> _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZN6imp
> ala16ControlServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeE
> RKSt9_Any_dataOS4_OS5_OS9_
> @  0x2c9612f  std::function<>::operator()()
> @  0x2c95ade  kudu::rpc::GeneratedServiceIf::Handle()
> @  0x21d8c55  impala::ImpalaServicePool::RunThread()
> @  0x21de836  boost::_mfi::mf0<>::operator()()
> @  0x21de468  boost::_bi::list1<>::operator()<>()
> @  0x21de02e  boost::_bi::bind_t<>::operator()()
> @  0x21ddaa5  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x2140b55  boost::function0<>::operator()()
> @  0x271e1a9  impala::Thread::SuperviseThread()
> @  0x2726146  boost::_bi::list5<>::operator()<>()
> @  0x272606a  boost::_bi::bind_t<>::operator()()
> @  0x272602b  boost::detail::thread_data<>::run()
> @  0x3f0f621  thread_proxy
> @ 0x7f4db3f356da  start_thread
> @ 0x7f4db092ca3e  clone
> Wrote minidump to 
> /home/tarmstrong/Impala/impala/logs/cluster/minidumps/impalad/3204ffe5-6905-4842-d702c395-21c4eca5
> .dmp
> (END)
> {noformat}






[jira] [Resolved] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9046.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> We currently log a message if a process or JVM pause is detected, but there's no indication in the query profile of whether the query was affected. I suggest that we should:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics
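
The usual detection technique (as in Hadoop's JvmPauseMonitor) is a sleep-and-measure-drift loop; a minimal sketch of that idea, not Impala's implementation:
{code:java}
// Sleep a fixed interval and treat any extra wall-clock time as a pause
// (GC, CPU starvation, swapping, ...). Illustrative only.
public class PauseMonitorSketch implements Runnable {
  private static final long SLEEP_MS = 500;
  private static final long WARN_THRESHOLD_MS = 1000;

  private volatile long numPauses = 0;      // candidate metric: pause count
  private volatile long totalPauseMs = 0;   // candidate metric: total pause duration

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.nanoTime();
      try {
        Thread.sleep(SLEEP_MS);
      } catch (InterruptedException e) {
        return;
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      long pauseMs = elapsedMs - SLEEP_MS;
      if (pauseMs > WARN_THRESHOLD_MS) {
        numPauses++;
        totalPauseMs += pauseMs;
        System.err.println("Detected pause of ~" + pauseMs + "ms"
            + " (total " + totalPauseMs + "ms over " + numPauses + " pauses)");
      }
    }
  }

  public static void main(String[] args) {
    new Thread(new PauseMonitorSketch(), "pause-monitor").start();
  }
}
{code}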






[jira] [Assigned] (IMPALA-9046) Profile counter that indicates if a process or JVM pause occurred

2020-09-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9046:


Assignee: Sahil Takiar  (was: Tamas Mate)

> Profile counter that indicates if a process or JVM pause occurred
> -
>
> Key: IMPALA-9046
> URL: https://issues.apache.org/jira/browse/IMPALA-9046
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Sahil Takiar
>Priority: Major
>
> We currently log a message if a process or JVM pause is detected, but there's no indication in the query profile of whether the query was affected. I suggest that we should:
> * Add metrics that indicate the number and duration of detected pauses
> * Add counters to the backend profile for the deltas in those metrics






[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-18 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198604#comment-17198604
 ] 

Sahil Takiar commented on IMPALA-9870:
--

The 'profile' part of this was done in IMPALA-9229; we still need support for the 'summary' command.

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle containing the original query id. Subsequent "summary" and "profile" commands will return results of the original query. We should consider returning both the original and retried information.






[jira] [Resolved] (IMPALA-9229) Link failed and retried runtime profiles

2020-09-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9229.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

Marking as resolved. The Web UI improvements are tracked in a separate JIRA.

> Link failed and retried runtime profiles
> 
>
> Key: IMPALA-9229
> URL: https://issues.apache.org/jira/browse/IMPALA-9229
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 4.0
>
>
> There should be a way for clients to link the runtime profiles from failed 
> queries to all retry attempts (whether successful or not), and vice versa.
> There are a few ways to do this:
>  * The simplest way would be to include the query id of the retried query in 
> the runtime profile of the failed query, and vice versa; users could then 
> manually create a chain of runtime profiles in order to fetch all failed / 
> successful attempts
>  * Extend TGetRuntimeProfileReq to include an option to fetch all runtime 
> profiles for the given query id + all retry attempts (or add a new Thrift 
> call TGetRetryQueryIds(TQueryId) which returns a list of retried ids for a 
> given query id)
>  * The Impala debug UI should include a simple way to view all the runtime 
> profiles of a query (the failed attempts + all retry attempts) side by side 
> (perhaps the query_profile?query_id profile should include tabs to easily 
> switch between the runtime profiles of each attempt)
> These are not mutually exclusive, and it might be good to stage these changes.
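
A rough illustration of the first option above (manually chaining profiles): the {{Retried Query Id}} field name and the fetch helper below are hypothetical placeholders, not real Impala APIs.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: walk the chain of retried profiles by following a
// "Retried Query Id" entry embedded in each profile's text.
public class ProfileChainSketch {
  private static final Pattern RETRIED_ID =
      Pattern.compile("Retried Query Id:\\s*(\\S+)");

  static String fetchProfileText(String queryId) {
    // Placeholder: in practice this would fetch the runtime profile for queryId.
    throw new UnsupportedOperationException("illustrative stub");
  }

  static List<String> collectProfiles(String firstQueryId) {
    List<String> profiles = new ArrayList<>();
    String queryId = firstQueryId;
    while (queryId != null) {
      String profile = fetchProfileText(queryId);
      profiles.add(profile);
      Matcher m = RETRIED_ID.matcher(profile);
      queryId = m.find() ? m.group(1) : null;  // stop when no further retry is recorded
    }
    return profiles;
  }
}
{code}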






[jira] [Created] (IMPALA-10180) Add average size of fetch requests in runtime profile

2020-09-18 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10180:
-

 Summary: Add average size of fetch requests in runtime profile
 Key: IMPALA-10180
 URL: https://issues.apache.org/jira/browse/IMPALA-10180
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Reporter: Sahil Takiar


When a query has a high {{ClientFetchWaitTimer}}, it would be useful to know the average number of rows requested by the client per fetch request. This can help determine whether setting a higher fetch size would improve fetch performance when the network RTT between the client and Impala is high.
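
On the client side, the fetch size itself is just the standard JDBC setting; a minimal sketch (the URL, table, and value are placeholders):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder JDBC URL; adjust host/port/auth for your cluster.
    String url = "jdbc:hive2://localhost:21050/default;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      // Larger fetch batches mean fewer round trips when the client-to-Impala
      // RTT is high, at the cost of more client memory per fetch.
      stmt.setFetchSize(10000);
      try (ResultSet rs = stmt.executeQuery("select * from some_db.some_table")) {
        long rows = 0;
        while (rs.next()) rows++;
        System.out.println("Fetched " + rows + " rows");
      }
    }
  }
}
{code}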






[jira] [Commented] (IMPALA-9923) Data loading of TPC-DS ORC fails with "Fail to get checksum"

2020-09-17 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197766#comment-17197766
 ] 

Sahil Takiar commented on IMPALA-9923:
--

+1 on fixing this soon. Hit this twice in a row on the dryrun job.

> Data loading of TPC-DS ORC fails with "Fail to get checksum"
> 
>
> Key: IMPALA-9923
> URL: https://issues.apache.org/jira/browse/IMPALA-9923
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: load-tpcds-core-hive-generated-orc-def-block.sql, 
> load-tpcds-core-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Loading data to table tpcds_orc_def.store_sales partition 
> (ss_sold_date_sk=null) from 
> hdfs://localhost:20500/test-warehouse/managed/tpcds.store_sales_orc_def
> INFO  : 
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get 
> checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
> INFO  : Completed executing 
> command(queryId=ubuntu_20200707055650_a1958916-1e85-4db5-b1bc-cc63d80b3537); 
> Time taken: 14.512 seconds
> INFO  : OK
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to 
> get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction. (state=08S01,code=1)
> java.sql.SQLException: Error while compiling statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.io.IOException: Fail to get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
>   at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
>   at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1007)
>   at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>   at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1497)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1355)
>   at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1329)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1127)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11223/






[jira] [Commented] (IMPALA-9923) Data loading of TPC-DS ORC fails with "Fail to get checksum"

2020-09-16 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197281#comment-17197281
 ] 

Sahil Takiar commented on IMPALA-9923:
--

Hit this again: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12060/console

> Data loading of TPC-DS ORC fails with "Fail to get checksum"
> 
>
> Key: IMPALA-9923
> URL: https://issues.apache.org/jira/browse/IMPALA-9923
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: load-tpcds-core-hive-generated-orc-def-block.sql, 
> load-tpcds-core-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Loading data to table tpcds_orc_def.store_sales partition 
> (ss_sold_date_sk=null) from 
> hdfs://localhost:20500/test-warehouse/managed/tpcds.store_sales_orc_def
> INFO  : 
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get 
> checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
> INFO  : Completed executing 
> command(queryId=ubuntu_20200707055650_a1958916-1e85-4db5-b1bc-cc63d80b3537); 
> Time taken: 14.512 seconds
> INFO  : OK
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to 
> get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction. (state=08S01,code=1)
> java.sql.SQLException: Error while compiling statement: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
> java.io.IOException: Fail to get checksum, since file 
> /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2451646/base_003/_orc_acid_version
>  is under construction.
>   at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:401)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:266)
>   at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1007)
>   at org.apache.hive.beeline.Commands.execute(Commands.java:1217)
>   at org.apache.hive.beeline.Commands.sql(Commands.java:1146)
>   at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1497)
>   at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1355)
>   at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:1329)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1127)
>   at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1082)
>   at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:546)
>   at org.apache.hive.beeline.BeeLine.main(BeeLine.java:528)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> {noformat}
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11223/






[jira] [Created] (IMPALA-10170) Data race on Webserver::UrlHandler::is_on_nav_bar_

2020-09-16 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10170:
-

 Summary: Data race on Webserver::UrlHandler::is_on_nav_bar_
 Key: IMPALA-10170
 URL: https://issues.apache.org/jira/browse/IMPALA-10170
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{code}
WARNING: ThreadSanitizer: data race (pid=31102)
  Read of size 1 at 0x7b2c0006e3b0 by thread T42:
#0 impala::Webserver::UrlHandler::is_on_nav_bar() const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:152:41
 (impalad+0x256ff39)
#1 
impala::Webserver::GetCommonJson(rapidjson::GenericDocument,
 rapidjson::MemoryPoolAllocator, 
rapidjson::CrtAllocator>*, sq_connection const*, 
kudu::WebCallbackRegistry::WebRequest const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:527:24
 (impalad+0x256be13)
#2 impala::Webserver::RenderUrlWithTemplate(sq_connection const*, 
kudu::WebCallbackRegistry::WebRequest const&, impala::Webserver::UrlHandler 
const&, std::__cxx11::basic_stringstream, 
std::allocator >*, impala::ContentType*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:816:3
 (impalad+0x256e882)
#3 impala::Webserver::BeginRequestCallback(sq_connection*, 
sq_request_info*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:714:5
 (impalad+0x256cfbb)
#4 impala::Webserver::BeginRequestCallbackStatic(sq_connection*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.cc:556:20
 (impalad+0x256ba98)
#5 handle_request  (impalad+0x2582d59)

  Previous write of size 2 at 0x7b2c0006e3b0 by main thread:
#0 
impala::Webserver::UrlHandler::UrlHandler(impala::Webserver::UrlHandler&&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/webserver.h:141:9
 (impalad+0x2570dbc)
#1 std::pair, 
std::allocator > const, 
impala::Webserver::UrlHandler>::pair, std::allocator >, impala::Webserver::UrlHandler, 
true>(std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_pair.h:362:4
 (impalad+0x25738b3)
#2 void 
__gnu_cxx::new_allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::construct, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler> 
>(std::pair, 
std::allocator > const, impala::Webserver::UrlHandler>*, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:136:23
 (impalad+0x2573848)
#3 void 
std::allocator_traits, std::allocator > const, 
impala::Webserver::UrlHandler> > > 
>::construct, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::pair, 
std::allocator >, impala::Webserver::UrlHandler> 
>(std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > >&, std::pair, std::allocator > const, 
impala::Webserver::UrlHandler>*, std::pair, std::allocator >, 
impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/alloc_traits.h:475:8
 (impalad+0x25737f1)
#4 void std::_Rb_tree, std::allocator >, 
std::pair, 
std::allocator > const, impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_construct_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >*, std::pair, std::allocator >, 
impala::Webserver::UrlHandler>&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:626:8
 (impalad+0x257369b)
#5 std::_Rb_tree_node, std::allocator > const, 
impala::Webserver::UrlHandler> >* 
std::_Rb_tree, 
std::allocator >, std::pair, std::allocator > const, 
impala::Webserver::UrlHandler>, 
std::_Select1st, std::allocator > const, 
impala::Webserver::UrlHandler> >, std::less, std::allocator > >, 
std::allocator, std::allocator > const, 
impala::Webserver::UrlHandler> > 
>::_M_create_node, std::allocator >, impala::Webserver::UrlHandler> 
>(std::pair, 
std::allocator >, impala::Webserver::UrlHandler>&&) 

[jira] [Resolved] (IMPALA-9740) TSAN data race in hdfs-bulk-ops

2020-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9740.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> TSAN data race in hdfs-bulk-ops
> ---
>
> Key: IMPALA-9740
> URL: https://issues.apache.org/jira/browse/IMPALA-9740
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has 
> a data race:
> {code:java}
>  WARNING: ThreadSanitizer: data race (pid=23205)
>   Write of size 8 at 0x7b24005642d8 by thread T47:
> #0 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::add_node(boost::unordered::detail::node_constructor  const, hdfs_internal*> > > >&, unsigned long) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26
>  (impalad+0x1f93832)
> #1 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace_impl >(std::string 
> const&, std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41
>  (impalad+0x1f933ed)
> #2 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20
>  (impalad+0x1f932d1)
> #3 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27
>  (impalad+0x1f93238)
> #4 boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::insert(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26
>  (impalad+0x1f92038)
> #5 impala::HdfsFsCache::GetConnection(std::string const&, 
> hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18
>  (impalad+0x1f916b3)
> #6 impala::HdfsOp::Execute() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55
>  (impalad+0x23444d5)
> #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6
>  (impalad+0x2344ea9)
> #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11
>  (impalad+0x2345e80)
> #9 boost::function2::operator()(int, 
> impala::HdfsOp const&) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1f883be)
> #10 impala::ThreadPool::WorkerThread(int) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9
>  (impalad+0x1f874e5)
> #11 boost::_mfi::mf1, 
> int>::operator()(impala::ThreadPool*, int) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29
>  (impalad+0x1f87b7d)
> #12 void 
> boost::_bi::list2*>, 
> boost::_bi::value >::operator() impala::ThreadPool, int>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9
>  (impalad+0x1f87abc)
> #13 boost::_bi::bind_t impala::ThreadPool, int>, 
> boost::_bi::list2*>, 
> boost::_bi::value > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x1f87a23)
> #14 
> 

[jira] [Assigned] (IMPALA-9740) TSAN data race in hdfs-bulk-ops

2020-09-10 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9740:


Assignee: Sahil Takiar

> TSAN data race in hdfs-bulk-ops
> ---
>
> Key: IMPALA-9740
> URL: https://issues.apache.org/jira/browse/IMPALA-9740
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> hdfs-bulk-ops usage of a local connection cache (HdfsFsCache::HdfsFsMap) has 
> a data race:
> {code:java}
>  WARNING: ThreadSanitizer: data race (pid=23205)
>   Write of size 8 at 0x7b24005642d8 by thread T47:
> #0 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::add_node(boost::unordered::detail::node_constructor  const, hdfs_internal*> > > >&, unsigned long) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:329:26
>  (impalad+0x1f93832)
> #1 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace_impl >(std::string 
> const&, std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:420:41
>  (impalad+0x1f933ed)
> #2 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::detail::table_impl  const, hdfs_internal*> >, std::string, hdfs_internal*, 
> boost::hash, std::equal_to > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/detail/unique.hpp:384:20
>  (impalad+0x1f932d1)
> #3 
> std::pair  const, hdfs_internal*> > >, bool> 
> boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::emplace 
> >(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:241:27
>  (impalad+0x1f93238)
> #4 boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > 
> >::insert(std::pair&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/unordered/unordered_map.hpp:390:26
>  (impalad+0x1f92038)
> #5 impala::HdfsFsCache::GetConnection(std::string const&, 
> hdfs_internal**, boost::unordered::unordered_map boost::hash, std::equal_to, 
> std::allocator > >*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/hdfs-fs-cache.cc:115:18
>  (impalad+0x1f916b3)
> #6 impala::HdfsOp::Execute() const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:84:55
>  (impalad+0x23444d5)
> #7 HdfsThreadPoolHelper(int, impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/hdfs-bulk-ops.cc:137:6
>  (impalad+0x2344ea9)
> #8 boost::detail::function::void_function_invoker2 impala::HdfsOp const&), void, int, impala::HdfsOp 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> impala::HdfsOp const&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:118:11
>  (impalad+0x2345e80)
> #9 boost::function2::operator()(int, 
> impala::HdfsOp const&) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x1f883be)
> #10 impala::ThreadPool::WorkerThread(int) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread-pool.h:166:9
>  (impalad+0x1f874e5)
> #11 boost::_mfi::mf1, 
> int>::operator()(impala::ThreadPool*, int) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165:29
>  (impalad+0x1f87b7d)
> #12 void 
> boost::_bi::list2*>, 
> boost::_bi::value >::operator() impala::ThreadPool, int>, 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf1 impala::ThreadPool, int>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:319:9
>  (impalad+0x1f87abc)
> #13 boost::_bi::bind_t impala::ThreadPool, int>, 
> boost::_bi::list2*>, 
> boost::_bi::value > >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x1f87a23)
> #14 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf1, int>, 
> 

[jira] [Created] (IMPALA-10160) kernel_stack_watchdog cannot print user stack

2020-09-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10160:
-

 Summary: kernel_stack_watchdog cannot print user stack
 Key: IMPALA-10160
 URL: https://issues.apache.org/jira/browse/IMPALA-10160
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


I've seen this a few times now. The kernel_stack_watchdog is used in a few places in the KRPC code, and it prints out the kernel and user stacks whenever a thread is stuck in some method call for too long. The issue is that the user stack does not get printed:

{code}
W0908 17:15:00.365721  6605 kernel_stack_watchdog.cc:198] Thread 6612 stuck at 
outbound_call.cc:273 for 120ms:
Kernel stack:
[] futex_wait_queue_me+0xc6/0x130
[] futex_wait+0x17b/0x280
[] do_futex+0x106/0x5a0
[] SyS_futex+0x80/0x180
[] system_call_fastpath+0x16/0x1b
[] 0x

User stack:

{code}

It appears that the signal handler responsible for capturing the thread's user-level stack is unavailable.






[jira] [Commented] (IMPALA-9351) AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to non-existing path

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192522#comment-17192522
 ] 

Sahil Takiar commented on IMPALA-9351:
--

Another instance: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/11985/testReport/junit/org.apache.impala.analysis/AnalyzeDDLTest/TestCreateTableLikeFileOrc/

> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to non-existing path
> -
>
> Key: IMPALA-9351
> URL: https://issues.apache.org/jira/browse/IMPALA-9351
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Fang-Yu Rao
>Assignee: Norbert Luksa
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.4.0
>
>
> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed due to a non-existing path. 
> Specifically, we see the following error message.
> {code:java}
> Error Message
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0'
> {code}
> The stack trace is provided in the following.
> {code:java}
> Stacktrace
> java.lang.AssertionError: 
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/functional_orc_def.db/complextypes_fileformat/00_0'
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.impala.common.FrontendFixture.analyzeStmt(FrontendFixture.java:397)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:244)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:185)
>   at 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc(AnalyzeDDLTest.java:2045)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {code}
> This test was recently added by [~norbertluksa], and [~boroknagyz] gave a +2, 
> maybe [~boroknagyz] could provide some insight into this? Thanks!





[jira] [Assigned] (IMPALA-6984) Coordinator should cancel backends when returning EOS

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-6984:


Assignee: Wenzhe Zhou

> Coordinator should cancel backends when returning EOS
> -
>
> Key: IMPALA-6984
> URL: https://issues.apache.org/jira/browse/IMPALA-6984
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Daniel Hecht
>Assignee: Wenzhe Zhou
>Priority: Major
>  Labels: query-lifecycle
> Fix For: Impala 4.0
>
>
> Currently, the Coordinator waits for backends rather than proactively 
> cancelling them in the case of hitting EOS. There's a tangled mess that makes 
> it tricky to proactively cancel the backends related to how 
> {{Coordinator::ComputeQuerySummary()}} works – we can't update the summary 
> until the profiles are no longer changing (which also makes sense given that 
> we want the exec summary to be consistent with the final profile). But we 
> currently tie together the FIS status and the profile, and cancellation of 
> backends causes the FIS to return CANCELLED, which then means that the 
> remaining FIS on that backend won't produce a final profile.
> With the rework of the protocol for IMPALA-2990 we should make it possible to 
> sort this out such that a final profile can be requested regardless of how a 
> FIS ends execution.
> This also relates to IMPALA-5783.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5119) Don't make RPCs from Coordinator::UpdateBackendExecStatus()

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-5119:


Assignee: Wenzhe Zhou

> Don't make RPCs from Coordinator::UpdateBackendExecStatus()
> ---
>
> Key: IMPALA-5119
> URL: https://issues.apache.org/jira/browse/IMPALA-5119
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Assignee: Wenzhe Zhou
>Priority: Major
>
> If it reports a bad status, {{UpdateFragmentExecStatus()}} will call 
> {{UpdateStatus()}}, which takes {{Coordinator::lock_}} and then calls 
> {{Cancel()}}. That method issues one RPC per fragment instance.
> In KRPC, doing so much work from {{UpdateFragmentExecStatus()}} - which is an 
> RPC handler - is a bad idea, even if the RPCs are issued asynchronously. 
> There's still some serialization cost.
> It's also a bad idea to do all this work while holding {{lock_}}. We should 
> address both of these to ensure scalability of the cancellation path.
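
To illustrate the direction (this is not Impala's actual code; all names below are made up), a minimal sketch of recording the cancellation under the lock and issuing the per-instance cancel RPCs from a worker thread instead of from the RPC handler:
{code:java}
// Sketch: instead of issuing one cancel RPC per fragment instance from inside
// the status-report RPC handler (while holding lock_), only record the intent
// under the lock and let a worker thread fan out the RPCs. Hypothetical names.
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

class CancellationWorker {
 public:
  CancellationWorker() : worker_([this] { Run(); }) {}
  ~CancellationWorker() {
    {
      std::lock_guard<std::mutex> l(mu_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Called from the RPC handler: cheap, no RPCs, no long-held locks.
  void EnqueueCancel(int64_t query_id) {
    {
      std::lock_guard<std::mutex> l(mu_);
      pending_.push(query_id);
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    while (true) {
      cv_.wait(l, [this] { return done_ || !pending_.empty(); });
      if (done_ && pending_.empty()) return;
      int64_t query_id = pending_.front();
      pending_.pop();
      l.unlock();
      // Issue the (possibly many) cancel RPCs here, off the handler thread,
      // e.g. SendCancelRpcsForQuery(query_id);  // hypothetical helper
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<int64_t> pending_;
  bool done_ = false;
  std::thread worker_;
};
{code}
The key point of the sketch is that the handler only calls {{EnqueueCancel()}}; the expensive per-instance RPC fan-out happens on the worker thread, outside the handler and outside {{lock_}}.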



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-9227) Test coverage for query retries when there is a network partition

2020-09-08 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9227:


Assignee: Wenzhe Zhou

> Test coverage for query retries when there is a network partition
> -
>
> Key: IMPALA-9227
> URL: https://issues.apache.org/jira/browse/IMPALA-9227
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Wenzhe Zhou
>Priority: Major
>
> The initial version of transparent query retries just adds coverage for 
> retrying a query if an impalad crashes. Now that Impala has an RPC fault 
> injection framework (IMPALA-8138) based on debug actions, integration tests 
> can introduce network partitions between two impalad processes.
> Node blacklisting should cause the Impala Coordinator to blacklist the nodes 
> with the network partitions (IMPALA-9137), and then transparent query retries 
> should cause the query to be successfully retried.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10154) Data race on coord_backend_id

2020-09-08 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10154:
-

 Summary: Data race on coord_backend_id
 Key: IMPALA-10154
 URL: https://issues.apache.org/jira/browse/IMPALA-10154
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Wenzhe Zhou


TSAN is reporting a data race on 
{{ExecQueryFInstancesRequestPB#coord_backend_id}}
{code:java}
WARNING: ThreadSanitizer: data race (pid=15392)
  Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write 
M871582266043729400):
#0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23
 (impalad+0x20c03ed)
#1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21
 (impalad+0x20b8b29)
#2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB 
const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23
 (impalad+0x20acb59)
#3 
impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB
 const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66
 (impalad+0x22a621d)
#4 
impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
const&, scoped_refptr 
const&)::$_1::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13
 (impalad+0x23622a4)
#5 std::_Function_handler 
const&, scoped_refptr 
const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
 (impalad+0x23620ed)
#6 std::function::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
 (impalad+0x2a4a453)
#7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3
 (impalad+0x2a49efe)
#8 impala::ImpalaServicePool::RunThread() 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15
 (impalad+0x2011a12)
#9 boost::_mfi::mf0::operator()(impala::ImpalaServicePool*) const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29
 (impalad+0x2017a16)
#10 void boost::_bi::list1 
>::operator(), 
boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0&, boost::_bi::list0&, int) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9
 (impalad+0x201796a)
#11 boost::_bi::bind_t, 
boost::_bi::list1 > 
>::operator()() 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
 (impalad+0x20178f3)
#12 
boost::detail::function::void_function_obj_invoker0, 
boost::_bi::list1 > >, 
void>::invoke(boost::detail::function::function_buffer&) 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x20176e9)
#13 boost::function0::operator()() const 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x1f666f1)
#14 impala::Thread::SuperviseThread(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/util/thread.cc:360:3
 (impalad+0x252644b)
#15 void 
boost::_bi::list5, std::allocator > >, 
boost::_bi::value, 
std::allocator > >, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> 
>::operator(), 
std::allocator > const&, 

[jira] [Commented] (IMPALA-10154) Data race on coord_backend_id

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192308#comment-17192308
 ] 

Sahil Takiar commented on IMPALA-10154:
---

This looks related to IMPALA-5746, so assigning to [~wzhou].

> Data race on coord_backend_id
> -
>
> Key: IMPALA-10154
> URL: https://issues.apache.org/jira/browse/IMPALA-10154
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Wenzhe Zhou
>Priority: Major
>
> TSAN is reporting a data race on 
> {{ExecQueryFInstancesRequestPB#coord_backend_id}}
> {code:java}
> WARNING: ThreadSanitizer: data race (pid=15392)
>   Write of size 8 at 0x7b74001104a8 by thread T83 (mutexes: write 
> M871582266043729400):
> #0 impala::ExecQueryFInstancesRequestPB::mutable_coord_backend_id() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.pb.h:6625:23
>  (impalad+0x20c03ed)
> #1 impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
> impala::TExecPlanFragmentInfo const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-state.cc:216:21
>  (impalad+0x20b8b29)
> #2 impala::QueryExecMgr::StartQuery(impala::ExecQueryFInstancesRequestPB 
> const*, impala::TQueryCtx const&, impala::TExecPlanFragmentInfo const&) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/query-exec-mgr.cc:80:23
>  (impalad+0x20acb59)
> #3 
> impala::ControlService::ExecQueryFInstances(impala::ExecQueryFInstancesRequestPB
>  const*, impala::ExecQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/control-service.cc:157:66
>  (impalad+0x22a621d)
> #4 
> impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
> const&, scoped_refptr 
> const&)::$_1::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/control_service.service.cc:70:13
>  (impalad+0x23622a4)
> #5 std::_Function_handler google::protobuf::Message*, kudu::rpc::RpcContext*), 
> impala::ControlServiceIf::ControlServiceIf(scoped_refptr 
> const&, scoped_refptr 
> const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message 
> const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:316:2
>  (impalad+0x23620ed)
> #6 std::function google::protobuf::Message*, 
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, 
> google::protobuf::Message*, kudu::rpc::RpcContext*) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/std_function.h:706:14
>  (impalad+0x2a4a453)
> #7 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/kudu/rpc/service_if.cc:139:3
>  (impalad+0x2a49efe)
> #8 impala::ImpalaServicePool::RunThread() 
> /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/impala-service-pool.cc:272:15
>  (impalad+0x2011a12)
> #9 boost::_mfi::mf0 impala::ImpalaServicePool>::operator()(impala::ImpalaServicePool*) const 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:49:29
>  (impalad+0x2017a16)
> #10 void boost::_bi::list1 
> >::operator(), 
> boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf0 impala::ImpalaServicePool>&, boost::_bi::list0&, int) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:259:9
>  (impalad+0x201796a)
> #11 boost::_bi::bind_t impala::ImpalaServicePool>, 
> boost::_bi::list1 > 
> >::operator()() 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
>  (impalad+0x20178f3)
> #12 
> boost::detail::function::void_function_obj_invoker0 boost::_mfi::mf0, 
> boost::_bi::list1 > >, 
> void>::invoke(boost::detail::function::function_buffer&) 
> /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x20176e9)
> #13 boost::function0::operator()() const 
> 

[jira] [Commented] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-09-08 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192259#comment-17192259
 ] 

Sahil Takiar commented on IMPALA-10073:
---

[https://github.com/apache/impala/blob/master/shaded-deps/s3a-aws-sdk/pom.xml#L58]
 contains the full list

> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a similar technique used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190302#comment-17190302
 ] 

Sahil Takiar edited comment on IMPALA-10142 at 9/3/20, 5:03 PM:


Actually this would only really be useful if the RPC response includes some trace 
information as well; otherwise it is hard to capture the time actually spent on 
the network. Currently, the {{TransmitDataResponsePB}} just includes the 
{{receiver_latency_ns}}. Adding that into the trace would be useful; other 
things such as the timestamp when the RPC was received by the receiver, time in 
queue, etc. would be useful as well.

The timestamp of when the RPC was received by the sender would be particularly 
useful in debugging RPCs where the network is slow.
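
As a rough sketch of what the sender could derive if the response carried that information (only {{receiver_latency_ns}} exists today; the extra field and all names below are hypothetical):
{code:java}
// Sketch: sender-side breakdown of a TransmitData RPC, assuming the response
// also carried a time-in-queue value. Everything except receiver_latency_ns
// is hypothetical.
#include <cstdint>

struct RpcTimingBreakdown {
  int64_t network_time_ns;     // request + response time on the wire
  int64_t queue_time_ns;       // time the payload sat in the receiver's queues
  int64_t processing_time_ns;  // receiver work outside of queueing
};

RpcTimingBreakdown ComputeBreakdown(int64_t total_time_ns,
    int64_t receiver_latency_ns, int64_t time_in_queue_ns) {
  RpcTimingBreakdown b;
  b.network_time_ns = total_time_ns - receiver_latency_ns;
  b.queue_time_ns = time_in_queue_ns;
  b.processing_time_ns = receiver_latency_ns - time_in_queue_ns;
  return b;
}
{code}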


was (Author: stakiar):
Actually this would only really be useful if the RPC response includes some trace 
information as well. Currently, the {{TransmitDataResponsePB}} just includes 
the {{receiver_latency_ns}}. Adding that into the trace would be useful; other 
things such as the timestamp when the RPC was received by the receiver, time in 
queue, etc. would be useful as well.

> Add RPC sender tracing
> --
>
> Key: IMPALA-10142
> URL: https://issues.apache.org/jira/browse/IMPALA-10142
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently have RPC tracing on the receiver side, but not on the sender 
> side. For slow RPCs, the logs print out the total amount of time spent 
> sending the RPC + the network time. Adding tracing will basically make this 
> more granular. It will help determine where exactly in the stack the time was 
> spent when sending RPCs.
> Combined with the trace logs in the receiver, it should be much easier to 
> determine the timeline of a given slow RPC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190302#comment-17190302
 ] 

Sahil Takiar commented on IMPALA-10142:
---

Actually this would only really be useful if the RPC response includes some trace 
information as well. Currently, the {{TransmitDataResponsePB}} just includes 
the {{receiver_latency_ns}}. Adding that into the trace would be useful; other 
things such as the timestamp when the RPC was received by the receiver, time in 
queue, etc. would be useful as well.

> Add RPC sender tracing
> --
>
> Key: IMPALA-10142
> URL: https://issues.apache.org/jira/browse/IMPALA-10142
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently have RPC tracing on the receiver side, but not on the sender 
> side. For slow RPCs, the logs print out the total amount of time spent 
> sending the RPC + the network time. Adding tracing will basically make this 
> more granular. It will help determine where exactly in the stack the time was 
> spent when sending RPCs.
> Combined with the trace logs in the receiver, it should be much easier to 
> determine the timeline of a given slow RPC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10142) Add RPC sender tracing

2020-09-03 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10142:
-

 Summary: Add RPC sender tracing
 Key: IMPALA-10142
 URL: https://issues.apache.org/jira/browse/IMPALA-10142
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


We currently have RPC tracing on the receiver side, but not on the sender 
side. For slow RPCs, the logs print out the total amount of time spent sending 
the RPC + the network time. Adding tracing will basically make this more 
granular. It will help determine where exactly in the stack the time was spent 
when sending RPCs.

Combined with the trace logs in the receiver, it should be much easier to 
determine the timeline of a given slow RPC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190296#comment-17190296
 ] 

Sahil Takiar commented on IMPALA-10139:
---

One other thought is that with result spooling enabled, the back-pressure 
mechanism won't be such a big issue anymore because results will all get 
spooled, regardless of whether clients fetch results slowly or not.

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-03 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190295#comment-17190295
 ] 

Sahil Takiar commented on IMPALA-10139:
---

I think there is a similar issue with the TRACE logs. In the example TRACE 
above, the majority of the time the RPC was just in the deferred state - e.g. 
there were not enough resources to process the RPC. Again, this just means that 
the back-pressure mechanism was kicking in, not necessarily that the network 
was slow.

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189775#comment-17189775
 ] 

Sahil Takiar commented on IMPALA-9870:
--

WIP Patch: http://gerrit.cloudera.org:8080/16406

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle 
> containing the original query id. Subsequent "summary" and "profile" commands 
> will return results of the original query. We should consider returning both 
> the original and retried information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189724#comment-17189724
 ] 

Sahil Takiar commented on IMPALA-10141:
---

Adding some additional fields from {{/proc/net/dev}} to the per-node stats from 
system-info.cc might be useful as well. Fields like NET_RX_ERRS, NET_RX_DROP, 
NET_TX_ERRS, and NET_TX_DROP could help track transmit / receive errors or 
dropped packets. These stats are more generic, though, as they are not specific 
to the kRPC TCP connections and are truly at the host level. I'm also not sure 
what exactly they capture compared to the TCP stats; they seem more hardware 
specific, so maybe they would capture host NIC issues.
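
For illustration, a minimal standalone sketch (not the actual system-info.cc code) of summing those four counters across interfaces from {{/proc/net/dev}}:
{code:java}
// Sketch: sum rx/tx errors and drops across all interfaces in /proc/net/dev.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct NetDevCounters {
  long long rx_errs = 0, rx_drop = 0, tx_errs = 0, tx_drop = 0;
};

NetDevCounters ReadProcNetDev() {
  NetDevCounters totals;
  std::ifstream in("/proc/net/dev");
  std::string line;
  std::getline(in, line);  // skip the two header lines
  std::getline(in, line);
  while (std::getline(in, line)) {
    const size_t colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::istringstream fields(line.substr(colon + 1));
    std::vector<long long> v;
    long long x;
    while (fields >> x) v.push_back(x);
    if (v.size() < 12) continue;
    // Field order after "<iface>:" is
    // rx: bytes packets errs drop fifo frame compressed multicast
    // tx: bytes packets errs drop fifo colls carrier compressed
    totals.rx_errs += v[2];
    totals.rx_drop += v[3];
    totals.tx_errs += v[10];
    totals.tx_drop += v[11];
  }
  return totals;
}

int main() {
  NetDevCounters c = ReadProcNetDev();
  std::cout << "NET_RX_ERRS=" << c.rx_errs << " NET_RX_DROP=" << c.rx_drop
            << " NET_TX_ERRS=" << c.tx_errs << " NET_TX_DROP=" << c.tx_drop
            << std::endl;
  return 0;
}
{code}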

> Include aggregate TCP metrics in per-node profiles
> --
>
> Key: IMPALA-10141
> URL: https://issues.apache.org/jira/browse/IMPALA-10141
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
> metrics per kRPC connection for all inbound / outbound connections. It would 
> be useful to aggregate some of these metrics and put them in the per-node 
> profiles. Since it is not possible to currently split these metrics out per 
> query, they should be added at the per-host level. Furthermore, only metrics 
> that can be sanely aggregated across all connections should be included. For 
> example, tracking the number of Retransmitted TCP Packets across all 
> connections for the duration of the query would be useful. TCP 
> retransmissions should be rare and are typically indicative of network hardware 
> issues or network congestion; having at least some high-level idea of the 
> number of TCP retransmissions that occur during a query can drastically help 
> determine if the network is to blame for query slowness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189718#comment-17189718
 ] 

Sahil Takiar commented on IMPALA-10139:
---

The "network" time (calculated as {{int64_t network_time_ns = total_time_ns - 
resp_.receiver_latency_ns()}}) might be a more useful threshold value to use.
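
To make that concrete, a minimal sketch of such a check (the threshold constant and function name are made up, not Impala's actual code):
{code:java}
// Sketch: flag an RPC as slow based only on its network component, i.e. total
// wall time minus the receiver-side processing time reported in the response.
#include <cstdint>

constexpr int64_t kSlowNetworkRpcThresholdNs = 2LL * 1000 * 1000 * 1000;  // 2s

bool IsSlowOnNetwork(int64_t total_time_ns, int64_t receiver_latency_ns) {
  const int64_t network_time_ns = total_time_ns - receiver_latency_ns;
  return network_time_ns > kSlowNetworkRpcThresholdNs;
}
{code}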

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10141:
-

 Summary: Include aggregate TCP metrics in per-node profiles
 Key: IMPALA-10141
 URL: https://issues.apache.org/jira/browse/IMPALA-10141
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The /rpcz endpoint in the debug web ui includes a ton of useful TCP-level 
metrics per kRPC connection for all inbound / outbound connections. It would be 
useful to aggregate some of these metrics and put them in the per-node 
profiles. Since it is not possible to currently split these metrics out per 
query, they should be added at the per-host level. Furthermore, only metrics 
that can be sanely aggregated across all connections should be included. For 
example, tracking the number of Retransmitted TCP Packets across all 
connections for the duration of the query would be useful. TCP retransmissions 
should be rare and are typically indicate of network hardware issues or network 
congestions, having at least some high level idea of the number of TCP 
retransmissions that occur during a query can drastically help determine if the 
network is to blame for query slowness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189698#comment-17189698
 ] 

Sahil Takiar commented on IMPALA-10139:
---

This is pretty easy to reproduce on master. I just ran the query "select * from 
functional.alltypes as a2, functional.alltypes as a1" and didn't fetch any 
results. A bunch of RPCs get sent, but are not processed because queues are 
probably full. Then the logs contain entries like:
{code:java}
I0902 13:25:34.797029 17168 rpcz_store.cc:269] Call 
impala.DataStreamService.TransmitData from 127.0.0.1:33354 (request call id 
6737) took 218496ms. Request Metrics: {}
I0902 13:25:34.797061 17168 rpcz_store.cc:273] Trace:
0902 13:21:56.300996 (+ 0us) impala-service-pool.cc:170] Inserting onto 
call queue
0902 13:21:56.301037 (+41us) impala-service-pool.cc:269] Handling call
0902 13:21:56.301048 (+11us) krpc-data-stream-recvr.cc:325] Enqueuing 
deferred RPC
0902 13:25:34.757315 (+218456267us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757317 (+ 2us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.757319 (+ 2us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757320 (+ 1us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.796800 (+ 39480us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.796803 (+ 3us) krpc-data-stream-recvr.cc:397] Deserializing 
batch
0902 13:25:34.797011 (+   208us) krpc-data-stream-recvr.cc:424] Enqueuing 
deserialized batch
0902 13:25:34.797021 (+10us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0902 13:25:34.797154 17105 krpc-data-stream-sender.cc:394] Slow TransmitData 
RPC to 127.0.0.1:27000 
(fragment_instance_id=d447645333af3b77:671fbefe): took 3m38s. Receiver 
time: 3m38s Network time: 239.735us
I0902 13:25:34.797215  3684 krpc-data-stream-sender.cc:428] 
d447645333af3b77:671fbefe0005] Long delay waiting for RPC to 
127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe): took 
3m38s {code}

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189696#comment-17189696
 ] 

Sahil Takiar commented on IMPALA-10139:
---

Linking IMPALA-3380 - which has some details why we don't add timeouts for 
TransmitData RPCs.

> Slow RPC logs can be misleading
> ---
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete an RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that, due to the client-driven back-pressure mechanism, it is 
> entirely possible that the receiver does not process a received RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to a flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that, because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10139) Slow RPC logs can be misleading

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10139:
-

 Summary: Slow RPC logs can be misleading
 Key: IMPALA-10139
 URL: https://issues.apache.org/jira/browse/IMPALA-10139
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
successfully complete an RPC. The issue is that there are many reasons why an 
RPC might take a long time to complete. An RPC is considered complete only when 
the receiver has processed that RPC. 

The problem is that, due to the client-driven back-pressure mechanism, it is 
entirely possible that the receiver does not process a received RPC because 
{{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been called yet 
(indirectly called by {{ExchangeNode::GetNext}}).

This can lead to a flood of slow RPC logs, even though the RPCs might not 
actually be slow themselves. What is worse is that, because of the 
back-pressure mechanism, slowness from the client (e.g. Hue users) will 
propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10138) Add fragment instance id to RPC trace output

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10138:
-

 Summary: Add fragment instance id to RPC trace output
 Key: IMPALA-10138
 URL: https://issues.apache.org/jira/browse/IMPALA-10138
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The RPC traces added in IMPALA-9128 are hard to correlate to specific queries 
because the output does not include the fragment instance id. I'm not sure if 
this is actually possible in the current kRPC code, but it would be nice if the 
tracing output included the fragment instance id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9954) RpcRecvrTime can be negative

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-9954:
-
Parent: IMPALA-10137
Issue Type: Sub-task  (was: Bug)

> RpcRecvrTime can be negative
> 
>
> Key: IMPALA-9954
> URL: https://issues.apache.org/jira/browse/IMPALA-9954
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
> Attachments: profile_034e7209bd98c96c_9a448dfc.txt
>
>
> Saw this on a recent version of master. Attached the full runtime profile.
> {code:java}
> KrpcDataStreamSender (dst_id=2):(Total: 9.863ms, non-child: 3.185ms, 
> % non-child: 32.30%)
>   ExecOption: Unpartitioned Sender Codegen Disabled: not needed
>- BytesSent (500.000ms): 0, 0
>- NetworkThroughput: (Avg: 4.34 MB/sec ; Min: 4.34 MB/sec ; Max: 
> 4.34 MB/sec ; Number of samples: 1)
>- RpcNetworkTime: (Avg: 3.562ms ; Min: 679.676us ; Max: 6.445ms ; 
> Number of samples: 2)
>- RpcRecvrTime: (Avg: -151281.000ns ; Min: -231485.000ns ; Max: 
> -71077.000ns ; Number of samples: 2)
>- EosSent: 1 (1)
>- PeakMemoryUsage: 416.00 B (416)
>- RowsSent: 100 (100)
>- RpcFailure: 0 (0)
>- RpcRetry: 0 (0)
>- SerializeBatchTime: 2.880ms
>- TotalBytesSent: 28.67 KB (29355)
>- UncompressedRowBatchSize: 69.29 KB (70950) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-5473) Make diagnosing network issues easier

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-5473:


Assignee: (was: Michael Ho)

> Make diagnosing network issues easier
> -
>
> Key: IMPALA-5473
> URL: https://issues.apache.org/jira/browse/IMPALA-5473
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 2.10.0
>Reporter: Henry Robinson
>Priority: Major
>
> With our current metrics in the profile, it's hard to debug queries that get 
> slow throughput from their exchanges. 
> The following cases have different causes, but similar symptoms (e.g. a high 
> {{InactiveTimer}} in the xchg profile):
> 1. Downstream sender does not produce rows quickly (perhaps because *its* 
> child instances do not produce rows quickly).
> 2. Downstream sender can not _send_ rows quickly, perhaps because of network 
> congestion.
> 3. Downstream sender does not start producing rows until some time after the 
> upstream has started (captured by {{FirstBatchArrivalWaitTime}}).
> 4. Downstream sender does not close stream until some time after all rows are 
> sent.
> We should try to improve these metrics so that all the information about who 
> is slow, and why, is available clearly in the runtime profile. Distinguishing 
> cases 1 and 2 is particularly important.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10049:
--
Parent: IMPALA-10137
Issue Type: Sub-task  (was: Improvement)

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template 
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Receiver time: "
>   ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), 
> TUnit::TIME_NS)
>   ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, 
> TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6705) TotalNetworkSendTime in query profile is misleading

2020-09-02 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189580#comment-17189580
 ] 

Sahil Takiar commented on IMPALA-6705:
--

I got bitten by this recently, and it was very confusing. Linking to 
IMPALA-10137. +1 on fixing this.

> TotalNetworkSendTime in query profile is misleading
> ---
>
> Key: IMPALA-6705
> URL: https://issues.apache.org/jira/browse/IMPALA-6705
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: observability
>
> {{TotalNetworkSendTime}} is actually measuring the time which a fragment 
> instance execution thread spent waiting for the completion of the previous RPC. 
> This is a combination of:
>  - network time of sending the RPC payload to the destination
>  - processing and queuing time in the destination
>  - network time of sending the RPC response to the originating node
> The name of this metric itself is misleading because it gives the impression 
> that it's the time spent sending the RPC payload to the destination so a 
> query profile with a high {{TotalNetworkSendTime}} may easily mislead a user 
> into concluding that there is something wrong with the network. In reality, 
> the receiving end could be overloaded and it's taking a huge amount of time 
> to respond to an RPC.
> For this metric to be useful, we need to have a breakdown of those 3 
> components above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10137) Network Debugging / Supportability Improvements

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10137:
--
Labels: observability  (was: )

> Network Debugging / Supportability Improvements
> ---
>
> Key: IMPALA-10137
> URL: https://issues.apache.org/jira/browse/IMPALA-10137
> Project: IMPALA
>  Issue Type: Epic
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: observability
>
> There are various improvements Impala should make to improve debugging of 
> network issues (e.g. slow RPCs, TCP retransmissions, etc.).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10049:
-

Assignee: Sahil Takiar

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template 
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Receiver time: "
>   ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), 
> TUnit::TIME_NS)
>   ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, 
> TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10049) Include RPC call_id in slow RPC logs

2020-09-02 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-10049:
-

Assignee: (was: Sahil Takiar)

> Include RPC call_id in slow RPC logs
> 
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The current code for logging slow RPCs on the sender side looks something 
> like this:
> {code:java}
> template 
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Receiver time: "
>   ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), 
> TUnit::TIME_NS)
>   ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, 
> TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << 
> "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) 
> << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can 
> more easily be traced. The RPC call_id is dumped in RPC traces on the 
> receiver side, as well as in the /rpcz output on the debug ui.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10137) Network Debugging / Supportability Improvements

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10137:
-

 Summary: Network Debugging / Supportability Improvements
 Key: IMPALA-10137
 URL: https://issues.apache.org/jira/browse/IMPALA-10137
 Project: IMPALA
  Issue Type: Epic
Reporter: Sahil Takiar


There are various improvements Impala should make to improve debugging of 
network issues (e.g. slow RPCs, TCP retransmissions, etc.).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10121) bin/jenkins/finalize.sh should generate JUnitXML for TSAN failures

2020-09-01 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188848#comment-17188848
 ] 

Sahil Takiar commented on IMPALA-10121:
---

+1 thanks for reporting this.

> bin/jenkins/finalize.sh should generate JUnitXML for TSAN failures
> --
>
> Key: IMPALA-10121
> URL: https://issues.apache.org/jira/browse/IMPALA-10121
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> There is code in bin/jenkins/finalize.sh to generate JUnitXML for 
> AddressSanitizer errors that crash Impala. We should add equivalent logic for 
> ThreadSanitizer messages. It looks like the message would go to the ERROR log 
> and look like:
> {noformat}
> ==
> WARNING: ThreadSanitizer: data race (pid=6436)
>   Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write 
> M861448892003377216, write M862574791910219632, write M623321199144890016, 
> write M1054540811927503496):
> ... stacks, etc ...
> =={noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10125) webserver.test_web_pages.TestWebPage.test_catalog failed

2020-09-01 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188847#comment-17188847
 ] 

Sahil Takiar commented on IMPALA-10125:
---

Might be related to IMPALA-9292.

> webserver.test_web_pages.TestWebPage.test_catalog failed
> 
>
> Key: IMPALA-10125
> URL: https://issues.apache.org/jira/browse/IMPALA-10125
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Yongzhi Chen
>Priority: Major
>
> In master-core-data-load, webserver.test_web_pages.TestWebPage.test_catalog 
> failed with
> {noformat}
> Stacktrace
> webserver/test_web_pages.py:303: in test_catalog
> self.__test_table_metrics(unique_database, "foo_part", "alter-duration")
> webserver/test_web_pages.py:352: in __test_table_metrics
> "?name=%s.%s" % (db_name, tbl_name), metric, 
> ports_to_test=self.CATALOG_TEST_PORT)
> webserver/test_web_pages.py:170: in get_and_check_status
> assert string_to_search in response.text, "URL: {0} 
> Str:'{1}'\nResp:{2}".format(
> E   AssertionError: URL: 
> http://localhost:25020/table_metrics?name=test_catalog_caf8ffd1.foo_part 
> Str:'alter-duration'
> E Resp:
> E 
> E 
> E 
> E 
> E   Apache Impala
> E 
> E 
> E 
> E  href="/www/datatables-1.10.18.min.css"/>
> E  src="/www/datatables-1.10.18.min.js">
> E  rel='stylesheet' media='screen'>
> E 
> E 
> E   @media (min-width: 1300px) {
> E #nav-options {
> E width: 1280px;
> E }
> E   }
> E 
> E   body {
> E font-size: 14px;
> E   }
> E 
> E   pre {
> E padding: 10px;
> E font-size: 12px;
> E border: 1px solid #ccc;
> E   }
> E 
> E   /* Avoid unsightly padding around code element */
> E   pre.code {
> E padding: 0;
> E   }
> E 
> E   
> E   
> E 
> E   
> E 
> E   catalogd
> E 
> E  role="navigation">
> E   
> E 
> E  href="/">/
> E 
> E  href="/catalog">/catalog
> E 
> E  href="/jmx">/jmx
> E 
> E  href="/log_level">/log_level
> E 
> E  href="/logs">/logs
> E 
> E  href="/memz">/memz
> E 
> E  href="/metrics">/metrics
> E 
> E  href="/operations">/operations
> E 
> E  href="/profile_docs">/profile_docs
> E 
> E  href="/rpcz">/rpcz
> E 
> E  href="/threadz">/threadz
> E 
> E  href="/varz">/varz
> E 
> E   
> E 
> E   
> E 
> E 
> E 
> E 
> E // For Apache Knox compatibility, all urls that are accessed by 
> javascript should have
> E // their path wrapped with this.
> E function make_url(path) {
> E   var root_link = document.getElementById('root-link');
> E   var s  = root_link.href.split("?");
> E   url = s[0] + path;
> E   if (s.length > 1) {
> E if (path.includes("?")) {
> E   url += "&"
> E } else {
> E   url += "?";
> E }
> E url += s[1];
> E   }
> E   return url;
> E }
> E 
> E 
> E 
> E Metrics for table test_catalog_caf8ffd1.foo_partare not available 
> because the table is currently modified by another operation.
> E 
> E 
> E 
> E 
> E 
> E 
> E   assert 'alter-duration' in 

[jira] [Resolved] (IMPALA-10126) asf-master-core-s3 test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10126.
---
Resolution: Duplicate

Duplicate of IMPALA-9058

> asf-master-core-s3 
> test_aggregation.TestWideAggregationQueries.test_many_grouping_columns failed
> 
>
> Key: IMPALA-10126
> URL: https://issues.apache.org/jira/browse/IMPALA-10126
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Yongzhi Chen
>Priority: Major
>
> query_test.test_aggregation.TestWideAggregationQueries.test_many_grouping_columns[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none] (from pytest)
> {noformat}
> Error Message
> query_test/test_aggregation.py:453: in test_many_grouping_columns result 
> = self.execute_query(query, exec_option, table_format=table_format) 
> common/impala_test_suite.py:811: in wrapper return function(*args, 
> **kwargs) common/impala_test_suite.py:843: in execute_query return 
> self.__execute_query(self.client, query, query_options) 
> common/impala_test_suite.py:909: in __execute_query return 
> impalad_client.execute(query, user=user) common/impala_connection.py:205: in 
> execute return self.__beeswax_client.execute(sql_stmt, user=user) 
> beeswax/impala_beeswax.py:187: in execute handle = 
> self.__execute_query(query_string.strip(), user=user) 
> beeswax/impala_beeswax.py:365: in __execute_query 
> self.wait_for_finished(handle) beeswax/impala_beeswax.py:386: in 
> wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + 
> error_log, None) E   ImpalaBeeswaxException: ImpalaBeeswaxException: E
> Query aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq
>  E   Error(2): No such file or directory E   Root cause: 
> ResourceNotFoundException: Requested resource not found (Service: 
> AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; 
> Request ID: 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG)
> Stacktrace
> query_test/test_aggregation.py:453: in test_many_grouping_columns
> result = self.execute_query(query, exec_option, table_format=table_format)
> common/impala_test_suite.py:811: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:843: in execute_query
> return self.__execute_query(self.client, query, query_options)
> common/impala_test_suite.py:909: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:365: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:386: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error on 
> impala-ec2-centos74-m5-4xlarge-ondemand-1129.vpc.cloudera.com:22001: Failed 
> to open HDFS file 
> s3a://impala-test-uswest2-1/test-warehouse/widetable_1000_cols_parquet/1f4ec08992b6e3f9-6fd9a17d_1482052561_data.0.parq
> E   Error(2): No such file or directory
> E   Root cause: ResourceNotFoundException: Requested resource not found 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ResourceNotFoundException; Request ID: 
> 1HMMG39MJ9GP2JEENAUFVFDVA3VV4KQNSO5AEMVJF66Q9ASUAAJG)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10128) AnalyzeDDLTest.TestCreateTableLikeFileOrc failed

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10128.
---
Resolution: Duplicate

Looks like a duplicate of IMPALA-9351

> AnalyzeDDLTest.TestCreateTableLikeFileOrc failed
> 
>
> Key: IMPALA-10128
> URL: https://issues.apache.org/jira/browse/IMPALA-10128
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Priority: Major
>
> Parallel-all-tests:
> In ubuntu-16.04-from-scratch, 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc
> failed with
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0'
> Stacktrace
> java.lang.AssertionError: 
> Error during analysis:
> org.apache.impala.common.AnalysisException: Cannot infer schema, path does 
> not exist: 
> hdfs://localhost:20500/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0
> sql:
> create table if not exists newtbl_DNE like orc 
> '/test-warehouse/managed/complextypestbl_orc_def/base_001/bucket_0_0'
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.impala.common.FrontendFixture.analyzeStmt(FrontendFixture.java:397)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:246)
>   at 
> org.apache.impala.common.FrontendTestBase.AnalyzesOk(FrontendTestBase.java:186)
>   at 
> org.apache.impala.analysis.AnalyzeDDLTest.TestCreateTableLikeFileOrc(AnalyzeDDLTest.java:2027)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-10123) asf-master-core-tsan load data error

2020-09-01 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar closed IMPALA-10123.
-
Resolution: Duplicate

I think this is a duplicate of IMPALA-10129. The underlying error was in the 
impalad.ERROR logs for data load.

> asf-master-core-tsan load data error
> 
>
> Key: IMPALA-10123
> URL: https://issues.apache.org/jira/browse/IMPALA-10123
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Priority: Major
>
> Data loading failed in asf-master-core-tsan in two builds in a row:
> 19:32:54 16:32:54 Error executing impala SQL: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql
>  See: 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/logs/data_loading/sql/functional/invalidate-functional-query-exhaustive-impala-generated.sql.log
> In the log, it shows:
> Encounter errors before parsing any queries.
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/bin/load-data.py",
>  line 202, in exec_impala_query_from_file
> impala_client.connect()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/tests/beeswax/impala_beeswax.py",
>  line 162, in connect
> raise ImpalaBeeswaxException(self.__build_error_message(e), e)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: Could not connect to localhost:21000



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10129) Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats

2020-09-01 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10129:
-

 Summary: Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
 Key: IMPALA-10129
 URL: https://issues.apache.org/jira/browse/IMPALA-10129
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar
Assignee: Qifan Chen


TSAN is reporting a data race in 
{{MemTracker::GetTopNQueriesAndUpdatePoolStats}}

{code}
WARNING: ThreadSanitizer: data race (pid=6436)
  Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write 
M861448892003377216, write M862574791910219632, write M623321199144890016, 
write M1054540811927503496):
#0 
impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue >, 
std::greater >&, int, impala::TPoolStats&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19
 (impalad+0x20b13b1)
#1 impala::MemTracker::UpdatePoolStatsForQueries(int, impala::TPoolStats&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3
 (impalad+0x20b123d)
#2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14
 (impalad+0x21c9d10)
#3 
impala::AdmissionController::AddPoolUpdates(std::vector >*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18
 (impalad+0x21c7053)
#4 
impala::AdmissionController::UpdatePoolStats(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5
 (impalad+0x21c6d7d)
#5 
impala::AdmissionController::Init()::$_4::operator()(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) const 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45
 (impalad+0x21ce0e1)
#6 
boost::detail::function::void_function_obj_invoker2, 
std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*>::invoke(boost::detail::function::function_buffer&, 
std::map, 
std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
 (impalad+0x21cdf2c)
#7 boost::function2, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*>::operator()(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, std::vector 
>*) const 
/data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
 (impalad+0x23fa960)
#8 
impala::StatestoreSubscriber::UpdateState(std::map, std::allocator >, impala::TTopicDelta, 
std::less, 
std::allocator > >, 
std::allocator, std::allocator > const, impala::TTopicDelta> > > 
const&, impala::TUniqueId const&, std::vector >*, bool*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:471:7
 (impalad+0x23f7899)
#9 
impala::StatestoreSubscriberThriftIf::UpdateState(impala::TUpdateStateResponse&,
 impala::TUpdateStateRequest const&) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:110:18
 (impalad+0x23fabbf)
#10 impala::StatestoreSubscriberProcessor::process_UpdateState(int, 
apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, 
void*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:543:13
 (impalad+0x29adba4)
#11 
impala::StatestoreSubscriberProcessor::dispatchCall(apache::thrift::protocol::TProtocol*,
 apache::thrift::protocol::TProtocol*, std::__cxx11::basic_string, std::allocator > const&, int, void*) 
/data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/StatestoreSubscriber.cpp:516:3
 (impalad+0x29ad982)
#12 
apache::thrift::TDispatchProcessor::process(boost::shared_ptr,
 boost::shared_ptr, void*) 

[jira] [Commented] (IMPALA-9870) summary and profile command in impala-shell should show both original and retried info

2020-09-01 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188831#comment-17188831
 ] 

Sahil Takiar commented on IMPALA-9870:
--

I like the idea. I've been working on something similar in IMPALA-9229, 
although it does not include the updates to the SQL syntax for the summary / 
profile commands. I think the extensions to the summary / profile commands 
make sense and would be great to have. The implementation I've been working on 
is server side; it basically extends the current Thrift GetSummary / GetProfile 
service requests. I should have something ready for review by today or 
tomorrow. I'll post it for review, and then we can decide how to proceed 
together.

> summary and profile command in impala-shell should show both original and 
> retried info
> --
>
> Key: IMPALA-9870
> URL: https://issues.apache.org/jira/browse/IMPALA-9870
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> If a query is retried, impala-shell still uses the original query handle 
> containing the original query id. Subsequent "summary" and "profile" commands 
> will return results for the original query. We should consider returning both 
> the original and retried information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10030) Remove unneeded jars from fe/pom.xml

2020-08-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10030.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Remove unneeded jars from fe/pom.xml
> 
>
> Key: IMPALA-10030
> URL: https://issues.apache.org/jira/browse/IMPALA-10030
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are several jar dependencies that (1) are not needed, (2) can easily 
> be removed, (3) can be converted to test dependencies, or (4) pull in 
> unnecessary transitive dependencies.
> Removing all these jar dependencies can help decrease the size of Impala 
> Docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10117) Skip calls to FsPermissionCache for blob stores

2020-08-31 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187895#comment-17187895
 ] 

Sahil Takiar commented on IMPALA-10117:
---

FWIW I hacked the code a bit and ran core tests against S3, and confirmed these 
methods are called on S3-backed tables.

> Skip calls to FsPermissionCache for blob stores
> ---
>
> Key: IMPALA-10117
> URL: https://issues.apache.org/jira/browse/IMPALA-10117
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Priority: Major
>
> The {{FsPermissionCache}} is described as:
> {code:java}
> /**
>  * Simple non-thread-safe cache for resolved file permissions. This allows
>  * pre-caching permissions by listing the status of all files within a 
> directory,
>  * and then using that cache to avoid round trips to the FileSystem for later
>  * queries of those paths.
>  */ {code}
> I confirmed that {{FsPermissionCache#precacheChildrenOf}} is actually called 
> for data stored on S3. The issue is that {{FsPermissionCache#getPermissions}} 
> is called inside {{HdfsTable#getAvailableAccessLevel}}, which is skipped for 
> S3, so none of the cached metadata is ever used. The problem is that 
> {{precacheChildrenOf}} calls {{getFileStatus}} for every file, which results 
> in a bunch of unnecessary metadata operations to S3 plus a bunch of cached 
> metadata that is never read.
> {{precacheChildrenOf}} is actually only invoked in the specific scenario 
> described below:
> {code}
> // Only preload permissions if the number of partitions to be added is
> // large (3x) relative to the number of existing partitions. This covers
> // two common cases:
> //
> // 1) initial load of a table (no existing partition metadata)
> // 2) ALTER TABLE RECOVER PARTITIONS after creating a table pointing to
> // an already-existing partition directory tree
> //
> // Without this heuristic, we would end up using a "listStatus" call to
> // potentially fetch a bunch of irrelevant information about existing
> // partitions when we only want to know about a small number of 
> newly-added
> // partitions.
> {code}
> Regardless, skipping the call to {{precacheChildrenOf}} for blob stores 
> should (1) improve table loading time for S3 backed tables, and (2) decrease 
> catalogd memory requirements when loading a bunch of tables stored on S3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10117) Skip calls to FsPermissionCache for blob stores

2020-08-31 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10117:
-

 Summary: Skip calls to FsPermissionCache for blob stores
 Key: IMPALA-10117
 URL: https://issues.apache.org/jira/browse/IMPALA-10117
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The {{FsPermissionCache}} is described as:
{code:java}
/**
 * Simple non-thread-safe cache for resolved file permissions. This allows
 * pre-caching permissions by listing the status of all files within a 
directory,
 * and then using that cache to avoid round trips to the FileSystem for later
 * queries of those paths.
 */ {code}

I confirmed that {{FsPermissionCache#precacheChildrenOf}} is actually called 
for data stored on S3. The issue is that {{FsPermissionCache#getPermissions}} 
is called inside {{HdfsTable#getAvailableAccessLevel}}, which is skipped for 
S3, so none of the cached metadata is ever used. The problem is that 
{{precacheChildrenOf}} calls {{getFileStatus}} for every file, which results in 
a bunch of unnecessary metadata operations to S3 plus a bunch of cached 
metadata that is never read.

{{precacheChildrenOf}} is actually only invoked in the specific scenario 
described below:
{code}
// Only preload permissions if the number of partitions to be added is
// large (3x) relative to the number of existing partitions. This covers
// two common cases:
//
// 1) initial load of a table (no existing partition metadata)
// 2) ALTER TABLE RECOVER PARTITIONS after creating a table pointing to
// an already-existing partition directory tree
//
// Without this heuristic, we would end up using a "listStatus" call to
// potentially fetch a bunch of irrelevant information about existing
// partitions when we only want to know about a small number of newly-added
// partitions.
{code}

Regardless, skipping the call to {{precacheChildrenOf}} for blob stores should 
(1) improve table loading time for S3 backed tables, and (2) decrease catalogd 
memory requirements when loading a bunch of tables stored on S3.
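
To make the proposal concrete, here is a minimal sketch of the guard, assuming 
hypothetical class and method names (only the Hadoop FileSystem calls are real 
APIs); the actual change would live in the FsPermissionCache / HdfsTable load 
path, and Impala's existing filesystem-capability helpers would be the natural 
place for the blob-store test.
{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: class/method names are hypothetical, not Impala's actual code.
public class PermissionPrecacheSketch {
  // Assumed list of blob-store schemes where permissions are synthetic and the
  // getAvailableAccessLevel-style check is skipped anyway.
  private static boolean isBlobStoreScheme(String scheme) {
    return "s3a".equals(scheme) || "abfs".equals(scheme) || "adl".equals(scheme);
  }

  private final Map<Path, FsPermission> cache_ = new HashMap<>();

  // Pre-caches permissions for all children of 'dir', unless 'fs' is a blob
  // store, in which case the getFileStatus/listStatus work would be wasted.
  public void precacheChildrenOf(FileSystem fs, Path dir) throws IOException {
    if (isBlobStoreScheme(fs.getScheme())) return;  // proposed skip
    for (FileStatus status : fs.listStatus(dir)) {
      cache_.put(status.getPath(), status.getPermission());
    }
  }
}
{code}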



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-2019) Proper UTF-8 support in string functions

2020-08-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-2019:
-
Priority: Major  (was: Minor)

> Proper UTF-8 support in string functions
> 
>
> Key: IMPALA-2019
> URL: https://issues.apache.org/jira/browse/IMPALA-2019
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.1, Impala 2.2
>Reporter: Andrés Cordero
>Priority: Major
>  Labels: sql-language
>
> As documented here: 
> https://impala.apache.org/docs/build/html/topics/impala_string.html
> Impala does not properly handle non-ASCII UTF-8 characters, and string 
> functions such as length() return results that are inconsistent with Hive.
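
For concreteness, the inconsistency boils down to counting bytes versus 
Unicode code points. A small standalone Java illustration (not Impala code) of 
the two interpretations of length():
{code:java}
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
  public static void main(String[] args) {
    String s = "héllo";  // 5 code points, but 'é' occupies 2 bytes in UTF-8
    // Byte-oriented length, as a byte-based length() implementation reports: 6
    System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    // Code-point-oriented length, matching Hive's length() semantics: 5
    System.out.println(s.codePointCount(0, s.length()));
  }
}
{code}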



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-08-31 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-10073.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a technique similar to the one used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9332) Investigate and use the new batch listing API from HDFS-13616

2020-08-27 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186189#comment-17186189
 ] 

Sahil Takiar commented on IMPALA-9332:
--

FYI there is a TODO to use this in HdfsTable#preloadPermissionsCache:
{code:java}
// TODO(todd): when HDFS-13616 (batch listing of multiple directories)
// is implemented, we could likely implement this with a single round
// trip. {code}

> Investigate and use the new batch listing API from HDFS-13616
> -
>
> Key: IMPALA-9332
> URL: https://issues.apache.org/jira/browse/IMPALA-9332
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Quanlong Huang
>Priority: Critical
>
> HDFS-13616 provides a new batch listing API which can potentially speed up 
> the file listing on HDFS tables when reloading the table file metadata. We 
> should investigate if this API is helpful for Impala and use it if there are 
> any performance benefits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9225) Retryable queries should spool all results before returning any to the client

2020-08-27 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186089#comment-17186089
 ] 

Sahil Takiar commented on IMPALA-9225:
--

Nice work on this!

> Retryable queries should spool all results before returning any to the client
> -
>
> Key: IMPALA-9225
> URL: https://issues.apache.org/jira/browse/IMPALA-9225
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.0
>
>
> If query retries are enabled, a query should not return any results to the 
> client until all results are spooled. The issue is that once a query starts 
> returning results, retrying the query becomes increasingly complex and is not 
> supported in the initial version of IMPALA-9124. Retrying a query while 
> returning results could cause incorrect results, especially for 
> non-deterministic queries (e.g. when the results are not ordered).
> Since a query can fail at any time while results are being produced, 
> transparent retries are most effective if they can be applied at any point 
> of query execution.
> The one edge case is what happens if all query results cannot be contained in 
> the allocated result spooling memory (including unpinned memory). In this 
> case, retries for the query should be transparently disabled.
> We should consider making this configurable, in case it leads to performance 
> degradation. That said, I'm inclined to turn the flag on by default (e.g. 
> always spool all results before returning them); otherwise (depending on the 
> query) query retries won't always be helpful.
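
The gating rule described in the quoted description is easiest to see as a toy 
sketch (not Impala code; the class and field names are made up): rows are only 
handed to the client once spooling completes, and retryability is dropped if 
the spool runs out of memory.
{code:java}
// Toy illustration only; not the actual coordinator/fetch-gating code.
public class RetryableFetchGate {
  enum SpoolState { SPOOLING, SPOOLED_ALL, SPOOL_MEM_EXCEEDED }

  private boolean retryable_;  // query was started with retries enabled
  private SpoolState spoolState_ = SpoolState.SPOOLING;

  // Returns true if a fetch request may start streaming rows to the client.
  boolean canReturnResults() {
    if (!retryable_) return true;  // no retry semantics to protect
    switch (spoolState_) {
      case SPOOLED_ALL:
        return true;  // safe: a retry can no longer change returned results
      case SPOOL_MEM_EXCEEDED:
        retryable_ = false;  // transparently disable retries, then stream
        return true;
      default:
        return false;  // keep the client waiting until spooling finishes
    }
  }
}
{code}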



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8547) get_json_object fails to get value for numeric key

2020-08-25 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8547.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> get_json_object fails to get value for numeric key
> --
>
> Key: IMPALA-8547
> URL: https://issues.apache.org/jira/browse/IMPALA-8547
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Eugene Zimichev
>Assignee: Eugene Zimichev
>Priority: Minor
>  Labels: built-in-function
> Fix For: Impala 4.0
>
>
> {code:java}
> select get_json_object('{"1": 5}', '$.1');
> {code}
> returns error:
>  
> {code:java}
> "Expected key at position 2"
> {code}
>  
> I guess it's caused by the function FindEndOfIdentifier, which expects the 
> first character of a key to be a letter.
> Hive's version of get_json_object works fine in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5564) Return a profile for queries during planning

2020-08-25 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184620#comment-17184620
 ] 

Sahil Takiar commented on IMPALA-5564:
--

A WIP patch for this was posted here: https://gerrit.cloudera.org/#/c/8434/

> Return a profile for queries during planning
> 
>
> Key: IMPALA-5564
> URL: https://issues.apache.org/jira/browse/IMPALA-5564
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: supportability
>
> During planning we currently don't return a profile from the debug webpages. 
> It would be nice to do so, to allow various monitoring tools to retrieve 
> information about queries during their planning phase.
> This could be a minimal version of the profiles with information about the 
> current state of planning, e.g. that the FE is currently waiting for metadata 
> to be loaded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10073) Create shaded dependency for S3A and aws-java-sdk-bundle

2020-08-14 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-10073:
--
Summary: Create shaded dependency for S3A and aws-java-sdk-bundle  (was: 
Created shaded dependency for S3A and aws-java-sdk-bundle)

> Create shaded dependency for S3A and aws-java-sdk-bundle
> 
>
> Key: IMPALA-10073
> URL: https://issues.apache.org/jira/browse/IMPALA-10073
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> One of the largest dependencies in Impala Docker containers is the 
> aws-java-sdk-bundle jar. One way to decrease the size of this dependency is 
> to apply a technique similar to the one used for the hive-exec shaded jar: 
> [https://github.com/apache/impala/blob/master/shaded-deps/pom.xml]
> The aws-java-sdk-bundle contains SDKs for all AWS services, even though 
> Impala-S3A only requires a few of the more basic SDKs.
> IMPALA-10028 and HADOOP-17197 both discuss this a bit as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10085) Table level stats are not honored when partition has corrupt stats

2020-08-13 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10085:
-

 Summary: Table level stats are not honored when partition has 
corrupt stats
 Key: IMPALA-10085
 URL: https://issues.apache.org/jira/browse/IMPALA-10085
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar


This is more of an edge case of IMPALA-9744, but when any partition in a table 
has corrupt stats, the table-level stats will not be honored. On the other 
hand, if a table just has missing stats, the table-level stats will be honored.

Given a partitioned table with the following partitions and their row counts:

{code:java}
[localhost:21000] default> show partitions part_test;
Query: show partitions part_test
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| partcol | #Rows   | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                   |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| 1       | -1      | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -438290 | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3       | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100  | 3      | 22B  | 0B           |                   |        |                   |                                                            |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
 {code}

The query {{explain select * from part_test order by col limit 10}} will cause 
{{HdfsScanNode#getStatsNumRows}} to return 5.

Given the following set of partitions with different row counts than above:

{code}
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| partcol | #Rows   | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                   |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
| 1       | -1      | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| 2       | -1      | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=2 |
| 3       | 3       | 1      | 6B   | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=3 |
| Total   | 100100  | 3      | 22B  | 0B           |                   |        |                   |                                                            |
+---------+---------+--------+------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
{code}

The same method returns 100100.
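
A self-contained sketch (not the actual HdfsScanNode code; names and logic are 
simplified) of the selection rule the two examples suggest: a corrupt partition 
row count (any negative value other than -1) blocks the fallback to the 
table-level count, while purely missing (-1) counts do not.
{code:java}
// Illustration of the reported behavior only.
public class StatsNumRowsSketch {
  static final long MISSING = -1;

  static long getStatsNumRows(long[] partitionRowCounts, long tableRowCount) {
    long sum = 0;
    boolean anyMissing = false;
    boolean anyCorrupt = false;
    for (long rows : partitionRowCounts) {
      if (rows == MISSING) { anyMissing = true; continue; }
      if (rows < 0) { anyCorrupt = true; continue; }
      sum += rows;
    }
    // Reported behavior: a corrupt partition count blocks the table-level fallback.
    if (anyMissing && !anyCorrupt && tableRowCount >= 0) return tableRowCount;
    return sum;
  }

  public static void main(String[] args) {
    // First example above: the corrupt -438290 count means the table-level
    // 100100 is ignored; this sketch prints 3 (the real node estimated 5).
    System.out.println(getStatsNumRows(new long[] {-1, -438290, 3}, 100100));
    // Second example: only missing (-1) counts, so it falls back to 100100.
    System.out.println(getStatsNumRows(new long[] {-1, -1, 3}, 100100));
  }
}
{code}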



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


