[ 
https://issues.apache.org/jira/browse/IMPALA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882527#comment-16882527
 ] 

ASF subversion and git services commented on IMPALA-8732:
---------------------------------------------------------

Commit b068c2c4e21ab302ca372f870eccfb2111752e34 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b068c2c ]

IMPALA-8732: Use a serialized descriptor table in TQueryCtx

In IMPALA-8732, there is contention in tcmalloc when sending the
ExecQueryFInstances messages for a query referencing a large
number of partitions. This is because each thread in the
ExecEnv::exec_rpc_thread_pool_ is making a copy of the TQueryCtx,
which contains the TDescriptorTable and a large map of THdfsPartition
objects. Every thread in the exec_rpc_thread_pool_ is doing this
simultaneously.

The threads are copying this structure, but the TQueryCtx and its
corresponding TDescriptorTable is the same across all messages for
this query. Copying a large map of THdfsPartition objects is
wasteful, especially considering that the coordinator does not
need to access any of the information in TDescriptorTable before
sending it out to executors.

In future, the entire TQueryCtx can be serialized once and
embedded in its own sidecar. This change is limited to
TDescriptorTable to allow easier backports to older versions,
as this codepath has been converted from Thrift to KRPC and
a large amount of code has changed.

This changes TQueryCtx to contain a TDescriptorTableSerialized,
which is a binary blob containing the serialized form of
TDescriptorTable. This is serialized in the frontend and
passed directly through to executors. The old unserialized
TDescriptorTable form is maintained to enable frontend
planner tests (which use incomplete structures lacking some
required fields and cannot be serialized).

Testing:
 - Core and exhaustive tests pass

Change-Id: I458aa62dd4d1e4e4a7b1869a604623a69f3b2d9a
Reviewed-on: http://gerrit.cloudera.org:8080/13772
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> tcmalloc contention in Coordinator::BackendState::Exec()
> --------------------------------------------------------
>
>                 Key: IMPALA-8732
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8732
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.12.0, Impala 3.3.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> A cluster with several dozen nodes running Impala 2.12 observed slow query 
> startup:
> {noformat}
>     Query Timeline
>       Query submitted: 63.19us (63194)
>       Planning finished: 815ms (815243445)
>       Submit for admission: 1.21s (1206292509)
>       Completed admission: 1.21s (1206462581)
>       Ready to start on 83 backends: 1.40s (1395408016) <------
>       All 83 execution backends (84 fragment instances) started: 3.3m 
> (196968242656) <-------
>       Rows available: 3.3m (197012769471)
>       Cancelled: 3.3m (197012953528)
>       Released admission control resources: 3.3m (197029124587)
>       Unregister query: 3.3m (197241887354){noformat}
> This corresponds to the time spent in Coordinator::StartBackendExec() where 
> the coordinator submits the ExecQueryFInstances messages to the 
> ExecEnv::exec_rpc_thread_pool_ and waits for the thread pool to send the 
> messages via Coordinator::BackendState:;Exec(). This time did not show up in 
> the backend startup latency:
> {noformat}
> Backend startup latencies: Count: 83, min / max: 1ms / 619ms, 25th %-ile: 
> 2ms, 50th %-ile: 2ms, 75th %-ile: 3ms, 90th %-ile: 4ms, 95th %-ile: 8ms, 
> 99.9th %-ile: 619ms{noformat}
> This means that the time is spent on the coordinator side, and it is not 
> related to RPCs.
> Examining the logs from the affected Impala coordinator shows a backlog on 
> the ExecEnv::exec_rpc_thread_pool_. Specifically, I processed the logs 
> looking for "starting execution on X backends for query_id=" messages and 
> their corresponding "started execution on X backends for query_id="  messages 
> and tracked the number of outstanding starting queries. There was no burst of 
> query activity; the number of outstanding starting queries accumulated over 
> the course of a minute, reaching 30 at times.
> I wrote a script to track the number of outstanding starting queries in the 
> real time and dump the Impalad's pstack if the problem showed up (defined as 
> number of starting queries > 10).
> This revealed contention on tcmalloc in Coordinator::BackendState::Exec() 
> when copying the TQueryCtx (specifically when copying a map of 
> THdfsPartitions and their corresponding TExpr/TExprNode). The pstacks showed 
> stacks like these:
> {noformat}
> #0  0x000000000211519a in base::internal::SpinLockDelay(int volatile*, int, 
> int) ()
> #1  0x0000000002160b69 in SpinLock::SlowLock() ()
> #2  0x00000000021a4568 in tcmalloc::CentralFreeList::Populate() ()
> #3  0x00000000021a4668 in 
> tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) ()
> #4  0x00000000021a46f4 in tcmalloc::CentralFreeList::RemoveRange(void**, 
> void**, int) ()
> #5  0x00000000021b1e53 in 
> tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
> #6  0x0000000002258978 in tc_newarray ()
> #7  0x00000000009644e7 in impala::TExpr* 
> std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<impala::TExpr
>  const*, std::vector<impala::TExpr, std::allocator<impala::TExpr> > >, 
> impala::TExpr*>(__gnu_cxx::__normal_iterator<impala::TExpr const*, 
> std::vector<impala::TExpr, std::allocator<impala::TExpr> > >, 
> __gnu_cxx::__normal_iterator<impala::TExpr const*, std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> > >, impala::TExpr*) ()
> #8  0x000000000096509a in 
> _ZNSt8_Rb_treeIlSt4pairIKlN6impala14THdfsPartitionEESt10_Select1stIS4_ESt4lessIlESaIS4_EE14_M_create_nodeIJRKS4_EEEPSt13_Rb_tree_nodeIS4_EDpOT_
>  ()
> #9  0x00000000009654e3 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #10 0x0000000000965513 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #11 0x0000000000965513 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #12 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #13 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #14 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #15 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #16 0x0000000000a2f59b in impala::THdfsTable::THdfsTable(impala::THdfsTable 
> const&) ()
> #17 0x0000000000af53c0 in 
> impala::TTableDescriptor::TTableDescriptor(impala::TTableDescriptor const&) ()
> #18 0x0000000000c7ccd0 in std::vector<impala::TTableDescriptor, 
> std::allocator<impala::TTableDescriptor> 
> >::operator=(std::vector<impala::TTableDescriptor, 
> std::allocator<impala::TTableDescriptor> > const&) ()
> #19 0x000000000126eb71 in 
> impala::Coordinator::BackendState::Exec(impala::TQueryCtx const&, 
> impala::DebugOptions const&, boost::unordered::unordered_map<int, 
> impala::Coordinator::FilterState, boost::hash<int>, std::equal_to<int>, 
> std::allocator<std::pair<int const, impala::Coordinator::FilterState> > > 
> const&, impala::CountingBarrier*) (){noformat}
> and
> {noformat}
> #0  0x000000000211519a in base::internal::SpinLockDelay(int volatile*, int, 
> int) ()
> #1  0x0000000002160b69 in SpinLock::SlowLock() ()
> #2  0x00000000021a3fd8 in tcmalloc::CentralFreeList::ReleaseToSpans(void*) ()
> #3  0x00000000021a401b in 
> tcmalloc::CentralFreeList::ReleaseListToSpans(void*) ()
> #4  0x00000000021a428c in tcmalloc::CentralFreeList::InsertRange(void*, 
> void*, int) ()
> #5  0x00000000021b1ff4 in 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int) ()
> #6  0x00000000021b20bc in 
> tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
> long) ()
> #7  0x0000000002256c10 in tc_free ()
> #8  0x00000000009520b1 in std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> >::~vector() ()
> #9  0x0000000000953823 in impala::THdfsPartition::~THdfsPartition() ()
> #10 0x0000000000954709 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #11 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #12 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #13 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #14 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #15 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #16 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #17 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #18 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #19 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #20 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #21 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #22 0x00000000009547ff in impala::THdfsTable::~THdfsTable() ()
> #23 0x0000000000af4544 in impala::TTableDescriptor::~TTableDescriptor() ()
> #24 0x0000000000bc575f in impala::TDescriptorTable::~TDescriptorTable() ()
> #25 0x0000000000bc59b3 in impala::TQueryCtx::~TQueryCtx() ()
> #26 0x0000000000bc6158 in 
> impala::TExecQueryFInstancesParams::~TExecQueryFInstancesParams() ()
> #27 0x000000000126f68f in 
> impala::Coordinator::BackendState::Exec(impala::TQueryCtx const&, 
> impala::DebugOptions const&, boost::unordered::unordered_map<int, 
> impala::Coordinator::FilterState, boost::hash<int>, std::equal_to<int>, 
> std::allocator<std::pair<int const, impala::Coordinator::FilterState> > > 
> const&, impala::CountingBarrier*) (){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to