[ 
https://issues.apache.org/jira/browse/IMPALA-12233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741594#comment-17741594
 ] 

Gergely Fürnstáhl commented on IMPALA-12233:
--------------------------------------------

[~joemcdonnell] could you check this out?

[https://gerrit.cloudera.org/#/c/20179/]

I started with modifying the expected number of threads, but it's a bit more 
complicated. If all the "active" threads are already waiting when the "last" 
unregister call arrives, we would need to wake up one thread and modify the 
control flow so it behaves like the current last Wait() call (execute the 
function, then wake the rest so they return). Cancelling with OK seemed 
cleaner: it says "there is nothing wrong, move on with the threads, this 
barrier has no more use".

> Partitioned hash join with a limit can hang when using mt_dop>0
> ---------------------------------------------------------------
>
>                 Key: IMPALA-12233
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12233
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.3.0
>            Reporter: Joe McDonnell
>            Assignee: Gergely Fürnstáhl
>            Priority: Blocker
>
> After encountering a hung query on an Impala cluster, we were able to 
> reproduce it in the Impala developer environment with these steps:
> {noformat}
> use tpcds;
> set mt_dop=2;
> select ss_cdemo_sk from store_sales where ss_sold_date_sk = (select 
> max(ss_sold_date_sk) from store_sales) group by ss_cdemo_sk limit 1;{noformat}
> The problem reproduces with limit values up to 183, then at limit 184 and 
> higher it doesn't reproduce.
> Taking stack traces shows a thread waiting for a cyclic barrier:
> {noformat}
>  0  libpthread.so.0!__pthread_cond_wait + 0x216
>  1  
> impalad!impala::CyclicBarrier::Wait<impala::PhjBuilder::DoneProbingHashPartitions(const
>  int64_t*, impala::BufferPool::ClientHandle*, impala::RuntimeProfile*, 
> std::deque<std::unique_ptr<impala::PhjBuilderPartition> >*, 
> impala::RowBatch*)::<lambda()> > [condition-variable.h : 49 + 0xc]
>  2  impalad!impala::PhjBuilder::DoneProbingHashPartitions(long const*, 
> impala::BufferPool::ClientHandle*, impala::RuntimeProfile*, 
> std::deque<std::unique_ptr<impala::PhjBuilderPartition, 
> std::default_delete<impala::PhjBuilderPartition> >, 
> std::allocator<std::unique_ptr<impala::PhjBuilderPartition, 
> std::default_delete<impala::PhjBuilderPartition> > > >*, impala::RowBatch*) 
> [partitioned-hash-join-builder.cc : 766 + 0x25]
>  3  
> impalad!impala::PartitionedHashJoinNode::DoneProbing(impala::RuntimeState*, 
> impala::RowBatch*) [partitioned-hash-join-node.cc : 1189 + 0x28]
>  4  impalad!impala::PartitionedHashJoinNode::GetNext(impala::RuntimeState*, 
> impala::RowBatch*, bool*) [partitioned-hash-join-node.cc : 599 + 0x15]
>  5  
> impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*,
>  impala::RowBatch*) [streaming-aggregation-node.cc : 115 + 0x14]
>  6  impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, 
> impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x15]
>  7  impalad!impala::FragmentInstanceState::ExecInternal() 
> [fragment-instance-state.cc : 446 + 0x15]
>  8  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc 
> : 104 + 0xf]
>  9  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> [query-state.cc : 956 + 0xf]{noformat}
> Adding some debug logging around locations that go through that cyclic 
> barrier, we see one Impalad where it is expecting two threads and only one 
> arrives:
> {noformat}
> I0621 18:28:19.926551 210363 partitioned-hash-join-builder.cc:766] 
> 2a4787b28425372d:ac6bd96200000004] DoneProbingHashPartitions: 
> num_probe_threads_=2
> I0621 18:28:19.927855 210362 streaming-aggregation-node.cc:136] 
> 2a4787b28425372d:ac6bd96200000003] the number of rows (93) returned from the 
> streaming aggregation node has exceeded the limit of 1
> I0621 18:28:19.928887 210362 query-state.cc:958] 
> 2a4787b28425372d:ac6bd96200000003] Instance completed. 
> instance_id=2a4787b28425372d:ac6bd96200000003 #in-flight=4 status=OK{noformat}
> Other instances that don't have a stuck thread see both threads arrive:
> {noformat}
> I0621 18:28:19.926223 210358 partitioned-hash-join-builder.cc:766] 
> 2a4787b28425372d:ac6bd96200000005] DoneProbingHashPartitions: 
> num_probe_threads_=2
> I0621 18:28:19.926326 210359 partitioned-hash-join-builder.cc:766] 
> 2a4787b28425372d:ac6bd96200000006] DoneProbingHashPartitions: 
> num_probe_threads_=2{noformat}
> So, there must be a codepath that skips going through the cyclic barrier.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)