keith-turner commented on PR #5375:
URL: https://github.com/apache/accumulo/pull/5375#issuecomment-2695650687

   > Right, but the send call in the loadFiles case won't return until 
TabletClientHandler.loadFiles is completed on the server side.
   
   That does not seem to be the behavior I am seeing, based on logging from 
running the new test against 551dde0f835105e71708c790d35993b2cf1f01d1.  The 
logs below show that by the time the manager has sent 999 one-way messages, 
not a single tablet has completed bulk load processing in a tserver.  
   
   ```
   $ grep sent Manager_1222094219.out | head
   2025-03-03T21:17:41,722 99 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 999 messages to 2 tablet servers in 80 ms
   2025-03-03T21:17:42,309 97 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 989 messages to 2 tablet servers in 21 ms
   2025-03-03T21:17:42,826 98 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 974 messages to 2 tablet servers in 14 ms
   2025-03-03T21:17:43,341 96 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 954 messages to 2 tablet servers in 8 ms
   2025-03-03T21:17:43,801 97 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 930 messages to 2 tablet servers in 6 ms
   2025-03-03T21:17:44,272 98 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 895 messages to 2 tablet servers in 5 ms
   2025-03-03T21:17:44,710 98 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 860 messages to 2 tablet servers in 4 ms
   2025-03-03T21:17:45,112 99 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 820 messages to 2 tablet servers in 4 ms
   2025-03-03T21:17:45,502 99 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 775 messages to 2 tablet servers in 3 ms
   2025-03-03T21:17:45,853 98 [bulkVer2.LoadFiles] DEBUG: 
FATE[31d804b6af250c68] sent 715 messages to 2 tablet servers in 2 ms
   $ grep -e Starting -e Finished TabletServer_1604342290.out | head
   2025-03-03T21:17:41,732 94 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0138;0137 
   2025-03-03T21:17:41,732 88 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0584;0583 
   2025-03-03T21:17:41,740 57 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0660;0659 
   2025-03-03T21:17:41,741 93 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0177;0176 
   2025-03-03T21:17:41,744 64 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0024;0023 
   2025-03-03T21:17:41,745 96 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0202;0201 
   2025-03-03T21:17:41,746 62 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0253;0252 
   2025-03-03T21:17:41,747 58 [tserver.TabletClientHandler] DEBUG: Starting 
bulk import  for 2;0292;0291 
   2025-03-03T21:17:41,836 88 [tserver.TabletClientHandler] DEBUG: Finished 
bulk import  for 2;0584;0583 
   2025-03-03T21:17:41,836 94 [tserver.TabletClientHandler] DEBUG: Finished 
bulk import  for 2;0138;0137 
   ```
   
   Notice in the messages above how the manager keeps queueing up work for the 
tablet servers by continually sending these one-way messages.  Eventually many 
of these queued requests run after the bulk import is done, as the logs below 
show.
   
   ```
   $ grep "no longer active" TabletServer_1* | head -3
   TabletServer_1097069005.out:2025-03-03T21:17:52,169 86 
[zookeeper.TransactionWatcher] DEBUG: Transaction 3591625885496970344 of type 
bulkTx is no longer active.
   TabletServer_1097069005.out:2025-03-03T21:17:52,169 58 
[zookeeper.TransactionWatcher] DEBUG: Transaction 3591625885496970344 of type 
bulkTx is no longer active.
   TabletServer_1097069005.out:2025-03-03T21:17:52,170 75 
[zookeeper.TransactionWatcher] DEBUG: Transaction 3591625885496970344 of type 
bulkTx is no longer active.
   $ grep "no longer active" TabletServer_1* | wc
     13988  181844 2252068
   ```
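
   To illustrate the one-way semantics described above, here is a minimal toy 
model, not Accumulo or Thrift code.  The class name, the executor standing in 
for a tserver, and the latch standing in for slow bulk load work are all 
assumptions made for illustration.  The latch forces the work to outlast the 
send loop, so this shows the semantics rather than measuring anything.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model: an executor plays the tserver, execute() plays a
// one-way send. Nothing here is taken from the Accumulo code base.
public class OneWaySendSketch {

  /** Returns {completed when the send loop returned, completed after draining}. */
  public static int[] sendAll(int messages, int serverThreads)
      throws InterruptedException {
    ExecutorService server = Executors.newFixedThreadPool(serverThreads);
    AtomicInteger completed = new AtomicInteger();
    // Stand-in for slow bulk load work: no task can finish until released.
    CountDownLatch workMayFinish = new CountDownLatch(1);

    for (int i = 0; i < messages; i++) {
      // Fire and forget: execute() only queues the task and returns at once,
      // so the sender never waits for server-side processing.
      server.execute(() -> {
        try {
          workMayFinish.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        completed.incrementAndGet();
      });
    }
    // Every message has been "sent", yet no work has completed.
    int completedAtReturn = completed.get();

    workMayFinish.countDown();
    server.shutdown();
    if (!server.awaitTermination(30, TimeUnit.SECONDS)) {
      throw new IllegalStateException("queued work did not drain in time");
    }
    return new int[] {completedAtReturn, completed.get()};
  }

  public static void main(String[] args) throws InterruptedException {
    int[] result = sendAll(999, 8);
    System.out.println("completed when send returned: " + result[0]
        + ", completed after drain: " + result[1]);
  }
}
```

   In this model the sender has no backpressure: each further round of sends 
only adds to the server-side queue, which matches the pattern in the manager 
log above.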
   
   >  I'm curious if the following could be done in parallel.
   
   I considered that when I started looking into this but did not want to 
create yet another thread pool that needs to be configured and monitored.  I 
figured it could use the existing RPC thread pool.  Parallelizing may be a way 
to solve this; it would probably be best to have one thread pool per tserver 
for this, as opposed to one per request.  
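
   As a rough sketch of the "one pool per tserver, not per request" idea, 
assuming the parallel work is per-tablet processing within a single request: 
the class, `handleRequest`, and `loadTablet` below are hypothetical names, not 
existing Accumulo APIs, and the real change might look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one pool is created per server process and reused by
// every request, instead of building and tearing down a pool per request.
public class SharedBulkLoadPoolSketch {

  private final ExecutorService pool;

  public SharedBulkLoadPoolSketch(int threads) {
    // Created once, so there is a single pool to size and monitor.
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Fans per-tablet work out to the shared pool; returns how many tablets finished. */
  public int handleRequest(List<String> tablets)
      throws InterruptedException, ExecutionException {
    List<Future<String>> futures = new ArrayList<>();
    for (String tablet : tablets) {
      Callable<String> task = () -> loadTablet(tablet);
      futures.add(pool.submit(task));
    }
    int done = 0;
    for (Future<String> future : futures) {
      future.get(); // rethrows any failure from the per-tablet work
      done++;
    }
    return done;
  }

  // Placeholder for the real per-tablet bulk load work (assumption).
  private String loadTablet(String tablet) {
    return tablet;
  }

  public void close() {
    pool.shutdown();
  }

  public static void main(String[] args) throws Exception {
    SharedBulkLoadPoolSketch server = new SharedBulkLoadPoolSketch(4);
    int processed = server.handleRequest(List.of("2;0138;0137", "2;0584;0583"));
    System.out.println("tablets processed: " + processed);
    server.close();
  }
}
```

   The tradeoff is the one mentioned above: a shared pool bounds total 
parallelism across requests, at the cost of one more pool to configure and 
monitor.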

