ankitsultana opened a new issue, #10185: URL: https://github.com/apache/pinot/issues/10185
We recently started using Partial Upsert tables for a use-case and started seeing this issue.

We have a cluster with a few partial upsert tables with replication=1. If we restart a server in the cluster, all the tables (even offline/vanilla-realtime tables) go into Bad state.

In the server logs we see logs like the following:

```
2023/01/26 17:45:32.602 INFO [TableStateUtils] [HelixTaskExecutor-message_handle_thread_29] Find unloaded segment: my_great_table, table: my_great_table, expected: ONLINE, actual: OFFLINE
```

On taking a thread-dump I see as many threads as there are partial upsert tables in the cluster, all stuck in this loop (corresponding [PR](https://github.com/apache/pinot/pull/8923/files)):

https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/tablestate/TableStateUtils.java#L130

Sample thread-dump:

```
"HelixTaskExecutor-message_handle_thread_35" #116 daemon prio=5 os_prio=0 cpu=74100.46ms elapsed=3248.88s tid=0x00007eb8202b8000 nid=0xe9 waiting on condition [0x00007eb78b8f8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep([email protected]/Native Method)
	at org.apache.pinot.segment.local.utils.tablestate.TableStateUtils.waitForAllSegmentsLoaded(TableStateUtils.java:133)
	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:416)
	- locked <0x00007ebbe43613b8> (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:189)
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0([email protected]/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke([email protected]/NativeMethodAccessorImpl.java:62)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke([email protected]/Method.java:566)
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
```
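
To make the symptom easier to discuss, here is a minimal sketch of the kind of sleep-and-poll wait that would leave a handler thread in `TIMED_WAITING (sleeping)` for as long as any expected segment is not yet ONLINE. This is not Pinot's actual code: the class `SegmentWaitSketch`, the `ACTUAL_STATE` map, and the `timeoutMs` bound are all hypothetical, and the timeout is added here only so the demo terminates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; names and structure are assumptions, not Pinot's implementation.
public class SegmentWaitSketch {
    // Stand-in for the server's view of each segment's current state.
    static final Map<String, String> ACTUAL_STATE = new ConcurrentHashMap<>();

    // True only when every expected segment has reached its expected state.
    static boolean allSegmentsLoaded(Map<String, String> expected) {
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actual = ACTUAL_STATE.getOrDefault(e.getKey(), "OFFLINE");
            if (!e.getValue().equals(actual)) {
                return false; // comparable to a "Find unloaded segment" log line
            }
        }
        return true;
    }

    // Polls until all segments are loaded or the (assumed) timeout expires.
    // While waiting, the calling thread sits in Thread.sleep.
    static boolean waitForAllSegmentsLoaded(Map<String, String> expected, long sleepMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!allSegmentsLoaded(expected)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up; without a bound the loop would spin forever
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> expected = Map.of("seg_0", "ONLINE");
        // Nothing ever loads seg_0, so the wait runs until the timeout.
        System.out.println("loaded=" + waitForAllSegmentsLoaded(expected, 10, 100));
        // Once the segment is marked ONLINE, the same wait returns immediately.
        ACTUAL_STATE.put("seg_0", "ONLINE");
        System.out.println("loaded=" + waitForAllSegmentsLoaded(expected, 10, 100));
    }
}
```

If the thread that would move a segment to ONLINE is itself parked in such a wait, or is queued behind threads that are, the condition can never become true. Whether that is what happens here is exactly what this issue is asking about.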
