tibrewalpratik17 opened a new pull request, #13036:
URL: https://github.com/apache/pinot/pull/13036

   We have intermittently seen issues in our clusters while creating 
streamMessageDecoder. Stack trace:
   
   ```
   java.lang.RuntimeException: Caught exception while creating 
StreamMessageDecoder from stream config: 
   StreamConfig{_type='kafka', _topicName='<redacted>', 
_consumerTypes=[LOWLEVEL], _consumerFactoryClassName='redacted>', 
   _offsetCriteria='OffsetCriteria{_offsetType=LARGEST, 
_offsetString='largest'}', _connectionTimeoutMillis=30000, 
_fetchTimeoutMillis=5000, _idleTimeoutMillis=180000, 
_flushThresholdRows=80000000, 
   _flushThresholdTimeMillis=86400000, _flushSegmentDesiredSizeBytes=209715200, 
_flushAutotuneInitialRows=100000, _decoderClass='redacted', 
_decoderProperties={}, _groupId='null', _topicConsumptionRateLimit=-1.0, 
   _tableNameWithType='redacted'}
   at 
org.apache.pinot.spi.stream.StreamDecoderProvider.create(StreamDecoderProvider.java:48)
   at 
org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.(LLRealtimeSegmentDataManager.java:1424)
   at 
org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:446)
   at 
org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:228)
   at 
org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
   at 
org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   at 
org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
   at 
org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278)
   at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)
   at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)
   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   at java.base/java.lang.Thread.run(Thread.java:829)
   ```
   
   This stops consumption in one of the replicas and once the other replica 
starts committing, this stopped replica always ends up in ERROR state. The only 
way to fix this is to reset this replica's segment.
   The behaviour of not consuming in one replica is also dangerous as if the 
other replica's hosts restarts / goes down due to any reason, it can cause data 
loss scenarios.
   
   Having a retry policy during StreamMessageDecoder.create() may help reduce 
the chances of such scenarios. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to