Apache Pinot Daily Email Digest (2021-08-30)

Pinot Slack Email Digest Mon, 30 Aug 2021 19:00:33 -0700

#general

@sandeep908: @sandeep908 has joined the channel

#random

@sandeep908: @sandeep908 has joined the channel

#troubleshooting

@nadeemsadim: Hi @mayanks @xiangfu0 @jackie.jxt @ssubrama @g.kishore @dlavoie @npawar @kulbir.nijjer .. we are getting the following error, and a "broker missing" error shows up in the Pinot SQL query editor UI .. the broker pods are also in CrashLoopBackOff state and are not ready .. please find attached the logs of the broker pods .. cc: @mohamedkashifuddin @shaileshjha061 @mohamed.sultan @arunkumarc2010
@nadeemsadim: Excerpt from log file :
2021/08/30 06:16:44.488 ERROR [StatusUpdateUtil] [HelixTaskExecutor-message_handle_thread] Exception while logging status update
org.apache.helix.HelixException: HelixManager (ZkClient) is not connected. Call HelixManager#connect()
    at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:363) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.manager.zk.ZKHelixManager.getHelixDataAccessor(ZKHelixManager.java:593) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.util.StatusUpdateUtil.logMessageStatusUpdateRecord(StatusUpdateUtil.java:348) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.util.StatusUpdateUtil.logError(StatusUpdateUtil.java:400) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:359) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-2302bd2c01655d803e96e825143f03c675ed32ff]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?
@xiangfu0: Can you check if zookeeper is up?
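A pod that reports Running is not necessarily a ZooKeeper server that answers requests. As a minimal sketch, assuming the `ruok` four-letter command is allowed on the server (on ZooKeeper 3.5 and later it has to be listed in `4lw.commands.whitelist`), the check can be done over a plain socket. The function name `zk_ruok` and any host name passed to it are illustrative, not part of Pinot or ZooKeeper.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's `ruok` four-letter command; True only if the reply is `imok`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"ruok")
            chunks = []
            while True:
                # The server closes the connection after replying,
                # so an empty read marks the end of the answer.
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks).strip() == b"imok"
    except OSError:
        # Refused connection, timeout, DNS failure, and so on.
        return False
```

An `imok` reply only says the server process is running in a non-error state. It does not show that the server has joined the quorum, so a `False` here is more informative than a `True`.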
@nadeemsadim: yes .. zookeeper is running
@nadeemsadim: @mohamedkashifuddin can you update your findings here
@nadeemsadim: the issue got resolved after a helm reinstall, but not sure whether it will recur
@nadeemsadim: as per kashif .. when he did a helm reinstall 2 hrs ago, the issue was still there .. after doing a helm reinstall again 30 mins ago, it got resolved
@nadeemsadim: cc: @hussain @mohamed.sultan @shaileshjha061 @arunkumarc2010
@nadeemsadim: could this be because the image pull policy is Always, so an updated release gets pulled and brings some instability to the pinot deployment? .. can we use a stable build or image, or store one in our google cloud repository, and not use pull policy Always until we are sure the release is stable? cc: @mayanks @xiangfu0 @g.kishore @jackie.jxt @ssubrama
@xiangfu0: Can you check zookeeper container disk size? If the disk is full?
@xiangfu0: Right now we don’t see the root cause of why the brokers went down
@nadeemsadim: sure.. will check again .. right now everything is working fine and zookeeper disk usage is around 20% for all 3 zookeeper pods ..
@nadeemsadim: still, I am not sure what is taking up 20 GB of space in each zookeeper pod .. is it pinot indexing details or something else? .. what is the recommended disk size for zookeeper?
@nadeemsadim: FYI, the auth feature was enabled earlier .. we disabled it once to resolve this issue .. but the same issue recurred even with the auth feature disabled .. helm uninstall and reinstall seems to resolve the broker pods crashing issue
@mayanks: Seems like the ZK snapshots might not be cleaned up?
@nadeemsadim: how to clean it
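ZooKeeper can purge old snapshots itself through the `autopurge.snapRetainCount` and `autopurge.purgeInterval` settings. Before turning that on, it can help to see how much space old snapshots actually take. The following is a minimal sketch, assuming snapshot files follow the usual `snapshot.<zxid in hex>` naming inside the data directory's `version-2` folder. The function name `purgeable_snapshots` is illustrative, and the sketch only reports files; it deletes nothing and ignores transaction logs.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Snapshot files are assumed to be named snapshot.<zxid in hexadecimal>.
SNAPSHOT_RE = re.compile(r"^snapshot\.([0-9a-fA-F]+)$")


def purgeable_snapshots(data_dir: str, retain: int = 3) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for every snapshot older than the newest `retain`."""
    found = []
    for path in Path(data_dir).iterdir():
        match = SNAPSHOT_RE.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1), 16), path))
    # Newest zxid first, so everything past index `retain` is a purge candidate.
    found.sort(key=lambda item: item[0], reverse=True)
    return [(path.name, path.stat().st_size) for _, path in found[retain:]]
```

Summing the sizes in the result gives a rough idea of how much disk a purge would free, which can then be compared against the 20 GB seen on each pod.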
@xiangfu0: hmm, I think zk disk is fine
@xiangfu0: But Pinot cannot connect to zk
@xiangfu0: How do you enable the auth for zookeeper?
@nadeemsadim: auth here means auth when connecting to the pinot controller UI, i.e. access to the pinot controller UI with different levels of access .. I guess this is a recent feature in the pinot controller UI ..
@nadeemsadim: for the last 10 hours there has been no issue in pinot and it's working fine .. will definitely update if the issue recurs
@nadeemsadim: helm uninstall and reinstall seems to resolve broker pods crashing issue
@nadeemsadim: I think restarting the broker pods alone would resolve this issue .. I think it's a recurrence of the same issue we faced a month ago @xiangfu0
@xiangfu0: do you have the logs of broker when it’s crashing?
@nadeemsadim: @xiangfu0
@nadeemsadim: what is this "sessionId does not match" message about? cc: @mayanks @xiangfu0 @g.kishore @jackie.jxt @ssubrama
@xiangfu0: it comes from the process restart
@xiangfu0: the previous session is still in zookeeper, which will be cleaned after zookeeper session timeout
@nadeemsadim: how to resolve this issue ..
@jackie.jxt: This is normal during broker restart/reconnect, and you may ignore them
@nadeemsadim: I think restarting the broker alone could have resolved the broker CrashLoopBackOff issue .. the mistake we were making was restarting everything, including controller and server .. I guess we got the trick now ..
@nadeemsadim: or the other solution would be increasing the health check timeout .. not sure how to increase it
@nadeemsadim: we faced the same issue around a month ago .. I think we hit it again today, a month later, but had forgotten the resolution .. now I can recollect
@xiangfu0: I think the open question is still why the broker fails. If you can provide the logs of the failed broker, it can help us find the issue
@jeking.dev: I am completely new to setting up software like this. I'm trying to follow the "Running locally" instructions. If I download the release and un-tar it, I don't need to do the maven install from git, right? After I've done that, I try to run pinot-admin.sh or quick-start-batch.sh using 'git bash here' and I get: Could not find or load main class org.apache.pinot.tools.Quickstart / Could not find or load main class org.apache.pinot.tools.admin.PinotAdministrator
@ken: 1. Correct, you don’t have to do a Maven install if you download & untar the distribution. 2. You should be able to `cd <installation dir>` and then run `bin/quick-start-batch.sh` to get a cluster running for batch.
@jeking.dev: I'm on windows: when I run `./quick-start-batch` in powershell I can see gitbash open briefly and then close, and there is no message in the terminal. When I git-bash-here and try that terminal in the pinot installation directory, running `bin/quick-start-batch.sh` fails with `Could not find or load main class org.apache.pinot.tools.Quickstart`
@jeking.dev: for some reason when I download the 0.1.0 release instead of the 0.8.0 release the commands will execute in bash like I'd expect
@ken: Sorry, no experience on Windows with powershell
@jeking.dev: Thanks. I think the issue is that all the class files are in the .jar file in /lib, but for some reason it's not recognizing that. Back to googling. Thanks
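The guess that the classes live inside a jar under `lib` can be checked directly, since a jar is a zip archive. A minimal sketch, with the helper name `jars_containing` and the directory argument as illustrative assumptions:

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(lib_dir: str, class_name: str) -> List[str]:
    """Return the names of the jars in `lib_dir` that contain the given class."""
    # org.apache.pinot.tools.Quickstart -> org/apache/pinot/tools/Quickstart.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                hits.append(jar.name)
    return hits
```

Calling it with the `lib` directory of the untarred release and `org.apache.pinot.tools.Quickstart` shows whether the class is present. If it is, the jar itself is intact and the problem is more likely in how the classpath reaches `java` from the shell, which is an inference rather than something confirmed in this thread.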
@sandeep908: @sandeep908 has joined the channel

#getting-started

@matt: set the channel topic: New to Pinot? Start here: