#general
@andrew.hattemer: @andrew.hattemer has joined the channel
@asanka.perera: @asanka.perera has joined the channel
@kullappankspc004: @kullappankspc004 has joined the channel
@gulshan.yadav: Hi. Is there any plan to have a Grafana plugin for Pinot?
@g.kishore:
@devashish: @devashish has joined the channel
@zsolt: @zsolt has joined the channel
@rogergrangeiamartins: @rogergrangeiamartins has joined the channel
@ltlamontagne: @ltlamontagne has joined the channel
@mike.davis: Does the S3PinotFS support server-side encryption via KMS or does it require implementing a custom PinotFS?
@fx19880617: KMS integration is not there yet, but should be easy to add to it if SDK is there
@fx19880617: Current S3 implementation just uses
@fx19880617: more ref:
@mike.davis: thanks. i think the SDK supports it; we'd just need to pass through the KMS key to the appropriate requests.
@fx19880617: Is it something described here?
@fx19880617: this seems to be client side
@fx19880617: ```The following examples use the AmazonS3EncryptionClientV2Builder class to create an Amazon S3 client with client-side encryption enabled. Once configured, any objects you upload to Amazon S3 using this client will be encrypted. Any objects you get from Amazon S3 using this client are automatically decrypted.```
@mike.davis:
@mike.davis: there's a java example
@mike.davis: ```PutObjectRequest putRequest = new PutObjectRequest(bucketName, keyName, file).withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams());```
@fx19880617: ic, so it’s different api
@mike.davis: ```PutObjectRequest putRequest = new PutObjectRequest(bucketName, keyName, file).withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(keyID));```
@mike.davis:
@fx19880617: right, we need to add an if-check here to use this `.withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(keyID));`
@fx19880617: if given kms key
@mike.davis: :thumbsup:
@fx19880617: we will add that support soon :slightly_smiling_face:
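For reference, at the S3 REST level, SSE-KMS boils down to two request headers, which is what the SDK's `withSSEAwsKeyManagementParams` sets under the hood. A minimal sketch of the if-check discussed above (class and method names here are hypothetical, not Pinot's actual `S3PinotFS` code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SseKmsHeaders {
    // Returns the S3 request headers that server-side encryption with KMS adds.
    // If no key id is given, S3 falls back to the account's default aws/s3 KMS key.
    public static Map<String, String> sseHeaders(String kmsKeyId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-amz-server-side-encryption", "aws:kms");
        if (kmsKeyId != null && !kmsKeyId.isEmpty()) {
            headers.put("x-amz-server-side-encryption-aws-kms-key-id", kmsKeyId);
        }
        return headers;
    }
}
```

The conditional mirrors the proposed change: only attach the KMS key id when one is configured, otherwise keep the default behavior.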
@mike.davis: Would you like me to file a GH issue?
@fx19880617: yes please! so we can also link the PR to it
@fx19880617: Thanks!
@mike.davis:
@fx19880617: Thanks!
@fx19880617:
@mike.davis: :raised_hands:
@riddle4045: @riddle4045 has joined the channel
#random
@andrew.hattemer: @andrew.hattemer has joined the channel
@asanka.perera: @asanka.perera has joined the channel
@kullappankspc004: @kullappankspc004 has joined the channel
@devashish: @devashish has joined the channel
@zsolt: @zsolt has joined the channel
@rogergrangeiamartins: @rogergrangeiamartins has joined the channel
@ltlamontagne: @ltlamontagne has joined the channel
@riddle4045: @riddle4045 has joined the channel
#troubleshooting
@andrew.hattemer: @andrew.hattemer has joined the channel
@asanka.perera: @asanka.perera has joined the channel
@kullappankspc004: @kullappankspc004 has joined the channel
@varun.srivastava: Created an UPSERT table with a composite primary key of 8 columns. Often a few of the column values can be null, but together all 8 columns ensure uniqueness. Do you see any concern with this? Will upsert work properly?
@fx19880617: @jackie.jxt
@jackie.jxt: @varun.srivastava `null` value will be replaced to the default value, so it should work
@jackie.jxt: Do you want to use UPSERT for deduplication purpose? FYI it might not be a good idea to use UPSERT with mostly unique primary keys for a large table
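For context, an upsert table pairs a schema-level primary key with an `upsertConfig` in the realtime table config; a minimal sketch (table and column names are illustrative):

```
// in the schema: all eight columns together form the primary key;
// any column that arrives as null is stored as its default value
// BEFORE the key is computed, which is why null-vs-default collisions matter
"primaryKeyColumns": ["col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8"]

// in the realtime table config
"upsertConfig": { "mode": "FULL" }
```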
@devashish: @devashish has joined the channel
@devashish: Hello Team, I was trying to get a Pinot realtime table working with Kafka and S3 as deep storage. My pinot-server has this error in the logs: `java.lang.ClassNotFoundException: org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory`. This setup is done on Kubernetes via Helm. Earlier I was getting a different error around S3PinotFS; to fix that I updated the server config to use the pinot-s3 plugin. After that it seems the Kafka plugin is not available to the server. How do I solve this?
@fx19880617: did you set `plugins.dir` and `plugins.include` when you start pinot-server? if so, you need to set `plugins.include` value to `pinot-s3,pinot-kafka-2.0`
@devashish: Thanks, it worked. I didn't have to add the Kafka plugin before, when I was not using S3 as the deep store.
@fx19880617: right, by default pinot loads all the plugins if not specified.
@fx19880617: also if you are using parquet file, you also need to include `pinot-parquet`
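To summarize the gotcha: once `plugins.include` is set, only the listed plugins are loaded, so every plugin you rely on must appear in the list. A sketch of the relevant server startup properties (paths are illustrative):

```
# pinot-server config: restricting plugin loading disables the
# load-everything default, so list ALL plugins you need
plugins.dir=/opt/pinot/plugins
plugins.include=pinot-s3,pinot-kafka-2.0,pinot-parquet
```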
@zsolt: @zsolt has joined the channel
@rogergrangeiamartins: @rogergrangeiamartins has joined the channel
@ltlamontagne: @ltlamontagne has joined the channel
@pradeepgv42: Hi, I am having trouble getting segment reload to work (I am running the latest master code). Seeing the logs below on pinot-server. Looks like `SegmentMessageHandlerFactory` is not getting registered for some reason. When I restart the server I don’t see logs from this function beyond this point.
@fx19880617: do you have more logs showing any errors?
@fx19880617: are you running on docker or bare metal?
@fx19880617: for reloading segments, did you call the controller reload endpoint, or just restart the server?
@pradeepgv42: I didn't find any more error logs; the next set of logs are segment state transition logs. I was trying the controller endpoint, then I tried restarting to debug it
@pradeepgv42: It's running on docker
@fx19880617: after you restart the pinot server, is ip address same?
@fx19880617: can you check the idealstates for the table and see if segment assigned instances with same ip
@pradeepgv42: Oh ok let me check that and get back
@fx19880617: you can set this config in to pinot server config to use hostname not ip to register pinot: ```pinot.set.instance.id.to.hostname=true```
@pradeepgv42: thanks, let me try that
@pradeepgv42: So I currently have only one pinot server and all the segments are always attached to that. But I see how this is better ```pinot.set.instance.id.to.hostname=true```
@fx19880617: got it, there should be a pinotServer.log file in your docker container
@fx19880617: if possible you can enter into the container and check that file
@pradeepgv42: hmm, it’s empty. We created our own docker image but it uses pinot-admin.sh
@fx19880617: can you check if this file is there : `/opt/pinot/conf/pinot-server-log4j2.xml`
@pradeepgv42: ah no, let me fix that and try
@fx19880617: if so, you can append `-Dlog4j2.configurationFile=/opt/pinot/conf/pinot-server-log4j2.xml` to the existing `JAVA_OPTS` (I think you must have already configured it) when you start up the pinot server
@pradeepgv42: ```-Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml``` yup it’s setup
@pradeepgv42: i see it as an option with which the process is running too
@fx19880617: ic
@fx19880617: so `conf/pinot-admin-log4j2.xml` should tell where the log files are located
@pradeepgv42: yeah for some reason i don’t see it appending to the pinotServer.log but i see console outputs as part of `sudo docker logs <container_id>`
@fx19880617: ic, that comes from pinot-admin-log4j2.xml
@fx19880617: you can set console log to INFO
@fx19880617: so it will print more info
@pradeepgv42: it’s already set to INFO, i do see the info logs
@pradeepgv42: are you looking for any specific log entry?
@fx19880617: not really
@fx19880617: cause you say segments are not reloading
@fx19880617: so the first thing is to see if there is any error that prevents the reloading
@fx19880617: if restart doesn’t work, then we need to check if idealstates matches the instance id
@pradeepgv42: so the segments are in a healthy state, I am trying to reload some indices
@pradeepgv42: and trying to do that using reload. Assignment is correct, all segments are healthy and I am able to query them
@fx19880617: ic
@pradeepgv42: ```
Initialized with retryCount: 3, retryWaitMs: 100, retryDelayScaleFactor: 5
NoOpPinotCrypter not configured, adding as default
Register state model factory for state model SegmentOnlineOfflineStateModel using factory name DEFAULT with org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory@6bb2e932
Adding preconnect callback: org.apache.pinot.server.starter.helix.HelixServerStarter$$Lambda$142/653079016@1c09a18
Connecting Helix manager
ClusterManager.connect()
Starting ZkClient event thread.
Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
Client environment:host.name=<NA>
Client environment:java.version=1.8.0_275
Client environment:java.vendor=Private Build
Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
Client environment:java.class.path=/opt/pinot/apache-pinot-incubating-0.7.0-SNAPSHOT-bin/lib/pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:jmx_prometheus_javaagent-0.12.0.jar
Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
Client environment:java.io.tmpdir=/tmp
Client environment:java.compiler=<NA>
Client environment:os.name=Linux
Client environment:os.arch=amd64
Client environment:os.version=5.4.0-1030-aws
Client environment:user.name=root
Client environment:user.home=/root
Client environment:user.dir=/opt/pinot/apache-pinot-incubating-0.7.0-SNAPSHOT-bin
```
@fx19880617: then there should be log entry with that segment id
@pradeepgv42: let me try that on a single segment
@fx19880617: if there is any indices, it should also tell it’s creating indices for that segment
@fx19880617: yes
@fx19880617: you can do /reload with force option
@pradeepgv42: nope, I see this on the server side when I trigger a reload on a segment. How do I force a reload? This is triggered from the controller endpoint `/segments/{tablename}/{segment}/reload`
```
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:7
21 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff, watchChild: false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Consumed 0 events from (rate:0.0/s), currentOffset=4037, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:3
The latency of message 2334f0d8-e251-4f75-85f0-c4db72623b31 is 136141 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: 2334f0d8-e251-4f75-85f0-c4db72623b31
The latency of message f41b6769-5f43-4982-bd40-5c8e43dbd9ac is 60 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: f41b6769-5f43-4982-bd40-5c8e43dbd9ac
21 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK Took: 9ms
```
@pradeepgv42: the only thing I can think of is that SegmentMessageHandlerFactory is not getting registered for some reason, which is weird since I don’t see any exceptions/errors
@fx19880617: I don’t think that’s the issue
@fx19880617: if it’s not registered then segment shouldn’t be loaded
@fx19880617: so there is no segment got reloaded in server side ?
@fx19880617: I think consuming segment cannot be reloaded
@fx19880617: you can reload an online segment that is already persisted
@pradeepgv42: yeah this is an online segment
@fx19880617: ok
@fx19880617: then there is no logs on server side for this segment?
@pradeepgv42: I see that segment offline-to-online transitions are managed by SegmentOnlineOfflineStateModelFactory, right? Whereas SegmentMessageHandlerFactory seems to be handling the reload/refresh messages
@pradeepgv42: no I don’t see any logs on the server side for that segment when I did a reload
@pradeepgv42: ok let me get back to debugging this a bit later, but let me know if you have any other ideas/suggestions. Thanks a lot for the help
@fx19880617: this SegmentMessageHandlerFactory is on pinot server side only
@fx19880617: ```
// Register message handler factory
SegmentMessageHandlerFactory messageHandlerFactory =
    new SegmentMessageHandlerFactory(fetcherAndLoader, instanceDataManager, serverMetrics);
_helixManager.getMessagingService()
    .registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(), messageHandlerFactory);
```
@fx19880617: this is registered always
@pradeepgv42: yeah that makes sense which is why I am a bit confused on what’s going on, anyways let me get back to this when I get some time in the evening
@fx19880617: sure, also check if restarting server makes segment reloading happen with new indexing
@pradeepgv42: yeah, restarting the server is creating the appropriate indices
@fx19880617: ok
@pradeepgv42: Wondering if anyone know what’s happening?
@riddle4045: @riddle4045 has joined the channel
@elon.azoulay: We had an issue where confluent schema registry had downtime, and realtime ingestion failed with "Read from kafka failed" but did not recover until we restarted the servers. Is this a known issue? Or is there something else we could have done? The realtime table was on the default tenant and I issued a rebalance, that did not help (old data was there but no consuming segments).
@fx19880617: i feel we should have a cache for schema?
@fx19880617: for confluent schema client
@elon.azoulay: here's the stack trace we saw:
@elon.azoulay: ```2021/02/09 20:45:06.889 ERROR [ServerSegmentCompletionProtocolHandler] [oas_integration_operation_event__3__331__20210209T2044Z] Could not send request```
@fx19880617: this seems to be pinot controller issue not schema registry?
@elon.azoulay: Oh, sorry, wrong log, let me see if I can still get it
@elon.azoulay: I lost the log message, but have you ever seen an error reading from kafka (ex. kafka is down) - and then realtime ingestion stops until the servers are restarted?
@fx19880617: ic
@fx19880617:
@fx19880617: do we have this configured?
@elon.azoulay: It's default to 1000
@fx19880617: ok
@elon.azoulay: I think this was more that the kafka consumer failed due to schema registry being down (for ~1 hour) and then ingestion stopped, did not see any errors in externalview or idealstate, also did not see consuming segments until I restarted the servers
@fx19880617: @npawar ^^
@fx19880617: I saw that this client is inside the avrodeserializer
@elon.azoulay: Yep
@fx19880617: so if there is no msg consumed from kafka then this logic shouldn’t be triggered
@npawar: i think the behavior is by design. The consumer was likely seeing exceptions when trying to consume from kafka with the schema registry down.
```
while (!_shouldStop && !endCriteriaReached()) {
  // Consume for the next readTime ms, or we get to final offset, whichever happens earlier,
  // Update _currentOffset upon return from this method
  MessageBatch messageBatch;
  try {
    messageBatch = _partitionLevelConsumer
        .fetchMessages(_currentOffset, null, _partitionLevelStreamConfig.getFetchTimeoutMillis());
    consecutiveErrorCount = 0;
  } catch (TimeoutException e) {
    handleTransientStreamErrors(e);
    continue;
  } catch (TransientConsumerException e) {
    handleTransientStreamErrors(e);
    continue;
  } catch (PermanentConsumerException e) {
    segmentLogger.warn("Permanent exception from stream when fetching messages, stopping consumption", e);
    throw e;
  } catch (Exception e) {
    // Unknown exception from stream. Treat as a transient exception.
    // One such exception seen so far is java.net.SocketTimeoutException
    handleTransientStreamErrors(e);
    continue;
  }
```
if a permanent exception occurs, or more than 5 consecutive transient exceptions, we stop consuming by marking the consuming segment offline
@npawar: and an operator would need to either reset the partition, or restart the server
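The error-handling policy described above can be sketched as a self-contained loop (a simplified illustration of the policy, not Pinot's actual code; the 5-error limit is the one mentioned above, and the class/method names are hypothetical):

```java
import java.util.List;

public class ConsumeLoopSketch {
    public static final int MAX_CONSECUTIVE_ERRORS = 5;  // assumed limit

    // Each entry simulates one fetch attempt: "ok", "transient", or "permanent".
    // Returns true if all fetches were processed, false if consumption stopped
    // (at which point an operator must reset the partition or restart the server).
    public static boolean run(List<String> fetchResults) {
        int consecutiveErrors = 0;
        for (String result : fetchResults) {
            if (result.equals("ok")) {
                consecutiveErrors = 0;              // any success resets the count
            } else if (result.equals("transient")) {
                if (++consecutiveErrors > MAX_CONSECUTIVE_ERRORS) {
                    return false;                   // too many in a row: go offline
                }
            } else {                                // "permanent"
                return false;                       // stop immediately
            }
        }
        return true;
    }
}
```

The key property is that transient errors are forgiven as soon as a fetch succeeds, so a schema-registry outage longer than the retry budget is what tips the segment offline.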
@elon.azoulay: Thanks:) This helps
@elon.azoulay: Is there a metric we can use to detect when there's a delay between kafka topic and pinot realtime table?
@npawar: that metric is much requested but much missing :see_no_evil: i recall it has something to do with Kafka not exposing the “latest” offset in an easy way.
@npawar: but i remember in LinkedIn we used to use ```ServerGauge.LLC_PARTITION_CONSUMING```
@npawar: this flag turns to 0, whenever the partition is not consuming
@npawar: we used to have alerts that fired if this metric stayed 0 for > 15 minutes on any partition
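Assuming the gauge is scraped through the bundled JMX Prometheus exporter, an alert along those lines could look like the following (the exact metric name depends on your exporter rules, so `pinot_server_llcPartitionConsuming_Value` here is an assumption):

```
# fire when any partition has not been consuming for the whole 15-minute window
max_over_time(pinot_server_llcPartitionConsuming_Value[15m]) == 0
```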
@elon.azoulay: Thanks! I'll check that one out.
@elon.azoulay:
@elon.azoulay: Apologies for all the trouble today: we noticed that some tables are in a "bad" (cluster manager ui) state. Looks like it's due to an attempt by servers to download non-existent segments from deepstore. Could it be that the segment was empty and not copied to deepstore?
@elon.azoulay: Should I just delete the segments to restore the idealState to good? Or could this be an issue w SegmentDeletionManager?
@elon.azoulay: Also getting messages like this:
```
2021/02/10 01:51:21.689 WARN [SegmentStatusChecker] [pool-8-thread-3] Table XXX has 5 segments with no online replicas
2021/02/10 01:51:21.689 WARN [SegmentStatusChecker] [pool-8-thread-3] Table XXX has 0 replicas, below replication threshold :3
2021/02/10 01:51:21.796 WARN [SegmentStatusChecker] [pool-8-thread-3] Table XXX has 1 replicas, below replication threshold :3
2021/02/10 01:51:21.815 WARN [SegmentStatusChecker] [pool-8-thread-3] Table XXX has 2 replicas, below replication threshold :3
2021/02/10 01:51:21.877 WARN [SegmentStatusChecker] [pool-8-thread-3] Table XXX has 1 replicas, below replication threshold :3
```
#custom-aggregators
@anirudhkrec: @anirudhkrec has joined the channel
#aggregators
@anirudhkrec: @anirudhkrec has joined the channel
#pinot-dev
@anirudhkrec: @anirudhkrec has joined the channel
#community
@anirudhkrec: @anirudhkrec has joined the channel
#announcements
@anirudhkrec: @anirudhkrec has joined the channel
#aggregate-metrics-change
@anirudhkrec: @anirudhkrec has joined the channel
#config-tuner
@anirudhkrec: @anirudhkrec has joined the channel
@anirudhkrec: @anirudhkrec has left the channel