Apache Pinot Daily Email Digest (2022-02-24)

Pinot Slack Email Digest Thu, 24 Feb 2022 18:00:47 -0800

#general

@h20210119: @h20210119 has joined the channel
@alex: @alex has joined the channel
@ashish.athresh: @ashish.athresh has joined the channel
@abhishek.tanwade: @abhishek.tanwade has joined the channel
@sunhee.bigdata: @sunhee.bigdata has joined the channel

#random

#troubleshooting

@h20210119: @h20210119 has joined the channel
@alex: @alex has joined the channel
@vibhor.jaiswal: Hi all, we have been trying to integrate Kafka topics secured with SASL_PLAINTEXT, and we keep getting the exceptions below. To double-check, I created a Java client, and it connects and consumes messages. However, Pinot is not able to consume messages with pretty much the same settings. Can someone suggest what's wrong here?
`2022/02/23 16:50:56.586 ERROR [PinotTableIdealStateBuilder] [grizzly-http-server-0] Could not get PartitionGroupMetadata for topic: gsp.dataacquisition.risk.public.v2.<Redacted> of table: <Redacted>_REALTIME`
`org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata`
`2022/02/23 16:50:56.591 ERROR [PinotTableRestletResource] [grizzly-http-server-0] org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata`
`java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata`
`at org.apache.pinot.controller.helix.core.PinotTableIdealStateBuilder.getPartitionGroupMetadataList(PinotTableIdealStateBuilder.java:172) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-428e7d75f91b9d4b4a2288f131d02d643bb2df5d]`
`at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.getNewPartitionGroupMetadataList(PinotLLCRealtimeSegmentManager.java:764)`
Below is the table config for reference:
```
{
  "tableName": "<Redacted>",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "<Redacted>",
    "timeColumnName": "PublishDateTimeUTC",
    "allowNullTimeValue": false,
    "replication": "1",
    "replicasPerPartition": "2",
    "completionConfig": { "completionMode": "DOWNLOAD" }
  },
  "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant", "tagOverrideConfig": {} },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "noDictionaryColumns": ["some columns "],
    "rangeIndexColumns": [],
    "rangeIndexVersion": 1,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "gsp.dataacquisition.risk.public.v2.<Redacted>",
      "stream.kafka.broker.list": "comma separated list of servers",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest",
      "stream.kafka.schema.registry.url": ,
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.sasl.mechanism": "SCRAM-SHA-256",
      "stream.kafka.security.protocol": "SASL_PLAINTEXT",
      "stream.kafka.sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"some user\" password=\"somepwd\"",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.size": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.autotune.initialRows": "3000000",
      "realtime.segment.flush.threshold.segment.size": "500M"
    },
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false
  },
  "metadata": {},
  "quota": {},
  "routing": {"instanceSelectorType": "strictReplicaGroup"},
  "query": {},
  "ingestionConfig": {},
  "isDimTable": false,
  "upsertConfig": { "mode": "FULL", "comparisonColumn": "PublishDateTimeUTC" },
  "primaryKeyColumns": [ "BusinessDate","UID","UIDType","LegId" ]
}
```
@mayanks: I am guessing it is unable to connect to Kafka cc: @slack1 @npawar
@vibhor.jaiswal: @mayanks @slack1 @npawar This issue was basically about integrating with SASL-secured Kafka. We fixed it by removing the `stream.kafka.` prefix from stream.kafka.sasl.mechanism, stream.kafka.security.protocol, and stream.kafka.sasl.jaas.config. Please feel free to publish this in the documentation, because we could not find any documentation on how to integrate SASL-secured Kafka with Pinot. It will be a great value add for end users.
@vibhor.jaiswal: Another thing we did here was add a semicolon at the end of jaas.config, like this: "sasl.jaas.config":"org.apache.kafka.common.security.scram.ScramLoginModule required username=\"some user\" password=\"somepwd\";",
@mayanks: Could you paste a sample config here @vibhor.jaiswal? @mark.needham we can then put it in our docs.
@slack1: Thank you, @vibhor.jaiswal. We’ll add this to our docs
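
The two changes reported in this thread (dropping the `stream.kafka.` prefix from the three security keys, and ending the JAAS string with a semicolon) can be sketched as a small transformation over `streamConfigs`. This is only an illustration of what the thread describes: `fix_sasl_stream_configs` is a hypothetical helper, not a Pinot API, and the sample values are the placeholders from the original message.

```python
import json

# The three keys the thread reports moving out of the "stream.kafka." namespace.
SECURITY_KEYS = ("security.protocol", "sasl.mechanism", "sasl.jaas.config")
PREFIX = "stream.kafka."


def fix_sasl_stream_configs(stream_configs):
    """Return a copy with the thread's two fixes applied.

    Hypothetical helper for illustration only; it is not part of Pinot.
    """
    fixed = {}
    for key, value in stream_configs.items():
        # Fix 1: strip the prefix, but only from the security-related keys.
        if key.startswith(PREFIX) and key[len(PREFIX):] in SECURITY_KEYS:
            key = key[len(PREFIX):]
        # Fix 2: make sure the JAAS config string ends with a semicolon.
        if key == "sasl.jaas.config" and not value.rstrip().endswith(";"):
            value = value.rstrip() + ";"
        fixed[key] = value
    return fixed


# A trimmed-down version of the streamConfigs posted in the thread.
original = {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.sasl.mechanism": "SCRAM-SHA-256",
    "stream.kafka.security.protocol": "SASL_PLAINTEXT",
    "stream.kafka.sasl.jaas.config": (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        'username="some user" password="somepwd"'
    ),
}

fixed = fix_sasl_stream_configs(original)
print(json.dumps(fixed, indent=2))
```

Keys such as `stream.kafka.consumer.type` keep their prefix; only the three security settings named in the thread are renamed.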
@ashish.athresh: @ashish.athresh has joined the channel
@abhishek.tanwade: @abhishek.tanwade has joined the channel
@elon.azoulay: qq, do you recommend setting `controller.enable.batch.message.mode` to true? I see a GitHub PR from Pinot 0.2.0 that switched it to false by default due to high controller GC. Do you think it's safe to enable now? Pinot has evolved a lot since then. :slightly_smiling_face:
@elon.azoulay: context: we have a lot of tables with tiny segments, and we are going through and fixing them, but in the meantime we notice a huge number of ZK messages, mostly from the broker resource and the tables with tiny segments.
@elon.azoulay: If anything we can try and let you know how it goes, unless you recommend not to even try it.
@elon.azoulay: Sorry to bug you guys: @mayanks @jackie.jxt @xiangfu0 do you think it's safe to enable batch mode on the controllers?
@elon.azoulay: Whenever you have time lmk
@jackie.jxt: This flag is actually passed to Helix, and I don't think we have upgraded Helix since `0.2.0`. I'm not sure if the issue described in the PR applies to you, so I would suggest enabling it in a staging cluster and monitoring how it works
@elon.azoulay: nice, thanks! Will let you know how it goes.
@elon.azoulay: Hopefully the findings will be helpful
@sunhee.bigdata: @sunhee.bigdata has joined the channel
@sunhee.bigdata: Hi, everyone :slightly_smiling_face: I am trying a batch ingestion job. There are 5 server instances (3 running, 2 down) in our Pinot cluster. When segments are assigned, some of them are assigned to the down server instances. When all of a table's segments are assigned to down server instances, I can't use the table. Is this normal? Are there any other solutions? Thank you :slightly_smiling_face:
@mayanks: What is your replication? If replication = 1, then server down = data unavailability.
@mayanks: Do you know why the servers are down?
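
The point about replication can be illustrated with a toy model: a segment is queryable only if at least one of its replicas sits on a live server. The round-robin placement below is an assumption made purely for the example and is not Pinot's actual segment assignment strategy; the server and segment names are hypothetical.

```python
SERVERS = ["s1", "s2", "s3", "s4", "s5"]
LIVE = {"s1", "s2", "s3"}  # 3 servers running, 2 down, as in the question


def round_robin(num_segments, replication):
    """Toy placement: replica r of segment i goes to server (i + r) mod N.

    Illustrative only; this is NOT how Pinot assigns segments.
    """
    return {
        f"seg{i}": [SERVERS[(i + r) % len(SERVERS)] for r in range(replication)]
        for i in range(num_segments)
    }


def available_segments(assignment, live_servers):
    """A segment is queryable if at least one replica is on a live server."""
    return {
        seg
        for seg, replicas in assignment.items()
        if any(server in live_servers for server in replicas)
    }


for replication in (1, 2, 3):
    assignment = round_robin(5, replication)
    ok = available_segments(assignment, LIVE)
    print(f"replication={replication}: {len(ok)}/{len(assignment)} segments available")
```

In this model, with replication 1 every segment placed on a down server is unavailable, while with 2 servers down any replication of 3 or more leaves at least one live replica per segment.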