Apache Pinot Daily Email Digest (2020-12-11)

Pinot Slack Email Digest Fri, 11 Dec 2020 18:00:32 -0800

#general

@lnc.adoni: @lnc.adoni has joined the channel
@gergely.lendvai93: @gergely.lendvai93 has joined the channel
@karinwolok1: :wave: Hello everyone! In the next couple of months we're going to create some kind of "Apache Pinot updates" email for our community. If you're interested in getting Apache Pinot product release announcements, news, community events, etc. in your email inbox, you can sign up here. :slightly_smiling_face:

#random

@lnc.adoni: @lnc.adoni has joined the channel
@gergely.lendvai93: @gergely.lendvai93 has joined the channel

#troubleshooting

@tanmay.movva: Hello, how frequently are the (jmx)metrics emitted by pinot? And is this configurable by the user?
@fx19880617: do you mean jmx itself or the http server exposed by jmxagent?
@fx19880617: I think it’s all instant when you issue the request
@lnc.adoni: @lnc.adoni has joined the channel
@gergely.lendvai93: @gergely.lendvai93 has joined the channel
@ken: When running a data ingestion job where the table spec includes a star tree index, I see output lines like: Generated 1623374 star-tree records from 3291903 segment records Finished creating aggregated documents, got -1824996 aggregated records. Wondering why it’s reporting a negative number of aggregated records…
@mayanks: Seems like a bug.
@g.kishore: its a logging bug ``` int numRecordsUnderStarNode = _numDocs - numStarTreeRecords; ("Finished constructing star-tree, got {} tree nodes and {} records under star-node", _numNodes, numRecordsUnderStarNode); createAggregatedDocs(_rootNode); int numAggregatedRecords = _numDocs - numSegmentRecords - numRecordsUnderStarNode;```
@g.kishore: @jackie.jxt ^^
@jackie.jxt:
@jackie.jxt: @ken Thanks for reporting the issue
@ken: @jackie.jxt - wow, that was fast :slightly_smiling_face:

#pinot-perf-tuning

@elon.azoulay: We noticed that offline tables with a lot of segments require a lot of DirectR buffer references - would this indicate that we need to scale up the number of servers? What % of the heap should DirectR buffer references consume before it is recommended to scale up?
@steotia: I don't think we have ever made such a specific consideration before adding more capacity. Typically it's the latency and QPS that guide the number of servers (number of replica groups and servers per group) to keep an optimal cpu usage per server. Yes adding more servers will potentially reduce the heap overhead per server. But, % overhead for direct buffers seems very specific thing to optimize. Typically for Java, the way to tune is divide between heap and direct(native) memory. As an example, one of our very high throughput use case has the following config for both offline and realtime ```<value>-Xms32g</value> <value>-Xmx32g</value> <value>-XX:MaxDirectMemorySize=21g</value>``` Another case, where the ratio between direct to heap is higher for offline ```<value>-Xms14g</value> <value>-Xmx14g</value> <value>-XX:MaxDirectMemorySize=37g</value>``` for realtime, the ratio is low ```<value>-Xms30g</value> <value>-Xmx30g</value> <value>-XX:MaxDirectMemorySize=23g</value>```
@steotia: So my suggestion would be to start with a ratio of direct to heap memory, a set of servers and tune both to arrive at an optimal combination that meets qps and latency sla
@ken: Hi @steotia - if a server has lots of RAM (e.g. 256gb) then it seems like there’s some max size for the JVM beyond which it doesn’t benefit for being bigger, but increasing direct memory would help. What’s in JVM space that grows with the size of the dataset?

#getting-started

@amitchopra: Hi, I am try to setup Pinot wherein the segments are written to deep store (S3 in this case) from the server instead of the controller. Firstly, i have a pinot cluster running and writing segments to S3. I followed combination of steps in: 1. 2. 3. 4.
@amitchopra: Then i changed config as mentioned in . And now segments are not being written to S3. I do see segments being created, as they show up on query browser. But the segments show up as status BAD. Can someone help to point what is wrong with the configuration: Configs: controller.conf ------------------------------ controller.helix.cluster.name=pinot-quickstart controller.port=9000 controller.enable.split.commit=true controller.allow.hlc.tables=false controller.data.dir=/tmp/pinot-tmp-data/ controller.local.temp.dir=/tmp/pinot-tmp-data/ pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS pinot.controller.storage.factory.s3.region=us-west-2 pinot.controller.segment.fetcher.protocols=file,http,s3 pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher controller.zk.str=pinot-zookeeper:2181 .hostname=true server.conf ------------------ pinot.server.netty.port=8098 pinot.server.instance.enable.split.commit=true pinot.server.adminapi.port=8097 pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index pinot.server.instance.segment.store.uri= pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS pinot.server.storage.factory.s3.region=us-west-2 pinot.server.segment.fetcher.protocols=file,http,s3 pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher .hostname=true pinot.server.instance.realtime.alloc.offheap=true table conf ---------------------- { “REALTIME”: { “tableName”: “demo1_REALTIME”, “tableType”: “REALTIME”, “segmentsConfig”: { “timeType”: “MILLISECONDS”, “schemaName”: “demo1", “timeColumnName”: “mergedTimeMillis”, “retentionTimeUnit”: “DAYS”, “retentionTimeValue”: “60", “replication”: “1", “replicasPerPartition”: “1", “completionConfig”: { “completionMode”: “DOWNLOAD” }, “peerSegmentDownloadScheme”: “http” }, “tenants”: { “broker”: “DefaultTenant”, “server”: “DefaultTenant” }, “tableIndexConfig”: { “streamConfigs”: { “streamType”: “kafka”, “stream.kafka.consumer.type”: “lowlevel”, “stream.kafka.topic.name”: “demo1", “stream.kafka.decoder.class.name”: “org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder”, “stream.kafka.consumer.factory.class.name”: “org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory”, “stream.kafka.zk.broker.url”: “", “stream.kafka.broker.list”: “", “realtime.segment.flush.threshold.time”: “10m”, “realtime.segment.flush.threshold.size”: “10000", “stream.kafka.consumer.prop.auto.offset.reset”: “smallest” }, “enableDefaultStarTree”: false, “enableDynamicStarTreeCreation”: false, “loadMode”: “MMAP”, “autoGeneratedInvertedIndex”: false, “createInvertedIndexDuringSegmentGeneration”: false, “aggregateMetrics”: false, “nullHandlingEnabled”: false }, “metadata”: { “customConfigs”: {} } } }
@fx19880617: I think this controller.data.dir=/tmp/pinot-tmp-data/ should be on s3?
@fx19880617: oic, this is for split commit
@fx19880617: have you seen any logs on pinot server for not able to write to s3?
@amitchopra: BTW - if i change controller.data.dir to s3 path, things start to work. Segments are getting created in S3. But how do i know then if it is controller or server creating and uploading the segments to S3?
@amitchopra: @fx19880617 - basically trying to understand if pinot.server.instance.segment.store.uri is set with S3 path for server config, does controller also need the S3 path set using controller.data.dir? And if required to be set to controller too, why does it need that?
@fx19880617: no need
@fx19880617: in your case it's separation
@fx19880617: I think your config is fine
@fx19880617: can you check server log and see if there is any exception about saving segment to s3
@amitchopra: i see. so next steps is for me to remove the s3 path from controller conf. And then check the logs on server logs
@fx19880617: yes, controller is not on data path
@amitchopra: @tingchen @fx19880617 - i see the following in server logs (with _controller.data.dir_ pointing to local temp dir) 2020/12/11 20:13:46.613 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:47.851 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:48.052 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:49.012 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__0__125__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:49.331 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__2__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:49.695 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:50.931 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:51.115 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:52.082 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__0__125__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:52.389 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__2__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:52.752 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__1__125__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:54.014 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__4__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1} 2020/12/11 20:13:54.206 WARN [LLRealtimeSegmentDataManager_demo1__3__126__20201211T1757Z] [demo1__3__126__20201211T1757Z] CommitEnd failed with response {“isSplitCommitType”:false,“streamPartitionMsgOffset”:null,“buildTimeSec”:-1,“status”:“FAILED”,“offset”:-1}
@tingchen: can you check the controller log to find why CommitEnd failed?
@g.kishore: @tingchen @fx19880617 can you help with this?
@tingchen: @tingchen has joined the channel
@g.kishore: I believe Uber uses this model
@myeole: @myeole has joined the channel
@tingchen: @amitchopra one config you posted is not pointing to the segment deep store.
@tingchen: ```controller.data.dir=/tmp/pinot-tmp-data/```
@tingchen: it should not point to a local fs but your S3 directory
@amitchopra: @tingchen - as per @fx19880617, he said that given controller is not in data path with split commit. Hence i have not provided the S3 path here. If i add S3 path there, segments are getting created in S3. Though in that case, how to know if server or controller is writing them to S3?
@tingchen: One direct signal you can check:
@tingchen: 1. In the Pinot server log, can you check the line "_Successfully upload segment_" ?
@tingchen: this is in the class _PinotFSSegmentUploader_ used by the Pinot server.
@tingchen: In our set up, _controller.data.dir_ points to the deep store and also is consistent with the server upload destination.
@amitchopra: @tingchen ok, let me change to point to s3 dir and then see if i see following in logs
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]