#general


@shubhambhattar: @shubhambhattar has joined the channel
@albertobeiz: @albertobeiz has joined the channel
@mrpringle: Can the pinot-admin MergeSegments command be used on real-time segments which are online? I have a lot of 10MB segments and am tweaking the segment flush threshold time. Otherwise I guess I can try a hybrid table to move the older ones into offline tables.
  @mayanks: We are working on merge/rollup using minion cc @jackie.jxt
  @jackie.jxt: You may use the realtime-to-offline task documented here to make it a hybrid table:
  @jackie.jxt: Once the merge/rollup task is available, you can enable it on the offline table to further merge the segments
  @mrpringle: I think I read that upsert won't work for realtime segments converted to offline? Is that the case?
  @mrpringle: For the new roll-up feature, it would be neat if we could roll up to the last value for a given primary key every x hours, i.e. like the upsert/latest aggregation operator in Druid. That way I could reduce the resolution of my data over time.
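A note on the realtime-to-offline task mentioned above: it is enabled by adding a task config to the realtime table config and requires a minion to be running. A minimal sketch; the bucket/buffer windows and record limit below are illustrative placeholders, not recommendations:
```
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "6h",
      "bufferTimePeriod": "12h",
      "maxNumRecordsPerSegment": "1000000"
    }
  }
}
```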
@yesoreyeram: @yesoreyeram has joined the channel
@dxasonntag: @dxasonntag has joined the channel
@dxasonntag: This question is sort of off topic, but I was wondering if anyone here has details on the tools that the Pinot project uses to generate daily digests of this slack workspace?
  @snlee: We use a Slack bot that schedules a cron job every day
@thiago.pereira.net: @thiago.pereira.net has joined the channel
@danny: @danny has joined the channel
@rhodges: @rhodges has joined the channel
@neilteng233: Hey there, I have a question about the range index. Does Pinot allow multiple range indexes?
  @ken: Yes, you can have multiple columns (fields) with range indexes
  @neilteng233: Then I am wondering, if the column is not physically sorted, how is the range index built? Is this index more like an n-ary tree, a hash table, or an SSTable structure? How are the actual document IDs stored as the value for a specific range?
  @neilteng233: Is it a list of all the doc IDs that belong to that range?
  @g.kishore: A list of all doc IDs that belong to a range
  @ken: See
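Range indexes are declared per column in the table config, which is why multiple columns can each have one. A minimal sketch (the column names are placeholders):
```
"tableIndexConfig": {
  "rangeIndexColumns": ["arrTime", "depDelay"]
}
```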
@ken: Hi @xiangfu0 - Jackie suggested I ask you whether there’s a way to query the build version (and/or git hash), to confirm what’s actually running on a cluster & handling requests.
  @mayanks: There's an API in Swagger?
  @xiangfu0: I think there is a /version api
  @ken: Yes, thanks - I was trying individual component version calls, but it's a top-level /version that returns build version info for all components, which is fine
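For reference, the endpoint lives on the controller; assuming a controller running on the default port 9000, something like:
```
curl http://localhost:9000/version
```
returns the build version info for all components.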

#random


@shubhambhattar: @shubhambhattar has joined the channel
@albertobeiz: @albertobeiz has joined the channel
@yesoreyeram: @yesoreyeram has joined the channel
@dxasonntag: @dxasonntag has joined the channel
@thiago.pereira.net: @thiago.pereira.net has joined the channel
@danny: @danny has joined the channel
@rhodges: @rhodges has joined the channel

#troubleshooting


@shubhambhattar: @shubhambhattar has joined the channel
@shubhambhattar: Hi, we've been trying to set up Apache Pinot ingestion from Azure EventHub (Azure EventHub supports the Kafka protocol, so Kafka clients can read from it). All the events within Azure EventHub are Protobuf events. Any ideas on how to make Protobuf deserialization work here? The table creation is successful but data is not being ingested.
  @shubhambhattar: This is the configuration, if it helps:
```
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "highLevel",
  "stream.kafka.topic.name": "<topic-name>",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.protobuf.ProtoBufRecordReader",
  "stream.kafka.group.id": "$Default",
  "stream.kafka.client.id": "pinot-test",
  "stream.kafka.security.protocol": "SASL_SSL",
  "stream.kafka.sasl.mechanism": "PLAIN",
  "stream.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=${password}",
  "stream.kafka.zk.broker.url": "<eventhub-namespace>.",
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.time": "24h",
  "realtime.segment.flush.segment.size": "100M",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.broker.list": "<eventhub-namespace>.",
  "realtime.segment.flush.threshold.size": "500",
  "stream.kafka.consumer.prop.auto.offset.reset": "earliest"
}
```
  @g.kishore: @kharekartik is protobuf supported in real-time?
  @kharekartik: No, only Avro and JSON are supported in realtime. I can take this up.
  @shubhambhattar: Okay. Thanks for confirming.
@albertobeiz: @albertobeiz has joined the channel
@yesoreyeram: @yesoreyeram has joined the channel
@dxasonntag: @dxasonntag has joined the channel
@thiago.pereira.net: @thiago.pereira.net has joined the channel
@danny: @danny has joined the channel
@rhodges: @rhodges has joined the channel

#getting-started


@tiger: What does the process look like for changing the table config for an existing table with segments in deepstore?
  @tiger: For example: if I wanted to modify the indices for an existing table. Do the segments in deepstore automatically get updated?
  @g.kishore: you can simply change the table config and invoke the reloadAll segments API on the controller. Note that this will add the index to the segments on the servers, not in the deepstore
  @tiger: I see, so does that mean that every time a new server loads a segment from the deepstore in the future, it will have to recompute the indices? Also to confirm, the index and aggregation information is all stored within a segment right?
  @npawar: Yes, every new server will have to recompute indices. Every existing server will have to recompute indices it doesn't already have, during reload
  @tiger: Got it, thanks! In this case, is it recommended to recompute the segments in deepstore?
  @g.kishore: Not really.. we did this design on purpose
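The reload call referenced above is a controller API; a rough sketch, assuming a controller at localhost:9000 and a table named myTable (both placeholders):
```
curl -X POST "http://localhost:9000/segments/myTable_OFFLINE/reload"
```
This triggers the servers hosting the table's segments to rebuild any indexes newly declared in the table config.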