Apache Pinot Daily Email Digest (2021-06-21)

Pinot Slack Email Digest Mon, 21 Jun 2021 19:00:51 -0700

#general

@santhosh: We have 4 billion events coming in every day, with an average size of 1 KB. We want to use Apache Pinot with S3 as deep storage. 1. Which is the better storage option, local disk or S3? 2. We use Presto and need to support about 1000 concurrent queries, each of which queries about 10 collections. Can we expect Pinot with Presto to support this?
@santhosh: 3. If using S3, how does the real-time analytics use case work? Is a hot shard maintained on the local disk?
@mayanks: Deep storage is only for backup. Servers maintain a local copy of the data on locally attached disk.
@santhosh: If we use S3 as the PinotFS and give an S3 location as the data dir, is that deep storage, or does it replace the servers' local storage engine with S3?
@santhosh: Pardon. Now I understand: it is just a deep store, not a replacement for the local storage engine.
@mayanks: Yes
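For a rough sense of scale, the ingest figures quoted in the question above (4 billion events per day at about 1 KB each) can be turned into back-of-the-envelope numbers. The sketch below is only arithmetic on those two figures. It assumes "1kb" means 1,000 bytes and ignores Pinot's compression, indexes, and replication, so actual on-disk sizes will differ. The function name is hypothetical.

```python
# Figures taken from the question; the byte interpretation is an assumption.
EVENTS_PER_DAY = 4_000_000_000
AVG_EVENT_BYTES = 1_000          # "1kb" read as 1,000 bytes
SECONDS_PER_DAY = 24 * 60 * 60


def raw_ingest_estimate(events_per_day, avg_event_bytes):
    """Return average event rate and raw (uncompressed) daily volume."""
    bytes_per_day = events_per_day * avg_event_bytes
    return {
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "raw_tb_per_day": bytes_per_day / 1e12,
    }


estimate = raw_ingest_estimate(EVENTS_PER_DAY, AVG_EVENT_BYTES)
print(f"{estimate['events_per_second']:,.0f} events/s")   # 46,296 events/s
print(f"{estimate['raw_tb_per_day']:.1f} TB/day raw")     # 4.0 TB/day raw
```

These are averages over a full day; peak rates during traffic spikes would be higher.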
@b.barisercan: @b.barisercan has joined the channel
@gmr.sql: @gmr.sql has joined the channel
@neilteng233: Hey, I am wondering: will a "Sorted forward index with run-length encoding" help a group-by on that column? I am not sure how the execution plan and optimizer work in Pinot. (MySQL takes advantage of a physically sorted column in group-by in some cases.)
@mayanks: Indexing is primarily used for applying predicates (filters), not for group-by atm.
@neilteng233: got you. Thank you for confirmation.
@dougdeu: @dougdeu has joined the channel
@jkim0110610: @jkim0110610 has joined the channel

#random

@b.barisercan: @b.barisercan has joined the channel
@gmr.sql: @gmr.sql has joined the channel
@dougdeu: @dougdeu has joined the channel
@jkim0110610: @jkim0110610 has joined the channel

#troubleshooting

@b.barisercan: @b.barisercan has joined the channel
@zsolt: We are having trouble with RealtimeToOfflineSegmentsTask: doing rollups fills up the heap and causes bad GC stalls, which disconnect ZK and render the minion useless until a restart. Since the task is time based, it's not possible to size the minion heap to account for traffic spikes without overshooting towards too-small segments. When is the off-heap rollup support planned to be implemented? (mentioned in )
@npawar: @jackie.jxt is working on that. How close are you to finishing the offheap roll-up Jackie?
@npawar: Meanwhile, you can set the bucket size to something smaller
@jackie.jxt: PR almost ready, will try to merge it by this week
@jmeyer: Hello :slightly_smiling_face: Can someone remind me of the steps to handle `"message": "MergeResponseError: Data schema mismatch between merged block:` after a backward-compatible schema change (new column) on a REALTIME table? I've tried `Reload All Segments`. Is that because I haven't set `pinot.server.instance.reload.consumingSegment`?
@jmeyer: _Full message:_ ```[ { "errorCode": 500, "message": "MergeResponseError: Data schema mismatch between merged block: [communityId(STRING),documentId(STRING),eventTimeString(STRING),eventType(STRING),hoverDuration(INT),ibcustomer(STRING),origin(STRING),projectId(STRING),selectionId(STRING),timeString(STRING),userId(STRING)] and block to merge: [communityId(STRING),documentId(STRING),eventTimeString(STRING),eventType(STRING),ibcustomer(STRING),origin(STRING),projectId(STRING),selectionId(STRING),timeString(STRING),userId(STRING)], drop block to merge" } ]```
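Comparing the two column lists in the error message by eye is tedious. The following is a minimal sketch that parses both lists and reports which columns differ. The helper names are hypothetical and not part of Pinot; the two strings are copied from the message above.

```python
import re

# Column lists copied from the MergeResponseError message above.
MERGED_BLOCK = (
    "communityId(STRING),documentId(STRING),eventTimeString(STRING),"
    "eventType(STRING),hoverDuration(INT),ibcustomer(STRING),origin(STRING),"
    "projectId(STRING),selectionId(STRING),timeString(STRING),userId(STRING)"
)
BLOCK_TO_MERGE = (
    "communityId(STRING),documentId(STRING),eventTimeString(STRING),"
    "eventType(STRING),ibcustomer(STRING),origin(STRING),"
    "projectId(STRING),selectionId(STRING),timeString(STRING),userId(STRING)"
)


def parse_columns(schema_text):
    """Parse 'name(TYPE),name(TYPE),...' into a {name: type} dict."""
    return dict(re.findall(r"(\w+)\((\w+)\)", schema_text))


def schema_diff(left, right):
    """Return the columns present only in left and only in right."""
    left_cols, right_cols = parse_columns(left), parse_columns(right)
    only_left = {c: t for c, t in left_cols.items() if c not in right_cols}
    only_right = {c: t for c, t in right_cols.items() if c not in left_cols}
    return only_left, only_right


only_in_merged, only_in_block = schema_diff(MERGED_BLOCK, BLOCK_TO_MERGE)
print(only_in_merged)  # {'hoverDuration': 'INT'}
print(only_in_block)   # {}
```

The only difference is `hoverDuration(INT)`, presumably the newly added column, which suggests that some segments are still being served with the old schema.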
@xiangfu0: can you try to restart realtime servers?
@jmeyer: Oh, I'd have to check with the SREs. Is that the only way?
@jmeyer: Because that would mean changing a schema would lead to downtime, right?
@xiangfu0: Hmm, typically it shouldn't, if the changes are backward compatible
@xiangfu0: Need to check what’s going on there
@xiangfu0: Can you create an issue and describe how you find this issue?
@jmeyer: Yeah sure will do, thanks @xiangfu0
@jmeyer: -> :slightly_smiling_face:
@jackie.jxt: Yes, in order to reload the consuming segment as well, you need to set a flag
@jackie.jxt: We should consider making it on by default
@jmeyer: @jackie.jxt Is there any benefit to keeping it off ?
@jackie.jxt: I don’t think so. I think the flag was added when testing the feature
@jackie.jxt: Let me double check
@jmeyer: Okay. It's this flag: `pinot.server.instance.reload.consumingSegment`, right? :slightly_smiling_face:
@jackie.jxt: Yes, that is correct
@jmeyer: Great news, I'll try that then :tada: Thanks Jackie
@jackie.jxt: @xiangfu0 PR to set the default behavior:
@jmeyer: Fantastic :thankyou:
@gmr.sql: @gmr.sql has joined the channel
@dougdeu: @dougdeu has joined the channel
@jkim0110610: @jkim0110610 has joined the channel

#pinot-dev

@b.barisercan: @b.barisercan has joined the channel
