#general
@tangyonga: Hi Team, I have seen that in 0.4.0, Pinot has implemented the initial version of a theta-sketch-based distinct count aggregation function, utilizing the
@mayanks: Pinot already supports HLL and TDigest based percentiles. If there's a specific case where you would find DataSketch based implementations more useful, we can definitely explore that. If so, would recommend filing an issue for that.
@mayanks: For HLL we use `com.clearspring.analytics.stream.cardinality.HyperLogLog`
@mayanks: And for TDigest, we use `com.tdunning.math.stats.TDigest`
@tangyonga: Thanks for quick reply!
@mayanks: :+1:
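For readers unfamiliar with how an HLL-based distinct count works, here is a minimal, self-contained sketch of the idea. It is a toy illustration only, not the `com.clearspring.analytics.stream.cardinality.HyperLogLog` class Pinot uses; the class name `TinyHLL`, the SHA-1 hash, and the default precision `p=12` are all assumptions made for the example.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog (hypothetical, for illustration; not Pinot's implementation)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same value twice does not change the registers, which is why the structure estimates distinct values rather than total rows. With 4096 registers the estimate is typically within a few percent of the true count.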
@tangyonga: @mayanks we may need to pay attention to KLL sketch vs. t-digest (the Pinot implementation); see the following comparison by DataSketches,
@mayanks: Thanks for sharing @tangyonga. We can definitely explore adding these if needed.
@tangyonga: appendix(
@tangyonga: Also noting that DataSketches includes a latest
@mayanks: If you could open an issue and add all this there, it would help us track this request @tangyonga
@tangyonga: I will try to open an issue to discuss the sketches family @mayanks
@mayanks: Thanks @tangyonga.
@sosyalmedya.oguzhan: Hello, does Pinot support upsert for offline tables, or only for realtime tables? For example, when late data arrives after the real-time segment is flushed, can Pinot update it?
@mayanks: @sosyalmedya.oguzhan At the moment the support is for real-time only. However, Pinot segments don’t need to be time partitioned, so late arriving data is not an issue cc: @yupeng
@yupeng: Yes, upsert is for realtime only; for offline tables you can do the compaction in the segment creation job.
@yupeng: but the offline upsert support is on the roadmap for the upcoming months
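The compaction suggested above for offline tables could look roughly like the following pre-processing step, run on the input records before segments are built. This is a minimal sketch, not part of any Pinot API: the function name `compact_latest` and the field names `id` and `ts` are hypothetical, and it assumes each record is a dict that carries a primary key and a comparable timestamp.

```python
def compact_latest(records, key_field="id", time_field="ts"):
    """Keep only the newest record per primary key (hypothetical helper).

    key_field and time_field are assumed column names, not Pinot settings.
    """
    latest = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # On a timestamp tie, the record seen later in the input wins.
        if current is None or rec[time_field] >= current[time_field]:
            latest[key] = rec
    return list(latest.values())


rows = [
    {"id": 1, "ts": 10, "v": "a"},
    {"id": 2, "ts": 11, "v": "b"},
    {"id": 1, "ts": 12, "v": "c"},      # newer version of id 1
    {"id": 2, "ts": 5, "v": "late"},    # late-arriving, older version of id 2
]
compacted = compact_latest(rows)
```

Here `compacted` holds one row per key: the `ts=12` version of id 1 and the `ts=11` version of id 2. The late, older row for id 2 is dropped, so the segment built from the output contains only the latest state of each key.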
@john: @john has joined the channel
@egala: @egala has joined the channel
@myeole: Hello, do we have any Pinot DB benchmarks we can refer to?
@g.kishore: We have some we did at LinkedIn and recently Confluera published some numbers..
@myeole: Thanks
@g.kishore: we always suggest doing the benchmark for your use case and with your data
@g.kishore: you can see the indexing techniques in Pinot and some performance numbers on 5 years of GitHub Data
@myeole: Sure Thanks
@zjinwei: Hi, is it possible to monitor Pinot DB metrics with Wavefront instead of Prometheus and Grafana? Are there any docs I can refer to? Thanks
@g.kishore: All metrics are emitted via JMX. I am not familiar with wavefront... Does wavefront have a JMX exporter?
@zjinwei: Hi Kishore, thanks for replying. I found something in Wavefront about JMX Integration.
#random
@john: @john has joined the channel
@egala: @egala has joined the channel
#troubleshooting
@john: @john has joined the channel
@egala: @egala has joined the channel
