#general


@tangyonga: Hi Team, I have seen that in 0.4.0, Pinot implemented the initial version of a theta-sketch based distinct count aggregation function, utilizing the DataSketches library. The latest Druid release also includes a DataSketches extension. Does Pinot have any plan to implement sketches other than the Theta sketch? Thanks.
  @mayanks: Pinot already supports HLL and TDigest based percentiles. If there's a specific case where you would find DataSketch based implementations more useful, we can definitely explore that. If so, would recommend filing an issue for that.
  @mayanks: For HLL we use `com.clearspring.analytics.stream.cardinality.HyperLogLog`
  @mayanks: And for TDigest, we use `com.tdunning.math.stats.TDigest`
  @tangyonga: Thanks for quick reply!
  @mayanks: :+1:
  @tangyonga: @mayanks we may need to pay attention to KLL sketch vs t-digest (the Pinot implementation); see the comparison published by DataSketches.
  @mayanks: Thanks for sharing @tangyonga. We can definitely explore adding these if needed.
  @tangyonga: Appendix: HLL @mayanks
  @tangyonga: Also note that DataSketches includes a more recent sketch for estimating stream cardinalities more efficiently than the famous HLL sketch.
  @mayanks: If you could open an issue and add all this there, it would help us track this request @tangyonga
  @tangyonga: I will try to open an issue to discuss sketches family @mayanks
  @mayanks: Thanks @tangyonga.
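
For readers unfamiliar with the idea behind theta-sketch style distinct counting discussed in this thread, here is a minimal K-Minimum-Values sketch in Python. It is an illustration of the general technique only, not the DataSketches or Pinot implementation; the class name `KmvSketch`, the helper `_hash01`, and the default `k` are all assumptions made for this example.

```python
import hashlib
import heapq


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256 (illustrative choice)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KmvSketch:
    """Hypothetical K-Minimum-Values sketch: keeps the k smallest distinct hashes."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap (stored negated) of the k smallest hashes
        self._members = set()  # same hashes, for duplicate detection

    def update(self, value):
        h = _hash01(value)
        if h in self._members:
            return  # duplicate of a retained value, nothing to do
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact count
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KmvSketch(k=1024)
for i in range(50_000):
    sketch.update(f"user-{i}")
    sketch.update(f"user-{i}")  # duplicates do not change the estimate
print(round(sketch.estimate()))
```

The estimate is approximate once more than `k` distinct values have been seen; a larger `k` trades memory for a lower relative error.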
@sosyalmedya.oguzhan: Hello, does Pinot support upsert for offline tables, or only for realtime tables? For example, when late data arrives after the real-time segment is flushed, can Pinot update it?
  @mayanks: @sosyalmedya.oguzhan At the moment the support is for real-time only. However, Pinot segments don’t need to be time partitioned, so late arriving data is not an issue cc: @yupeng
  @yupeng: Yes, upsert is for realtime only; for offline tables you can do the compaction in the segment creation job.
  @yupeng: but the offline upsert support is on the roadmap for the upcoming months
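
The compaction mentioned above for offline tables can be pictured as deduplicating rows by primary key before the segment is built, keeping the most recent record per key. The sketch below is only an illustration of that idea in plain Python; `compact_latest` and the field names `id` and `ts` are hypothetical and are not part of any Pinot API.

```python
def compact_latest(rows, key_field, time_field):
    """Keep one row per primary key: the one with the greatest timestamp.

    Hypothetical helper for a pre-processing step that runs before segment creation.
    On equal timestamps, the row seen later in the input wins.
    """
    latest = {}
    for row in rows:
        key = row[key_field]
        current = latest.get(key)
        if current is None or row[time_field] >= current[time_field]:
            latest[key] = row
    return list(latest.values())


rows = [
    {"id": 1, "ts": 10, "status": "created"},
    {"id": 2, "ts": 5, "status": "created"},
    {"id": 1, "ts": 20, "status": "shipped"},  # newer version of id 1
    {"id": 2, "ts": 3, "status": "stale"},     # late-arriving, older version of id 2
]
for row in compact_latest(rows, key_field="id", time_field="ts"):
    print(row)
```

The compacted rows would then be fed to whatever job builds the offline segment, so each key appears only once in the resulting data.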
@john: @john has joined the channel
@egala: @egala has joined the channel
@myeole: Hello, Do we have any pinot DB benchmarks we can refer to ?
  @g.kishore: We have some that we did at LinkedIn, and Confluera recently published some numbers.
  @myeole: Thanks
  @g.kishore: we always suggest doing the benchmark for your use case and with your data
  @g.kishore: you can see the indexing techniques in Pinot and some performance numbers on 5 years of GitHub Data
  @myeole: Sure Thanks
@zjinwei: Hi, is it possible to monitor Pinot DB metrics with Wavefront instead of Prometheus and Grafana? Are there any docs I can refer to? Thanks
  @g.kishore: All metrics are emitted via JMX. I am not familiar with wavefront... Does wavefront have a JMX exporter?
  @zjinwei: Hi Kishore, thanks for replying. I found something in Wavefront about JMX integration. I think it might work. Just curious, is there any way to achieve native integration using Dropwizard?

#random


@john: @john has joined the channel
@egala: @egala has joined the channel

#troubleshooting


@john: @john has joined the channel
@egala: @egala has joined the channel