Apache Pinot Daily Email Digest (2021-01-08)

Pinot Slack Email Digest Fri, 08 Jan 2021 18:00:28 -0800

#general

@pandey.mayuresh367: @pandey.mayuresh367 has joined the channel
@karinwolok1: If you've ever wanted to get into the conference speaking circuit, now is your chance (if you're using Kafka :smirk: ) :tada:*The Kafka Summit Europe 2021 CFP is open*! :tada: The deadline is in 9 days. Go ahead and submit a talk. You have nothing to lose -- only to gain the possibility of being a thought leader in the innovations around real time analytics. :wine_glass: :heart: If you need some feedback on your submission, I am happy to help. Also, the Kafka Summit people are available for real-time feedback on submissions in one of their slack channels. More details in link below.
@karinwolok1: We also have a bunch of data-centric meetup groups globally that are looking for speakers. If you're interested in presenting at one of our future meetups (Pinot or others), please send me a DM :dancer:

#random

@pandey.mayuresh367: @pandey.mayuresh367 has joined the channel

#feat-text-search

@gamparohit: @gamparohit has joined the channel

#feat-presto-connector

@gamparohit: @gamparohit has joined the channel

#pql-2-calcite

@gamparohit: @gamparohit has joined the channel

#troubleshooting

@yash.agarwal: Is there a way we can do mode calculation in Pinot ?
@g.kishore: what's the use case? what do you want the result to be if there are multiple modes?
@yash.agarwal: For our business purpose, we want the smallest value.
@g.kishore: don't think we have a UDF for that right now
@g.kishore: two workarounds
@g.kishore: • select count(*) as count, x from T group by x order by count desc top 10
@yash.agarwal: It would be hard for us to split the query into multiple queries. Would it be possible to create a UDF for this?
@g.kishore: yes,
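
For illustration only, here is a minimal Python sketch of how a client could pick the mode from the grouped counts that the workaround query returns, breaking ties in favor of the smallest value as requested in the thread. The function name `smallest_mode` and the `(value, count)` row shape are assumptions made for this example, not part of any Pinot API.

```python
def smallest_mode(rows):
    """Return the most frequent value; ties go to the smallest value.

    `rows` is an iterable of (value, count) pairs, the shape assumed
    here for the result of a group-by count query.
    """
    rows = list(rows)
    if not rows:
        return None
    # Sort key: highest count first, then smallest value.
    value, _count = min(rows, key=lambda r: (-r[1], r[0]))
    return value


# Hypothetical grouped counts: values 1 and 3 both occur five times.
print(smallest_mode([(3, 5), (1, 5), (2, 4)]))
```

Because the tie-break happens after grouping, the client only needs the top group-by rows rather than the raw column values.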
@yash.agarwal: If we have set `nullHandlingEnabled` to true for a table and we do a distinct count on a column that has nulls, does it filter out the null values and only show the count of non-null distinct values?
@g.kishore: you need to add column != NULL as of now
@yash.agarwal: That would be difficult as we would have other aggregations in the same query which are not filtered.
@yash.agarwal: I would assume we would be able to add a UDF to handle this as well?
@yash.agarwal: or should we change the way distinct count and all other aggregations work when null handling is enabled?
@g.kishore: yes, but checking for null in the UDF will hurt performance. you can use defaultNullValue and filter it out on the client side
@g.kishore: the problem is it's not clear what the default behavior should be
@yash.agarwal: we could potentially filter it out, but when it comes to aggregations like distinct count, we don't have a way to be certain whether the aggregation counted the null/default value or not, and that might skew our metrics.
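
As a rough sketch of the client-side filtering suggested above, the following Python snippet drops a sentinel value before counting distinct values. It assumes the client has the distinct values themselves (not just a pre-aggregated count), and the sentinel `-1` is a hypothetical stand-in for whatever `defaultNullValue` the column is configured with.

```python
NULL_SENTINEL = -1  # hypothetical defaultNullValue for the column


def distinct_count_excluding_null(values, sentinel=NULL_SENTINEL):
    """Count distinct values, ignoring the sentinel that stands in for null."""
    distinct = set(values)
    # discard() is a no-op when the sentinel is absent, so the same code
    # works whether or not the column actually contained nulls.
    distinct.discard(sentinel)
    return len(distinct)


print(distinct_count_excluding_null([1, 2, 2, -1, -1]))
```

This only works when the sentinel can never be a genuine value. If only the aggregated distinct count is returned, the client cannot tell whether the sentinel was included, which is the concern raised in the thread.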
@pandey.mayuresh367: @pandey.mayuresh367 has joined the channel
@pabraham.usa: @pabraham.usa has joined the channel

#pinot-s3

@pabraham.usa: @pabraham.usa has joined the channel
@pabraham.usa: Hello, is Pinot's S3 deep storage in the query path, so that I could store data in S3 and query it as normal?

#pinot-perf-tuning

@elon.azoulay: pr to use java11: it's been working for us. We also use `-XX:SoftRefLRUPolicyMSPerMB=0` to fix the issue where soft references are not sufficiently cleaned up in gc.
