Apache Pinot Daily Email Digest (2021-07-05)

Pinot Slack Email Digest Mon, 05 Jul 2021 19:00:59 -0700

#general

@humengyuk18: How do I efficiently compute a distinct count for column a that depends on the value of column b, like this: ```select FLOOR(10000*count(distinct case when action = '' then user_id else '' end) / count(distinct case when action = '' then user_id else '' end))/100 from "capi_trace" where company_id = 'aaaaa' and __time >= '1622554026000' and __time <= '1625146026000' and shop_id in ('xxxxx')``` Is it possible without using CASE WHEN inside the distinct count?
@g.kishore: Group by action?
@humengyuk18: If I group by action, how should I write the division in the select clause?
@g.kishore: You will have to perform that on the client side
@g.kishore: Or do something similar to theta sketch implementation.
@humengyuk18: I see, thanks.
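One way to read the client-side suggestion above: run the query once grouped by action, then divide the per-action distinct counts in application code. The following is a minimal sketch, assuming the grouped result has already been fetched as (action, distinct_user_count) pairs. The function name, the row shape, and the action values `click` and `view` are hypothetical, since the original query redacts the actual action values.

```python
import math


def ratio_percent(rows, numerator_action, denominator_action):
    """Return floor(10000 * num / den) / 100, mirroring the SELECT expression.

    rows: iterable of (action, distinct_user_count) pairs, for example the
    result of a query grouped by action (hypothetical shape).
    """
    counts = dict(rows)
    num = counts.get(numerator_action, 0)
    den = counts.get(denominator_action, 0)
    if den == 0:
        return None  # ratio is undefined when the denominator group is empty
    return math.floor(10000 * num / den) / 100


# Made-up numbers, for illustration only.
rows = [("view", 200), ("click", 50)]
print(ratio_percent(rows, "click", "view"))
```

Note that the CASE WHEN form would typically also count the empty-string placeholder from the ELSE branch as one extra distinct value, so the two approaches may differ slightly.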
@knowledgeisstrengthfo: Hi everyone, I want to know about the compression ratio in Apache Pinot. For example, if I have a 10 GB JSON file containing records with 100 columns, how much memory is required to store it on a Pinot server (assuming there will be only 1 replica)? Also, in-memory segments get flushed to the segment store once a threshold is reached, so how much storage should be provisioned for the deep store in the controller?
@mayanks: Compression depends on a variety of factors (e.g. cardinality of columns, data type, type of indexing used, etc.). Even so, compared to text-based input (e.g. CSV/JSON), the compression should be substantial. If you want an accurate number, take one sample JSON file and create a Pinot segment out of it (using pinot-admin).
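The sampling approach above can be turned into a rough sizing estimate once a segment has been built from the sample. This is a minimal sketch, assuming the sample segment already exists as a directory on disk and that size scales roughly linearly with input size, which may not hold for every data set. The function names and the numbers in the usage lines are hypothetical.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def estimate_segment_bytes(sample_input_bytes, sample_segment_bytes, full_input_bytes):
    """Linearly extrapolate segment size from one sample; a rough estimate only."""
    ratio = sample_segment_bytes / sample_input_bytes
    return ratio * full_input_bytes


# Made-up sizes: a 100 MB JSON sample that produced a 20 MB segment,
# extrapolated to a 10 GB input.
MB = 1024 ** 2
GB = 1024 ** 3
print(estimate_segment_bytes(100 * MB, 20 * MB, 10 * GB) / GB, "GB (estimate)")
```

In practice `sample_segment_bytes` would come from calling `dir_size_bytes` on the generated segment directory. Multiply the estimate by the replica count for server storage; the deep store typically holds a single copy of each segment.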
@e-ramirez: Hi all, What is the timeline for Pinot 0.8? Do you have the roadmap and timelines written somewhere?
@mayanks: We are working on it, so expect it in the next few weeks.
@radhika.23796: @radhika.23796 has joined the channel
@carlos: @carlos has joined the channel
@carlos: :wave:
@madhu.sling: @madhu.sling has joined the channel

#random

@radhika.23796: @radhika.23796 has joined the channel
@carlos: @carlos has joined the channel
@madhu.sling: @madhu.sling has joined the channel

#troubleshooting

@radhika.23796: @radhika.23796 has joined the channel
@radhika.23796: I tried this table config and schema, but I am not able to see any data in the Pinot table, although I can see the data in the Kafka topic.
@jackie.jxt: Hi, Radhika. Can you please check the controller and server log and see if there is any error logged?
@jackie.jxt: Some minor improvements based on the table config and schema, though they should not be the root cause of the data not being consumed:
• Recommend using the same name for the table and the schema
• Date format:
```
"dateTimeFieldSpecs": [
  {
    "name": "date",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  }
]
```
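Before ingesting, it can help to check that sample values of the `date` field actually match the `yyyy-MM-dd` pattern named in the spec above. This is a minimal sketch in Python, not part of Pinot. It assumes that Python's `%Y-%m-%d` is a close enough stand-in for the Java-style pattern, and the helper name is hypothetical.

```python
from datetime import datetime

# Python stand-in for the Java-style pattern yyyy-MM-dd (an assumption:
# the two parsers are not identical in how lenient they are).
PY_FORMAT = "%Y-%m-%d"


def matches_date_format(value):
    """Return True if value parses with %Y-%m-%d, else False."""
    try:
        datetime.strptime(value, PY_FORMAT)
        return True
    except (ValueError, TypeError):
        return False


# Made-up sample values, for illustration only.
for sample in ["2021-07-05", "07/05/2021", "1625146026000"]:
    print(sample, matches_date_format(sample))
```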
@radhika.23796: Please help me out
@nadeemsadim: @mayanks @xiangfu0 @jackie.jxt @g.kishore @dlavoie @ken @npawar We are trying the upsert setup described above by @radhika.23796, but the table count comes back as zero. We have followed all the required steps as mentioned here. We are using the Apache Samza API (check the attached code snippet) to partition by key, as required by the above doc: *Partition the input stream by the primary key* An important requirement for the Pinot upsert table is to partition the input stream by the primary key. For Kafka messages, this means the producer shall set the key in the `send` API. If the original stream is not partitioned, then a streaming processing job (e.g. Flink) is needed to shuffle and repartition the input stream into a partitioned one for Pinot's ingestion.
@nadeemsadim: We can see this streaming application publishing data to the output topic on which the Pinot upsert table is created. Before writing the data, we shuffle it using the Samza API so that records with the same key go to the same partition. Still, the data is not ingested by Pinot and the table count comes back as zero. The table schema and table creation script are shared above by @radhika.23796.
@carlos: @carlos has joined the channel
@jackie.jxt: @radhika.23796 @nadeemsadim Please check the controller and server logs and see whether any errors are logged. Even if the partitioning is wrong, you should still be able to see data being consumed, so that should not be the cause.
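For reference, the requirement quoted above (records with the same primary key must land on the same partition) can be illustrated with a small sketch. This is not the Samza code from the thread and not Kafka's partitioner: it uses CRC32 purely as a stand-in hash, and the function names, the `user_id` key field, and the sample records are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a primary key to a partition index deterministically.

    CRC32 is used purely for illustration; it is NOT the hash used by
    Kafka's default partitioner or by Samza.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_field, num_partitions):
    """Group records into partitions so equal keys always share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        partitions[index].append(record)
    return partitions


# Made-up records, for illustration only.
records = [
    {"user_id": "u1", "value": 1},
    {"user_id": "u2", "value": 2},
    {"user_id": "u1", "value": 3},
]
for index, part in enumerate(repartition(records, "user_id", 4)):
    print(index, part)
```

The point of the sketch is only the invariant: both `u1` records end up in the same partition, which is what the upsert documentation asks of the input stream.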
@madhu.sling: @madhu.sling has joined the channel