#general
@gaythu.rajan: @gaythu.rajan has joined the channel
@rahul: @rahul has joined the channel
@sameerasalameh95: @sameerasalameh95 has joined the channel
@sosyalmedya.oguzhan: Does pinot stores min max values for dimensions in segment metadata? Or does it just store min max values for date time fields? And can we create inverted or any other indices on date time column?
@g.kishore: Its store min max for all columns.
@sosyalmedya.oguzhan: So, can we create indices on time column?
@mayanks: You can create Inv index for any column including time. However, if time column is sorted naturally, you don’t need to.
@mayanks: What is your time unit
@sosyalmedya.oguzhan: time column is not sorted. time unit is day in epoch timestamp format
@mayanks: If day, then you will have one inv index value point to all days worth of records, which might be same as metadata based pruning.
@sosyalmedya.oguzhan: perfect actually, i can store this column as dimension as "2021-01-03 00:00:00" format for fast filtering grouping operation. but i don't know if there is any difference to define it as dimension or time column
@mayanks: Mind sharing your query patterns?
@mayanks: General rule of thumb is to partition on primary key (if there is one). If not then sort on dimension that appears on most queries. and then add secondary in index as needed for columns that have higher selectivity first
@sosyalmedya.oguzhan: i have two time column; one is eventDate and the other is represents orderDate. All of my queries based on sellerId and orderDate. ```select abc, xyz, sum(x).. from table group by abc, xyz where sellerId = 123 and orderDate > X and orderDate < y``` the other query example can be like that; ```select orderDate, abc, xyz, sum(x).. from table group by orderDate, abc, xyz where sellerId = 123 and orderDate > X and orderDate < y```
@sosyalmedya.oguzhan: I think to partition data by sellerid(for broker side pruning), create inverted or sorted index on sellerId(i'm not sure which one is the better now), and create inverted index on other required columns
@mayanks: Sort and partition on seller and inv index on other
@mayanks: Sorted is always better than inv as it gives better data locality
@sosyalmedya.oguzhan: ohh, thank you much!
@sosyalmedya.oguzhan: I've tried an example. I created realtime table, and set `orderDate` as time field. Totally 4 segments created. When i send below query; `select * from orders` query stats: ```numSegmentsQueried: 4 numSegmentsProcessed: 4 numSegmentsMatched: 4``` And, when i send below query; `select * from orders where orderDate > 1619458466000` query stats: ```numSegmentsQueried: 4 numSegmentsProcessed: 1 numSegmentsMatched: 1``` Seems pinot prune segments based on min/max orderDate of segment metadata, right? But i can't see min/max orderDate in realtime segment metadata. How can i see it?
@mayanks: It should be there in metadata.properties
@sosyalmedya.oguzhan: is `/segments/{tableName}/{segmentName}/metadata` looking to metadata.properties?
@mayanks: Yes
@mayanks: For segments that are committed to disk
@mayanks: For ones that are still open it might be looking at segment zk metadata.
@mayanks: The fact that num segments matched is 1 you can be assured they pruning worked
@sosyalmedya.oguzhan: yes segment pruning based on min/max values of a time column seems works. But i couldn't sure, because when i try to get metadata of completed segment; ```{ "segment.realtime.endOffset": "3", "segment.time.unit": "MILLISECONDS", "segment.start.time": "1616250195000", "segment.flush.threshold.size": "50000", "segment.realtime.startOffset": "0", "segment.end.time": "1616250220000", "segment.total.docs": "3", "segment.table.name": "orders_REALTIME", "segment.realtime.numReplicas": "1", "segment.creation.time": "1616261054496", "segment.realtime.download.url": "
@sosyalmedya.oguzhan: Also, I set `realtime.segment.flush.threshold.time`to 2M, totally 4 segments created. The creation time of the last segment is 20:30 but segment.realtime.status of it is still `"IN_PROGRESS"`
@mayanks: Do you see any functional issues or you are just trying to verify?
@sosyalmedya.oguzhan: When i see response stats of query, it seems works. I'm just trying to verify
@sosyalmedya.oguzhan: ```numSegmentsQueried: 5 numSegmentsProcessed: 2 numSegmentsMatched: 1``` what is the difference between numSegmentsProcessed and numSegmentsMatched? I've looked document, but i did not understand it. Only one segment matched, but pinot processed 2 segments?
@mayanks: I think `numSegmentsProcessed` is the number of segments that were processed/looked-at. And `numSegmentsMatched` can be <= `numSegmentsProcessed` , because even if a segment is processed there may be 0 records matched in that segment.
@mayanks: Does that make sense?
@sosyalmedya.oguzhan: yes make sense, thank you again!
#random
@gaythu.rajan: @gaythu.rajan has joined the channel
@rahul: @rahul has joined the channel
@sameerasalameh95: @sameerasalameh95 has joined the channel
#troubleshooting
@gaythu.rajan: @gaythu.rajan has joined the channel
@rahul: @rahul has joined the channel
@sameerasalameh95: @sameerasalameh95 has joined the channel
@sameerasalameh95: Hey everyone :wave: I'm new here, still doing the installations for Pinot on my windows based machine, I'm getting this error for the very final step, do you guys have any idea what could be wrong? Thanks in advance! After building Pinot successfully, I tried to execute: bin/quick-start-batch.sh, and I get the following:
@mayanks: Seems like an issue in the batch script for Windows? I am unsure if folks here have tried this on Windows.
@mayanks: I'll take a look shortly.
@sameerasalameh95: @mayanks thanks a lot! I'll be able to try this on Linux next week, but I'm trying to make use of this week and I'm stuck with a windows machine :sweat_smile:
@fx19880617: which java version? if on windows, I feel maybe you can try docker installation?
--------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
