#general


@nizar.hejazi: Hi team, since an upsert table maintains an in-memory map from the primary key to the record location, is it still recommended to define an inverted index on the table’s primary key? The primary key is unique per record, so is it recommended to define a bloom filter on the primary key instead of an inverted index to save space, or is O(n) space per index, where n is the number of records, considered normal?
  @richard892: don't create an inverted index on it
  @richard892: bloom filter helps find the segment
  @richard892: you can also find the server quickly if you add this configuration so the broker knows how to route queries
  @richard892: partitioning helps find the server, bloom filter helps find the segment, but you will still need to scan within the segment to find the record. To speed up intra-segment query time, consider sorting on the primary key
  @nizar.hejazi: Example table: `employee`. Primary key: `employeeId`. Partitioning key: `companyId`. I have 8 partitions for the `employee` table, which means the records of around 1/8 of all companies go into a single partition. As you mentioned, partitioning helps find the server. We also have an inverted index on our partitioning key (`companyId`), since each partition contains the records of hundreds of companies but our queries are always restricted to a single company. I’ll add bloom filters on the `employeeId` column to help prune segments that don’t contain records related to a specific employeeId or a set of employeeIds. I think there is no value in adding `companyId` to the set of bloom filter columns since it is highly unlikely that a segment does not contain at least a single record from each company. Note: as per the docs, bloom filters can be applied only to dictionary-encoded columns. *We want to avoid scanning all the content of a segment, so can you elaborate on why a sorted index is better than an inverted index?* Please note that each record has a unique primary key and our primary key is a Mongo 12-byte ObjectId.
  @nizar.hejazi: Plan to specify multiple replica groups per table. instanceSelectorType will be set to “strictReplicaGroup” since we support upsert.
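(A minimal table-config sketch of the routing pieces discussed in this thread: partition-based segment pruning on `companyId` plus the strict replica-group instance selector. The partition function shown is illustrative, not taken from the thread.)
```json
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "companyId": { "functionName": "Murmur", "numPartitions": 8 }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"],
    "instanceSelectorType": "strictReplicaGroup"
  }
}
```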
  @richard892: with an inverted index you end up with a one-element bitmap per employee, with a sorted index you just binary search on the employee
  @richard892: however, given that a 2000 employee company is quite large, and assuming each employee is scoped to a single company, I wouldn't index on employee at all, but sort on company
  @richard892: you will always provide the company id for the sake of partition selection anyway, and a sorted index access on company id will select 100s or small 1000s of records to scan for the employeeId
  @richard892: also that scan will be contiguous within the scope of a company, so it's the best case for scanning
  @richard892: If I were you, I would delay adding bloom filter on employee Id until you've sorted on companyId and seen how far that gets you
  @g.kishore: So you are looking up on a secondary key (employeeId), so having a bloom filter on employeeId is a good idea.
  @richard892: it could be worth it, but we've also had reports of a lot of time spent building bloom filters as well as a lot of time in segment pruning, and the tradeoff is against a binary search then a small contiguous scan
  @richard892: I wouldn't like to guess where the crossover point is so I'd try without first and see what the marginal benefit is
  @nizar.hejazi: thanks for your input .. I will start with a sorted index on companyId and consider a bloom filter on employeeId later (to prune segments more effectively if needed)
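(The plan above roughly maps to two `tableIndexConfig` entries; a sketch, with the bloom filter included even though it may be deferred:)
```json
{
  "tableIndexConfig": {
    "sortedColumn": ["companyId"],
    "bloomFilterColumns": ["employeeId"]
  }
}
```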
@mann.mehta: @mann.mehta has joined the channel
@erik.bergsten: Querying question: When we query our table (with 6 million rows) and aggregate (sum/avg) metrics over dimensions we get very good performance, usually ~250ms. If we group by month and year (via a datetime transform function) we suddenly see a jump to about 1 sec or even more. Our solution is to add year and month as dimensions with an ingestion transform, is there a better way?
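(A rough sketch of the ingestion-transform approach described above, assuming an epoch-millis time column named `ts`; the `year` and `month` columns would also need to be declared as dimensions in the schema:)
```json
"ingestionConfig": {
  "transformConfigs": [
    { "columnName": "year",  "transformFunction": "toDateTime(ts, 'yyyy')" },
    { "columnName": "month", "transformFunction": "toDateTime(ts, 'MM')" }
  ]
}
```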
  @richard892: group by relies on there being a dictionary (a mapping from each distinct value to an integer dictionary code) to be fast, because it makes computing the coordinates fast, but when you use a transform function like `datetimeconvert` there is no longer a dictionary to rely on, so one needs to be materialised on the fly, which slows the evaluation down
  @richard892: we recently developed a feature specifically for `datetimeconvert` which pre-materialises the function at various granularities (hour, day, week etc.) precisely so group by queries can be accelerated
  @richard892: the timestamp index is available on master and in the latest docker builds if you want to try it
  @erik.bergsten: any documentation/links about the timestamp index?
  @richard892: here's the PR
  @erik.bergsten: I'm running the latest image now, but the index is not listed in the gui, does that mean it isn't available or is the gui just lagging?
  @richard892: just isn't available in the gui
  @richard892: you can configure it manually
  @erik.bergsten: I don’t understand the configuration instructions: ```Pinot will pre-generate one column per time granularity with forward index and range index. The naming convention is $${ts_column_name}$${ts_granularity}, e.g. Timestamp column ts with granularities DAY, MONTH will have two extra columns generated: $ts$DAY and $ts$MONTH.``` Do I simply put `"sortedColumn": ["$ts$MONTH"]` in my index config?
  @richard892: @xiangfu0 can you help with configuration please?
  @richard892: this is quite a raw feature so the documentation isn't quite there yet
  @erik.bergsten: nice! we got it working by setting ```"fieldConfigList": [
  {
    "name": "ts",
    "encodingType": "DICTIONARY",
    "indexType": "TIMESTAMP",
    "indexTypes": ["TIMESTAMP"],
    "timestampConfig": {
      "granularities": ["DAY", "WEEK", "MONTH", "YEAR"]
    }
  }
]``` in the table config. Getting really good performance on time queries now!
  @erik.bergsten: Thx for the quick help.
  @richard892: nice
  @richard892: time filters should be faster too
  @jadami: > Getting really good performance on time queries now!
back to the original 250ms or even better?
  @xiangfu0: Thanks @richard892 ! @jadami the perf should be on par with extracting to a separate column
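(With the `TIMESTAMP` field config above, a group-by such as the following should be served from the pre-materialized `$ts$MONTH` column; the table and metric names here are made up:)
```sql
SELECT datetrunc('MONTH', ts) AS month_ts,
       SUM(metric) AS total
FROM mytable
GROUP BY datetrunc('MONTH', ts)
ORDER BY datetrunc('MONTH', ts)
```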
@gqian3: Hi team, we have a datetime field column storing epoch milliseconds from a local time zone. Is there a transformation in Pinot that can convert it into a UTC timestamp in a Pinot query?
  @erik.bergsten: this function: can convert epochmillis to datetime strings
  @gqian3: I tried to add ‘1:MILLISECONDS:EPOCH tz(US/Pacific)’ as the input format, but got a query error.
  @erik.bergsten: I don't think epoch milliseconds are timezoned
  @gqian3: So we have to store epoch milliseconds in UTC in order to convert to other TZ, not the other way around, right?
  @mayanks: `DateTimeConvert` can do all the time zone conversions
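(For illustration, `DateTimeConvert` accepts a `tz(...)` suffix on SIMPLE_DATE_FORMAT formats; the column and table names below are made up, and whether this covers the “epoch millis produced from local time” case above depends on how those millis were generated:)
```sql
SELECT DATETIMECONVERT(
         ts_millis,
         '1:MILLISECONDS:EPOCH',
         '1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss tz(America/Los_Angeles)',
         '1:MILLISECONDS'
       ) AS local_time
FROM mytable
```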

#random


@mann.mehta: @mann.mehta has joined the channel

#troubleshooting


@saumya2700: Is it normal for segments to go OFFLINE if the server is not that busy? We are facing this issue in both dev and QA environments, and data is getting lost due to this; only 2 or 3 segments are ONLINE and sometimes data is not coming into Pinot. Checking logs for errors if any.
  @npawar: sounds like your servers are unhealthy, and might be restarting. Likely a resource issue. You might want to check for OOM-like issues
@mann.mehta: @mann.mehta has joined the channel
@luisfernandez: hey friends, we are using loki to do log ingestion of pinot logs. However, with the default setup that we have from the helm chart, the search capabilities for logging are not that great. Do you guys have a configuration for log4j that better supports log lookups in tools like loki or google logs search?
  @mayanks: Not that I am aware of. If you do come up with one, happy to see if we can update the existing one.
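(One possible direction, untested here: switch the appenders in Pinot's log4j2 config to a structured JSON layout so Loki or Google Cloud Logging can index fields instead of parsing free text. `JsonTemplateLayout` requires the `log4j-layout-template-json` jar on the classpath; the appender name is illustrative:)
```xml
<Console name="console" target="SYSTEM_OUT">
  <!-- emit ECS-shaped JSON instead of the default PatternLayout -->
  <JsonTemplateLayout eventTemplateUri="classpath:EcsLayout.json"/>
</Console>
```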
@diogo.baeder: Hey folks, does anybody know how to forcefully push down Pinot queries to the broker when using Trino? I really need a solution to be able to do multi-table queries and have been trying to use Trino for that, but I'm absolutely disappointed at how hard it's been to make this work - if I just do normal queries in Trino it tries to query *ALL* of the segments, completely bypassing any partitioning (including the time column), which is insane.
  @mayanks: @elon.azoulay ^^
  @mayanks: Also, could you elaborate a bit on what you are trying to query?
  @xiangfu0: what’s your join query look like?
  @diogo.baeder: Let me paste the query here: ```select distinct weight
from weights
inner join brands_metrics on weights.pmxid = brands_metrics.pmxid
inner join filters on weights.pmxid = filters.pmxid
where brand_id = 1000226
  and filters.name = 'gender'
  and contains(filters."values", 2)
  and weights.date > 20130501 and weights.date < 20130515
  and brands_metrics.date > 20130501 and brands_metrics.date < 20130515
  and filters.date > 20130501 and filters.date < 20130515``` and here's some explanation about what's going on:
• I only want data between 2013-05-01 and 2013-05-15
• I want to join data from `weights`, `brands_metrics` and `filters` where they match on the `caseid` column (which is common to all the 3 tables)
• I expected Trino to do the querying in the Pinot Broker, but it seems like it first gets all of the segments existing in those tables and does the queries inside each of them, so for example doing each separate query takes only a few ms in Pinot itself, but running the full query in Trino takes 6s which is unacceptable
  @diogo.baeder: Looking at the Trino logs, I see that it tries to read all of the Pinot segments, no matter the time boundaries for it
  @mayanks: Which part would you like to push down to Pinot?
  @xiangfu0: hmm, I think in both trino/presto the filters will be pushed down
  @diogo.baeder: I'd like to push down everything except the joins, so that I can make use of Pinot's partitioning of data (per time and other configured columns I have)
  @xiangfu0: although it queries all the segments, the pushed-down query won’t return anything if there is no match
  @xiangfu0: true, pinot partitioning metadata hasn’t been utilized so far for presto/trino
  @xiangfu0: did you try using a trino dynamic table:
  @diogo.baeder: @xiangfu0 I tried, yes, but Trino is not happy about my subquery statements and is unable to parse them :disappointed:
  @diogo.baeder: Even if I run a simple query like: ```select weight, caseid as weights_caseid from weights where "date" > 20130501 and "date" < 20130515 limit 100``` if I run it in Pinot, it runs in 5ms, and in Trino it's taking more than 1s. And then looking at the logs, I see this for example: ```trino-server-xavier | 2022-04-22T21:44:38.522Z INFO 20220422_214437_00113_jhvv6.1.0.0-105-102 io.trino.plugin.pinot.PinotSegmentPageSource Query 'SELECT "caseid", "weight" FROM weights_OFFLINE WHERE (("date" > '20130501' AND "date" < '20130515')) LIMIT 100' on host 'Optional[Server_172.30.0.5_8098]' for segment splits: [weights_OFFLINE_20131224_20131224_201]``` meaning Trino is trying to query the `weights_OFFLINE_20131224_20131224_201` segment, which is totally out of the period I want to get data from. Pinot matches 5 segments but Trino tries to run it over all the 209 segments I have.
  @xiangfu0: yes, that’s true, trino/presto is not smart enough to prune at the segment level
  @xiangfu0: The pruning logic needs to be added
  @xiangfu0: how about query like ```SELECT * FROM pinot.default."select weight, caseid as weights_caseid from weights where \"date\" > 20130501 and \"date\" < 20130515 limit 100"```
  @diogo.baeder: Got it. Thanks a lot, @xiangfu0, I think this actually helps on my decision. I'll just drop Trino and try to work with the data myself doing the joins.
  @diogo.baeder: Let me try that
  @xiangfu0: yeah, if the underlying queries can be executed from broker, then you can leverage trino for it
  @xiangfu0: presto will query all segments
  @xiangfu0: so if trino is not an option, you can rule out presto as well
  @diogo.baeder: That query doesn't work: ```trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 1:94: mismatched input '" > 20130501 and "'. Expecting: '(', ',', 'CROSS', 'EXCEPT', 'FETCH', 'FULL', 'GROUP', 'HAVING', 'INNER', 'INTERSECT', 'JOIN', 'LEFT', 'LIMIT', 'MATCH_RECOGNIZE', 'NATURAL', 'OFFSET', 'ORDER', 'RIGHT', 'TABLESAMPLE', 'UNION', 'WHERE', 'WINDOW', <EOF>", query_id=20220422_215155_00117_jhvv6)``` so yeah, I guess Trino or Presto are a no-go for me. :confused:
  @xiangfu0: hmm
  @xiangfu0: is this syntax error ?
  @xiangfu0: for reserved keyword?
  @diogo.baeder: It is, from the Trino perspective - I think because of the quotes around `date`
  @xiangfu0: what about ```SELECT * FROM pinot.default."select weight, caseid as weights_caseid from weights where date > 20130501 and date < 20130515 limit 100"```
  @xiangfu0: yeah, those quotes are really annoying
  @xiangfu0: also if date is a keyword in pinot, then the generated pinot query has to double-quote it to mark it as an identifier
  @diogo.baeder: ```trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Caught exception while parsing query: select weight, caseid as weights_caseid from weights where date > 20130501 and date < 20130515 limit 100", query_id=20220422_215444_00133_jhvv6)``` probably because Pinot itself doesn't accept just `date` (since it's reserved)
  @diogo.baeder: Yeah, you read my mind :smile:
  @xiangfu0: right, I guess it’s unfair to ask to create a new column :stuck_out_tongue:
  @diogo.baeder: Hmmm... to avoid this quoting issue, you mean?
  @xiangfu0: I feel we should ban creating column names that are reserved keywords soon for new tables lol
  @xiangfu0: yeah
  @xiangfu0: I was thinking your query might be runnable with ```select distinct weight
from (SELECT * FROM pinot.default."SELECT weight, pmxid FROM weights WHERE date > 20130501 and date < 20130515") weights
inner join (SELECT * FROM pinot.default."SELECT pmxid FROM brands_metrics WHERE date > 20130501 and date < 20130515 and brand_id = 1000226") brands_metrics on weights.pmxid = brands_metrics.pmxid
inner join (SELECT * FROM pinot.default."SELECT pmxid FROM filters WHERE date > 20130501 and date < 20130515 and name = 'gender' and contains("values", 2)") filters on weights.pmxid = filters.pmxid```
  @xiangfu0: but I don’t know the quote parts
  @diogo.baeder: Yeah, that could be a good idea I guess :smile: No worries, let me actually try that. I was really rooting for Trino to work fine for me, so let's see if pushing down works but with non-reserved keywords.
  @xiangfu0: kind of pushing down 3 queries to pinot and just use trino for joins on brokerResponses
  @diogo.baeder: Yeah, that's the last thing I tried by myself, but maybe it was failing because of the quotes. I'll try to prefix all columns with the table name, this should also make joins easier to read, and avoid any kind of quoting issue.
  @xiangfu0: sure, or you can change filter to some other non-reserved keywords
  @xiangfu0: if we can narrow down the problem to quotes, then it could be much simpler fixes
  @diogo.baeder: Yeah, just made a quick fix actually - to `date_` and `values_`. I'm too lazy :smile:
  @diogo.baeder: Alright, dynamic tables do work now, albeit not performing as I expected. I get queries running ~10x slower than straight queries to Pinot, ~80ms in Trino compared to ~7ms in Pinot, which is not good, but at least it's not absurdly slow anymore. A full query with joins now takes ~270ms, which is far better than 6s, but still not performing as I would expect it to. I'll do some more experiments and check how fast I can get by just fetching those results into a Python script and joining the data myself.
  @diogo.baeder: @elon.azoulay in case you're interested ^
  @diogo.baeder: For reference, this is my full query now: ```select distinct weight
from pinot.default."select weight, caseid as weights_caseid from weights where date_ > 20150501 and date_ < 20150515" as weights
inner join pinot.default."select brand_id, caseid as brands_metrics_caseid from brands_metrics where brand_id = 1000226 and date_ > 20150501 and date_ < 20150515 limit 100" as brands_metrics on weights_caseid = brands_metrics_caseid
inner join pinot.default."select name, caseid as filters_caseid from filters where name = 'gender' and values_ = 2 and date_ > 20150501 and date_ < 20150515" as filters on weights_caseid = filters_caseid```
@diogo.baeder: I'm almost dropping Trino and trying PrestoDB instead. If that doesn't work either, I'll have to brute-force this thing and rely on implementing the joins myself.
  @g.kishore: Thanks @diogo.baeder. keep us posted. @elon.azoulay would be good to chat with Diogo to get his feedback on trino-pinot connector. I know we chatted about not having the need of dynamic tables.
  @diogo.baeder: Thanks! :slightly_smiling_face:
  @elon.azoulay: Yes, we do have a workaround but it may be very hacky - I can share more details if the dynamic table doesn't work.
  @diogo.baeder: Got it. Thanks @elon.azoulay, I'm doing some changes to see if push-down through dynamic tables works fine.
  @diogo.baeder: Dynamic tables do work now for me after I recreated my data avoiding the usage of quotes for column names.

#getting-started


@mann.mehta: @mann.mehta has joined the channel