#general


@anumukhe: @anumukhe has joined the channel
@anumukhe: Hi, I'm from Cisco. We recently decided to evaluate Apache Pinot for our cloud-based analytics project. However, during the evaluation I got stuck on one of our non-functional requirements: backup and restore. Can you please suggest how we can take periodic backups of Pinot to S3 for disaster recovery purposes?
  @fx19880617: You can configure the Pinot controller to use S3 as the deep store data directory. Then, if the Pinot cluster is lost, you can recreate the cluster and re-push all the segments from S3. Ref:
  @anumukhe: Thank you so much. will take a look at it
@anumukhe: Thanks
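As a sketch of the approach described above, the controller can point its data directory at S3 via the S3PinotFS plugin. The bucket name, path, and region below are illustrative placeholders:

```properties
# Controller settings for an S3 deep store (bucket/region are examples)
controller.data.dir=s3://your-bucket/pinot-data/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```

With this in place, completed segments land in S3, so a destroyed cluster can be rebuilt by re-pushing those segments.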
@joshhighley: Do low-level realtime tables support `transformConfig` in `ingestionConfig`?
  @g.kishore: Yes
  @joshhighley: My POST request to create my table has a transformConfig property in ingestionConfig but it doesn't appear in the UI for the new table when I look. I also tried adding the config in the UI and it's not updated
  @joshhighley: 'filterConfig' is saved but transformConfig isn't
  @g.kishore: can you show the screen shot
  @g.kishore: @npawar ^^
  @npawar: Can you share the table config?
  @joshhighley: the config in my REST body:
  @joshhighley: ```
"ingestionConfig": {
  "filterConfig": {
    "filterFunction": "Groovy({custom_saluation != null})"
  },
  "transformConfig": [{
    "columnName": "custom_salutation",
    "transformFunction": "Groovy({'This is Groovy'})"
  }]
}
```
  @joshhighley: when I view the table config in the UI:
  @joshhighley: ```
"ingestionConfig": {
  "filterConfig": {
    "filterFunction": "Groovy({custom_saluation != null})"
  }
},
```
  @joshhighley: lowlevel realtime table configured for upsert, if that matters
  @joshhighley: version 0.7.0-SNAPSHOT-e62addb3b381e89d3afe24847a1bff06e7682246 running from the docker images
  @npawar: it should be transformConfigs
  @joshhighley: ok, that worked. The documentation needs to be fixed:
  @npawar: will fix it, thanks for pointing that out!
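Based on the fix above, the working shape of the config (note the plural `transformConfigs`, which is what the server actually reads) would be:

```json
"ingestionConfig": {
  "filterConfig": {
    "filterFunction": "Groovy({custom_saluation != null})"
  },
  "transformConfigs": [{
    "columnName": "custom_salutation",
    "transformFunction": "Groovy({'This is Groovy'})"
  }]
}
```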
@afilipchik: hey, quick question to validate an assumption - is star tree index created for realtime tables?
@chinmay.cerebro: Not for mutable segments
@chinmay.cerebro: everything else yes
@ustela101: @ustela101 has joined the channel
@g.kishore: @afilipchik to clarify, real-time tables do create star-tree index when the segments get flushed
@afilipchik: got it, ty!
@afilipchik: and what about Upserts?
  @chinmay.cerebro: @jackie.jxt ^^ startree support in case of upserts ? Should we track this somewhere ?
  @afilipchik: yep, does a table with upserts generate star tree?
  @chinmay.cerebro: you can - but the question is do the results make sense :slightly_smiling_face:
  @chinmay.cerebro: as in - de-dup would likely not happen
  @afilipchik: ahh, so - it will create but will be incorrect?
  @jackie.jxt: Star-tree should not be applied to upsert use cases as the records can be invalidated
  @jackie.jxt: There's also no way to invalidate records that are already pre-aggregated in the star-tree
  @jackie.jxt: @chinmay.cerebro Maybe we should add this to the validation
  @afilipchik: we should. I was chatting with Elon about it, maybe there is another way to build/rebuild startee for upserts but will require some thinking :slightly_smiling_face:
  @jackie.jxt: I don't see an easy way to make star-tree mutable. Whenever a record gets updated, we need to remove it from the star-tree, which will be very hard
  @jackie.jxt: Also, upsert won't work properly with very large tables since the primary keys are maintained in heap memory, so I'm not sure how much value star-tree can provide even if we support it
  @chinmay.cerebro: I’ll add the validation check for now
  @g.kishore: @jackie.jxt you can use star-tree with upsert if you negate the previous value and apply the new value for the metrics (create a new column for the aggregate metric)
  @jackie.jxt: @g.kishore Conceptually yes, but that will be very hard to implement and has the following challenges:
• Some aggregated metrics cannot be negated, such as HLL, MIN, MAX
• Won't work on aggregated metrics with variable length (no way to update in place)
• Thread-safety issues
We don't have star-tree for mutable segments for similar reasons
  @yupeng: fyi, there is a validation for startree with upsert
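The negate-and-reapply idea above works only for invertible aggregates. A toy sketch (not Pinot code) of why SUM can be patched in place on upsert while MAX cannot:

```python
# Sketch: updating a pre-aggregated value when a record is upserted.

# SUM is invertible: subtract the old value, add the new one.
def update_sum(agg, old_value, new_value):
    return agg - old_value + new_value

# MAX is not invertible: if the old value WAS the max, the new max
# depends on all the other records, which the aggregate has discarded.
def update_max(agg, old_value, new_value):
    if new_value >= agg:
        return new_value
    if old_value < agg:
        return agg  # the old value wasn't the max, so the max is unchanged
    raise ValueError("cannot recompute MAX without rescanning all records")

# SUM of [3, 7, 5] is 15; upserting 7 -> 2 gives 10.
print(update_sum(15, 7, 2))  # 10

# MAX of [3, 7, 5] is 7; upserting 7 -> 2 forces a full rescan.
try:
    update_max(7, 7, 2)
except ValueError as e:
    print(e)
```

The same problem applies to HLL sketches, which cannot remove an element at all, which is one reason the thread concludes with a validation that rejects star-tree on upsert tables.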
@neilteng233: @neilteng233 has joined the channel

#random


@anumukhe: @anumukhe has joined the channel
@ustela101: @ustela101 has joined the channel
@neilteng233: @neilteng233 has joined the channel

#troubleshooting


@anumukhe: @anumukhe has joined the channel
@ustela101: @ustela101 has joined the channel
@slackbot: This message was deleted.
  @fx19880617: what’s the select * query results?
  @fx19880617: there should be one doc per kafka event
  @fx19880617: if there is only one doc, then it means pinot only consumed 1 record from Kafka
  @fx19880617: also is there any exceptions from pinot server log?
  @joshhighley: select * has no query results, but the query response stats show totalDocs = 1, so it seems the data is there, just not being returned
  @joshhighley: and, yes, I only sent 1 record to Kafka for the table
  @joshhighley: no errors in the server or broker logs
  @fx19880617: Then I somehow feel the groovy function might not be correct. @npawar is there a way we can ingest using consumption/index time as the time value?
  @npawar: you could use `now()` to set current time?
  @fx19880617: true, can you try this function `now()`:
  @joshhighley: those transforms are on the Select. I'm running the transform on the ingestion
  @joshhighley: my actual transform is more complex, but I've boiled it down to the test case. putting just a long in the Groovy script doesn't work, but putting System.currentTimeMillis() does work. When it doesn't work, the record still shows up in the query stats but not in the results
  @joshhighley: if I use a different date field, defined the same way but not acting as the table time column, then its value is populated by the transform as I expect
  @joshhighley: I've found that when I destroy the table, Pinot still seems to remember the most recent table date-time. If I create the table again with the same name and schema, then even though the table starts empty, Pinot queries won't return records with a date-time earlier than those that existed when the table was destroyed.
  @joshhighley: if I create the table with a different table name but same transformation, then the records will be returned by the query
  @joshhighley: I've been able to simplify the issue somewhat, I'm going to start a new message thread
  @fx19880617: Sure, I think that transform function can also be used in ingestion transform function
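As a sketch of the `now()` suggestion above used as an ingestion transform for a time column (the column name here is illustrative, not from the original thread):

```json
"transformConfigs": [{
  "columnName": "ingestion_time_ms",
  "transformFunction": "now()"
}]
```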
  @fx19880617: for table destroy, did you also delete the schema ?
  @fx19880617: then it seems to indicate that Pinot caches some transform functions per table name
  @joshhighley: I found that the transform doesn't matter. I can specify the column time in the data and have the same issue
  @joshhighley: when I re-create the table, if I insert the same primary key with an earlier column time, the record won't be returned by a query
  @joshhighley: it's still using the most recent column time for the record that existed in the deleted table
  @fx19880617: Is this for upsert?
  @joshhighley: yes
  @fx19880617: i think upsert uses timestamp to figure out the latest record
  @fx19880617: so only newest version of the record is counted
  @joshhighley: right, but it's using the timestamp from records in the deleted table when I re-create the table with the same name
  @fx19880617: I see, how long did you wait between deleting the table and recreating it? We have observed some issues when a table is recreated within a very short time
  @fx19880617: if not, then it's very likely that some intermediate state is not cleaned up
  @joshhighley: several seconds between deleting and re-creating. There's only a few rows in the tables (testing)
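The behavior being debugged here matches how timestamp-based upsert resolves duplicates: for each primary key, only the record with the greatest time value wins. A toy simulation (not Pinot code) of that rule, which also shows why a stale primary-key map surviving table re-creation would silently drop "new" records with older timestamps:

```python
# Toy model of timestamp-based upsert: per primary key, keep only the
# record with the greatest time value. If the key->timestamp map leaks
# across a table delete/recreate, older records are rejected as stale.
def upsert(store, key, timestamp, value):
    current = store.get(key)
    if current is None or timestamp >= current[0]:
        store[key] = (timestamp, value)

store = {}
upsert(store, "pk1", 1000, "v1")  # accepted: first record for pk1
upsert(store, "pk1", 900, "v2")   # ignored: older than the stored record
print(store["pk1"])  # (1000, 'v1')
```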
@neilteng233: @neilteng233 has joined the channel
@joshhighley: I have a realtime table configured for upsert, so a primary key in the schema. If I delete the table, then re-create it with the SAME name, then inserted records in the NEW table will not be returned by a query if they have an earlier timestamp than what the same records had in the deleted table (same record by primary key). The records in the new table are reflected in the query stats (and only the new records) but they aren't returned by the query if they have an earlier timestamp. Is there more I need to delete besides the table? New segments are being created when I create the new table
  @fx19880617: @yupeng @jackie.jxt is it possible the primary key store is not cleaned during deletion?
  @jackie.jxt: There is a recent fix regarding this issue: . This fix is not included in 0.6.0, so for now you have to restart the server in order to make the new table work as expected