#general


@vaibhav.sinha: @vaibhav.sinha has joined the channel
@rymurr: @rymurr has joined the channel
@ricardo.bernardino: Hi! We are checking real-time ingestion with upsert and we have some questions around it:
• Can we have a retention period of, say, 6 months?
• Does this have a significant impact on the upsert logic?
• If we add new servers, are the partitions correctly spread to the new servers?
  @g.kishore: Yes, you can have a retention of 6 months. I don’t see a significant impact on performance; make sure you provision the servers accordingly and possibly over-partition the Kafka topic. And yes, the rebalance command will take care of that
  @ricardo.bernardino: Hi! Thanks a lot for your reply! Are there any guidelines on server provisioning? I'm assuming you are talking mostly about disk space, since the segments will be bound to a server in order to easily update a segment when a new event for a given key arrives. ~From a test we were running, it seems that Pinot can return more than one row for a given key even with updates enabled. Is that the right behaviour or is it a bug? Is it expected that the update event is returned until the "merge" takes place, or should it be asynchronous?~ ~In the tests, we were sending events with repeated keys, and when doing a `select count(*), count(distinct key)` the values were different, but converged if we repeated the query~
  @g.kishore: That should not happen, most likely the stream is not partitioned properly
  @g.kishore: Try adding $hostName, $segmentName in the query
  @g.kishore: And paste the response
  @ricardo.bernardino: I'll check the topic configuration, but regarding this test we were still running on a single server
  @g.kishore: Can you share the query and response
  @ricardo.bernardino: It appears there was an error with our configuration of the table, which I was still unaware of. All is good after all :)
  @ricardo.bernardino: Regarding this:
  > I'm assuming you are talking mostly about disk space since the segments will be bound to a server in order to easily update that segment if a new event for a given key arrives.
  Is this assumption correct?
  @g.kishore: Mind sharing what was wrong with the table config?
  @g.kishore: Disk space and also memory. I think the key map is stored in memory as of now for performance reasons; @yupeng @jackie.jxt can confirm this
  @yupeng: yes, the pk is stored in mem for lookup
  @jackie.jxt: The map from primary key to record location is stored in heap memory (ConcurrentHashMap)
  @g.kishore: is there a plan to move this off-heap?
  @jackie.jxt: Not in the short term. We need the concurrency primitives provided by ConcurrentHashMap. We maintain one map per partition, so it should be fine as long as the cardinality of the primary key is not too high
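  A rough Java sketch of the idea Jackie describes, with made-up class names (`RecordLocation` here is only an illustrative placeholder, not Pinot's actual class): each partition keeps an on-heap ConcurrentHashMap from primary key to the location of the latest record, which is why heap usage grows with primary-key cardinality.
  ```java
  import java.util.concurrent.ConcurrentHashMap;

  // Illustrative sketch only -- not Pinot's actual upsert code.
  // One map per stream partition: primary key -> location of the latest record.
  public class PartitionUpsertMetadataSketch {

    // Hypothetical value type: which segment/doc holds the newest record for a key.
    static final class RecordLocation {
      final String segmentName;
      final int docId;
      final long comparisonValue; // e.g. event time, used to decide which record wins

      RecordLocation(String segmentName, int docId, long comparisonValue) {
        this.segmentName = segmentName;
        this.docId = docId;
        this.comparisonValue = comparisonValue;
      }
    }

    // Kept on heap, as mentioned above; memory grows with primary-key cardinality.
    private final ConcurrentHashMap<String, RecordLocation> primaryKeyToLocation =
        new ConcurrentHashMap<>();

    // Called for every ingested record of this partition; keeps only the newest
    // location per primary key.
    public void upsert(String primaryKey, RecordLocation incoming) {
      primaryKeyToLocation.merge(primaryKey, incoming,
          (existing, candidate) ->
              candidate.comparisonValue >= existing.comparisonValue ? candidate : existing);
    }
  }
  ```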
@zsolt: I was looking for the `HIGHEST_KAFKA_OFFSET_CONSUMED` or `HIGHEST_STREAM_OFFSET_CONSUMED` metric for monitoring stream ingestion lag, and found that it was removed. Is there an alternative way to monitor stream ingestion?
  @ssubrama: Kafka stopped exporting this metric along with the incoming messages, so we had to remove it.
@gaurav.madaan: @gaurav.madaan has joined the channel
@shyam.m: @shyam.m has joined the channel
@vaibhav.sinha: Hi everyone. I am planning to experiment with Pinot for the user-facing analytics use cases we have. Our scale is not too large (~1M DAU) and we have a small team of 3 engineers working on data engineering for the first time. We primarily use managed services on AWS. With Pinot, one of the concerns is self-managing the infrastructure, and I wanted to know what others’ experience has been in this regard.
  @ken: We’ve been running Pinot for a few months now, using Docker containers on self-serve hardware. In general it’s been no problem, though I always worry about having Zookeeper in the mix :slightly_smiling_face: We did run into one cluster-killer issue, where a query with a `distinct` count that was too large would put the cluster in a weirdly broken state, until brokers were restarted.
  @mayanks: Thanks @ken for sharing your experience.
  @vaibhav.sinha: Thanks @ken.
@raahulgupta07: @raahulgupta07 has joined the channel
@aaron: Is pinot-admin.sh the preferred way to upload batch data or is there a REST API for that too?
  @mayanks: There's a REST API as well.
  @mayanks: You can use whichever one works best in your deployment
  @aaron: Cool. Is there any documentation on how to use that REST API?
  @mayanks: Let me find
  @mayanks:
  @mayanks: For data import ^^
  @mayanks: Getting the rest api in a sec
  @g.kishore: Most commands in pinot-admin use the REST API under the hood
  @g.kishore: You can see the Swagger UI
  @mayanks: Yes, the Swagger UI lists all the REST APIs, including segment upload
  @mayanks:
  @mayanks: From swagger UI ^^
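  A rough Java sketch (JDK 11+ `java.net.http`) of what a segment-upload call against that REST API can look like. The controller address, `tableName` parameter, form-field name, and tarball path are all assumptions here; the authoritative endpoint details are in the Swagger UI.
  ```java
  import java.io.IOException;
  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;
  import java.nio.file.Files;
  import java.nio.file.Path;

  // Illustrative only: push a pre-built segment tarball to the controller.
  // Verify the endpoint, query parameters, and form-field name in Swagger.
  public class SegmentUploadSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
      Path segmentTar = Path.of("/tmp/segments/myTable_0.tar.gz"); // assumed path
      String boundary = "----pinot-upload-" + System.currentTimeMillis();

      // Hand-rolled multipart/form-data body to avoid extra dependencies.
      byte[] head = ("--" + boundary + "\r\n"
          + "Content-Disposition: form-data; name=\"segment\"; filename=\""
          + segmentTar.getFileName() + "\"\r\n"
          + "Content-Type: application/octet-stream\r\n\r\n").getBytes();
      byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes();
      byte[] file = Files.readAllBytes(segmentTar);
      byte[] body = new byte[head.length + file.length + tail.length];
      System.arraycopy(head, 0, body, 0, head.length);
      System.arraycopy(file, 0, body, head.length, file.length);
      System.arraycopy(tail, 0, body, head.length + file.length, tail.length);

      HttpRequest request = HttpRequest.newBuilder()
          .uri(URI.create("http://localhost:9000/v2/segments?tableName=myTable")) // assumed controller
          .header("Content-Type", "multipart/form-data; boundary=" + boundary)
          .POST(HttpRequest.BodyPublishers.ofByteArray(body))
          .build();
      HttpResponse<String> response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode() + " " + response.body());
    }
  }
  ```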
  @jmeyer: Just curious, what's the difference with this API? This new API directly uploads to the server (via the controller) without going through the deep store, right?
  @mayanks: @jmeyer The PR description states: `These are meant for quick testing/trials, and not intended for production usage.`
  @jmeyer: Yes, so I wonder how they differ under the hood?
  @g.kishore: The segment generation happens on the controller node. Running this on a controller is not scalable/fault-tolerant for a production setting
  @mayanks: Correct
  @mayanks: But if you are looking for a prod-supported data push using a URI (i.e. not pushing the payload to the controller), the ones listed in Swagger should already do that @jmeyer
  @jmeyer: With the `/v2/segments` API (also called by the CLI `LaunchDataIngestionJob`?), the minion is responsible for generating segments, right? :slightly_smiling_face:
  @mayanks: These upload APIs are independent of where/how the segments are generated
  @jmeyer: Ok, thanks @mayanks @g.kishore (and sorry for "hijacking" this thread :smile:)
  @mayanks: No worries, we are all knowledge sharing here, so everyone benefits :slightly_smiling_face:
  @aaron: Thanks all!
  @aaron: Ok so the Swagger UI is super cool, I had no idea that existed at all :slightly_smiling_face:
  @aaron: So if I understand right, pinot-admin.sh is largely a wrapper around that API, and for example in the case of LaunchDataIngestionJob, the machine that I run pinot-admin.sh on isn't doing much work at all; it's just telling the Pinot cluster to begin data ingestion?
  @g.kishore: LaunchDataIngestionJob is an exception
  @g.kishore: but if you configure the minion, you can convert that into an API call
  @aaron: I'm sorry -- I'm pretty new at this and don't understand. How is it an exception?
  @g.kishore: because it uses the Pinot library, generates the segment on the machine you ran the command on, and uploads the generated segment to Pinot
  @aaron: I gotcha, so it is doing some work on that machine I run the command on. Is this work essentially re-encoding the data from whatever the incoming format is to Pinot's segment format?
  @mayanks: Yes
  @aaron: Gotcha, thanks
  @aaron: Are there any docs about how to do this with minion?
@chad.preisler: What version of the JDK is required to run Pinot? When building, the tests don’t run correctly with JDK 11 and above. That makes me wonder what JDK I can run with.
  @mayanks: JDK 8 for now; 11 is under a PR
  @chad.preisler: Thank you!
  @mayanks: :+1:

#random


@vaibhav.sinha: @vaibhav.sinha has joined the channel
@rymurr: @rymurr has joined the channel
@gaurav.madaan: @gaurav.madaan has joined the channel
@shyam.m: @shyam.m has joined the channel
@srini: Howdy from your friends in the Apache Superset community! :wave: We’re doing a community live demo next week (Tue Apr 13) where @brianolsen87 & I will use Superset to visualize Trino-JOIN-ed data from MongoDB & Pinot :pinot: . We’d love to see you there :heart:
@raahulgupta07: @raahulgupta07 has joined the channel

#troubleshooting


@pabraham.usa: Hello, what will happen to the real-time in-memory consuming segment if a server is restarted? Does Pinot start recreating the segment again after the restart, or do I lose the data?
  @mayanks: Server will restart consuming from previous checkpoint, no data loss.
  @pabraham.usa: I am actually thinking of sudden and unexpected restarts or crashes, not a graceful restart.
  @pabraham.usa: Also wondering how Pinot knows where to resume from in the case of in-memory segments.
  @mayanks: Yes, still the same behavior. Kafka offsets are checkpointed with segments that are flushed. Any recovery restarts from that point.
  @mayanks: Checkpoints are saved
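  A rough sketch of the recovery idea Mayank describes, using the plain Kafka consumer API; this is not Pinot's actual code, and the checkpoint lookup, topic, and offset below are made-up placeholders.
  ```java
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;

  // Illustrative only: the end offset of each committed (flushed) segment is
  // checkpointed in segment metadata; on restart, the consuming segment is
  // rebuilt by replaying the stream from that offset, so a crash loses nothing.
  public class ResumeFromCheckpointSketch {

    // Hypothetical placeholder for "read the checkpoint from segment metadata".
    static long lastCommittedOffset(String table, int partition) {
      return 12345L; // would come from the last flushed segment's metadata
    }

    public static void main(String[] args) {
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("group.id", "pinot-sketch");
      props.put("key.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer");

      try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        TopicPartition tp = new TopicPartition("myTopic", 0);
        consumer.assign(Collections.singletonList(tp));
        // Seek to the checkpoint instead of relying on the consumer-group offset,
        // so the in-memory segment is simply re-consumed after a crash.
        consumer.seek(tp, lastCommittedOffset("myTable", 0));
        ConsumerRecords<String, String> records =
            consumer.poll(java.time.Duration.ofSeconds(1));
        System.out.println("Replayed " + records.count() + " records from the checkpoint");
      }
    }
  }
  ```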
  @pabraham.usa: Thanks Mayank, it is good to know that the checkpoints are saved and Pinot can survive crashes.
  @mayanks: :+1:
@vaibhav.sinha: @vaibhav.sinha has joined the channel
@rymurr: @rymurr has joined the channel
@gaurav.madaan: @gaurav.madaan has joined the channel
@shyam.m: @shyam.m has joined the channel
@jmeyer: :wave: Does `ingestionConfig.transformConfigs` occur before `ingestionConfig.filterConfig`? [realtime table] I'm seeing errors on some messages due to invalid transformations (missing field, since this table isn't for this type of event)
  @jmeyer: Maybe filtering isn't working. Here's how it's configured `table config` ```"ingestionConfig": { "filterConfig": { "filterFunction": "Groovy({type != \"com.insideboard.user-event.v1.post.like\"}, type)" },``` and the events that should have been filtered but generated transformation errors `event` ```{ "id": "1935a299-9f7c-43d4-a571-6c5cfdcf30cb", "source": "urn:insideboard:microservices:user-events", "specversion": "1.0", "type": "com.insideboard.user-event.v1.document.open", "time": "2021-04-09T14:53:13.812Z", "data": { "name": "document.open", "documentId": "60705cf9106a42150318d198", "projectId": "60646df59c4dbcdaf6e620bf", "origin": "DOCUMENTATION_PROJECT", "status": 200, "userId": "6049f2c188d36401807cf526", "eventTime": "2021-04-09T14:53:13.705Z" }, "ibcustomer": "demo", "x-b3-traceid": "b403e6464ab3991ba26a4aade442705e", "x-b3-parentspanid": "7f497c2501e745a8", "x-b3-spanid": "25ecdb6fec5b4840", "x-b3-sampled": "1" }```
  @jmeyer:
  @npawar: Yes, transformation occurs before the filter.
  @jmeyer: Interesting, thanks @npawar
  @npawar: So that you have the option to filter on the result of a transform
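  Illustrative pseudocode of that ordering (names are made up; this is not the actual ingestion pipeline): transforms run first, then the filter, which is why a filter can reference a derived column, but also why a transform can fail on a record the filter would have dropped.
  ```java
  import java.util.List;
  import java.util.Map;
  import java.util.function.Function;
  import java.util.function.Predicate;

  // Illustrative sketch of the ordering only -- not Pinot's actual code.
  public class IngestionOrderSketch {

    // transformConfigs: each function enriches/derives columns on the record.
    // filterConfig: a predicate that returns true when the record should be dropped.
    static boolean shouldIndex(Map<String, Object> record,
                               List<Function<Map<String, Object>, Map<String, Object>>> transforms,
                               Predicate<Map<String, Object>> filter) {
      for (Function<Map<String, Object>, Map<String, Object>> transform : transforms) {
        // Runs first: a transform that assumes a field exists (e.g. a jsonPath
        // extraction) can throw here even for records the filter would drop.
        record = transform.apply(record);
      }
      // Runs second, so it can reference columns produced by the transforms.
      return !filter.test(record);
    }
  }
  ```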
  @jmeyer: Yeah, makes sense, I just thought about that use case :slightly_smiling_face: So there's no way to "hide" those error messages?
  @npawar: The only thing I can think of is making the jsonPathString function null-safe
  @npawar: @fx19880617 should we change jsonPathString to handle null input?
  @jmeyer: Ah yes, indeed. Thanks :slightly_smiling_face:
  @jmeyer: I've had success adding a default value to `jsonPathString`, so that should "fix" my "issue" indeed
  @npawar: Oh great
  @fx19880617: I think we should fix the null input to get null output for sure
  @fx19880617: please create a github issue
  @jmeyer: @fx19880617 so change the behavior from an error (missing field) to using the null value for the given type (target column)?
  @fx19880617: right, or maybe an empty json
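  A small sketch using the Jayway JsonPath library (which, as I understand it, backs Pinot's jsonPath functions); the helper name is made up. It shows why a missing field surfaces as an error and how a default value sidesteps it, mirroring the `jsonPathString(field, path, defaultValue)` approach that worked above.
  ```java
  import com.jayway.jsonpath.JsonPath;
  import com.jayway.jsonpath.PathNotFoundException;

  // Illustrative only: shows the missing-field behavior discussed above,
  // not Pinot's actual jsonPathString implementation.
  public class JsonPathDefaultSketch {

    // Hypothetical helper mirroring "jsonPathString(field, path, defaultValue)".
    static String readOrDefault(String json, String path, String defaultValue) {
      try {
        Object value = JsonPath.read(json, path);
        return value == null ? defaultValue : value.toString();
      } catch (PathNotFoundException e) {
        // Without a default, this exception is what surfaces as the ingestion
        // transform error on events that don't carry the field.
        return defaultValue;
      }
    }

    public static void main(String[] args) {
      String event = "{\"data\": {\"name\": \"document.open\"}}";
      System.out.println(readOrDefault(event, "$.data.documentId", "")); // "" instead of an error
    }
  }
  ```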
@raahulgupta07: @raahulgupta07 has joined the channel
@brianolsen87: Loading in the csv data for the preset demo and I am hitting this snag on date columns with nulls. ``` "dateTimeFieldSpecs": [ { "name": "cdc_case_earliest_dt", "dataType": "STRING", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd", "granularity": "1:DAYS" }, { "name": "cdc_report_dt", "dataType": "STRING", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd", "granularity": "1:DAYS" }, { "name": "pos_spec_dt", "dataType": "STRING", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd", "granularity": "1:DAYS" }, { "name": "onset_dt", "dataType": "STRING", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd", "granularity": "1:DAYS" } ]``` With csv that has various dates that don't exist ```[ec2-user@aws ~]$ head /tmp/pinot-quick-start/covid-cases.csv cdc_case_earliest_dt ,cdc_report_dt,pos_spec_dt,onset_dt,current_status,sex,age_group,race_ethnicity_combined,hosp_yn,icu_yn,death_yn,medcond_yn 2020/10/23,2021/01/28,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",Missing,Missing,No,Missing 2020/10/23,2020/10/23,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",No,Unknown,No,No 2020/10/23,2020/10/25,2020/10/23,2020/10/23,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",No,Missing,Missing,Missing 2020/10/23,2020/10/25,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",Missing,Missing,Missing,Missing``` Looks like when parsing null rows, the parser gets fed a null value. I'm tempted to update `defaultNullValue` in to be a default date of `1970/01/01` but I'd like to just keep those values null if possible. Anything i'm doing wrong or any way around this? ```Failed to generate Pinot segment for file - file:/tmp/pinot-quick-start/covid-cases.csv java.lang.IllegalArgumentException: Invalid format: "null" at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:555) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:514) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:273) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:246) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:261) 
~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_282] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]```
@elon.azoulay: Try `"nullHandlingEnabled": true,` in the `"tableIndexConfig":` section of the table config (not the schema above). Not sure if time columns can be null, though; if not, then defaultNullValue = 0 would work; otherwise it will be set to Long.MIN_VALUE, which is also not a valid value.

#pinot-dev


@pabraham.usa: @pabraham.usa has joined the channel

#pinot-rack-awareness


@pabraham.usa: Hello guys, just want to check something. Consider a scenario where I have 6 segments, 3 servers, replicas-per-partition of 2, and 2 FDs (fault domains). Will the design ensure replicas always go to a different FD? Something like the diagram?
  @g.kishore: that is the idea
  @g.kishore: Any reason why you don't want to have an equal number of nodes per FD? Or multiple FDs?
  @pabraham.usa: That was just an example, and I am using k8s, so the node count won't always be exactly the same.
  @g.kishore: Yeah, we will have to enhance the algorithm and most likely, we will end up having multiple strategies
  @g.kishore: Prefer availability over uniform load balancing vs. best-effort rack-aware placement
  @pabraham.usa: Yes, agreed, it will be better to spread segments uniformly. That might require additional checks?
  @g.kishore: Segment assignment is a PhD topic and can get super complicated
  @pabraham.usa: Yes; however, as a quick workaround, maybe assign the segment to the node with the fewest segments.
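  A tiny sketch of that greedy workaround, purely illustrative and not Pinot's assignment code: for each replica, pick the least-loaded server from a fault domain the segment has not used yet, relaxing the constraint only when every FD is already used.
  ```java
  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;

  // Illustrative sketch of the greedy idea above -- not Pinot's segment assignment.
  public class GreedyFdAwareAssignmentSketch {

    static final class Server {
      final String name;
      final String faultDomain;
      int segmentCount;

      Server(String name, String faultDomain) {
        this.name = name;
        this.faultDomain = faultDomain;
      }
    }

    // Assign `replicas` copies of one segment: least-loaded server first, and
    // never reuse a fault domain for the same segment while other FDs are free.
    static List<String> assign(List<Server> servers, int replicas) {
      List<String> chosen = new ArrayList<>();
      Set<String> usedFds = new HashSet<>();
      for (int r = 0; r < replicas; r++) {
        Server best = servers.stream()
            .filter(s -> !chosen.contains(s.name))
            .filter(s -> !usedFds.contains(s.faultDomain)
                || usedFds.size() == countFds(servers)) // all FDs used: relax the constraint
            .min(Comparator.comparingInt(s -> s.segmentCount))
            .orElseThrow(() -> new IllegalStateException("not enough servers"));
        best.segmentCount++;
        chosen.add(best.name);
        usedFds.add(best.faultDomain);
      }
      return chosen;
    }

    private static long countFds(List<Server> servers) {
      return servers.stream().map(s -> s.faultDomain).distinct().count();
    }
  }
  ```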

#minion-improvements


@laxman: okay. Will try to test my changes with this patch.
@laxman: @jackie.jxt I’m also exploring whether there is a workaround without patching Pinot. Currently we have REALTIME tables only and data ingestion is *only from Kafka*. Please validate whether the following approach works to run the Purge task:
• Create an OFFLINE table for each REALTIME table
• Convert segments from REALTIME to OFFLINE using the built-in minion task
• Run PurgeTask on the OFFLINE table
@jackie.jxt: I assume you want to convert segments from REAL-TIME to OFFLINE?
@jackie.jxt: Yes, it should work
@jackie.jxt: Basically taking the hybrid table approach
@laxman: > I assume you want to convert segments from REAL-TIME to OFFLINE?
Yes. My bad. Fixed the typo.
@laxman: Thanks @jackie.jxt for the guidance. Will test this approach too.
@laxman: Going with a forked/patched version (for ) is not really an option for us in production. Will wait for the merge and a released version. That's the reason to explore other workarounds till then.
@jackie.jxt: I would suggest directly going with the hybrid approach. The PR you mentioned will only allow uploading segments for upsert tables in the first phase
@fx19880617: I feel that setting up a hybrid table and moving segments from realtime -> offline is the way to go; then you can purge the offline segments
@fx19880617: Your approach should work

#complex-type-support


@yupeng: @yupeng has joined the channel
@g.kishore: @g.kishore has joined the channel
@jackie.jxt: @jackie.jxt has joined the channel
@npawar: @npawar has joined the channel
@yupeng: Hey folks, I have a proposal to add complex data type support. PTAL:
@yupeng: @npawar @jackie.jxt @g.kishore
@yupeng: feel free to involve others to review