#general
@anumukhe: Hi, we will be installing a Pinot cluster in AWS on top of EKS. We know that EKS provides multi (three) Availability Zone (AZ) based HA within a Region. So I would like to understand whether an EKS-based Pinot cluster is fault tolerant and HA within the region by default in case of an AZ failure. I know that the Pinot server has segment replicas and replica groups, which provide HA within the cluster in case of a server failure. But what happens if the controller has an issue, or multiple servers are corrupted, or the cluster (on EKS) as a whole goes down? Given that the servers will use EBS as the data-serving file system (with EBS multi-AZ replication/sync turned on), will EKS by default bring up an alternative node such as a Controller or Server (or even a Broker)? Net-net, can we expect 100% service availability for Pinot on EKS in any Region? Or do we need to set up another Pinot cluster on EKS in another AZ, i.e. a minimum of two Pinot clusters (on EKS) in two AZs within a Region? Please suggest
@g.kishore: Yes, Pinot is HA within a region.. as long as you make sure the replication factor is >1 and you have >1 controller and >1 broker
@g.kishore: yes, K8s will bring up another server/controller/broker on failure and everything will be handled seamlessly
@anumukhe: Thanks a lot
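For reference, a minimal sketch of where the table-level replication from the answer above is configured (field names follow the standard Pinot table config; the values are illustrative):
```
"segmentsConfig": {
  "replication": "2",
  "replicasPerPartition": "2"
}
```
The number of controller/broker/server pods themselves is set in whatever deployment tooling you use (for example, the replica counts in the Pinot Helm chart), so that Kubernetes can reschedule a failed pod in another AZ.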
@matteo.santero: @matteo.santero has joined the channel
@ravi.maddi: @ravi.maddi has joined the channel
@ravi.maddi: Hi Guys,
@ravi.maddi: I have a doubt: is it possible to use nested JSON data in a Pinot table? Avro supports nested entities (JSON) by using the record type in the Avro schema. Does the Pinot table configuration support nested JSON entities like Avro does (e.g., an Account JSON that contains an Address JSON embedded in it)?
@g.kishore: Yes, we recently added support for indexing nested json fields
@ravi.maddi: I have gone through the Pinot documentation, which says Pinot supports Avro, but I am not able to find any samples or sample code for it. Can you help by pointing me to some code that uses Pinot and Avro together?
@g.kishore: If you ingest Avro into Pinot, complex nested fields will be automatically converted to JSON
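As a concrete (and purely illustrative, field names invented) example of the kind of nesting being discussed: with an Avro record like the one below, the nested `address` record ends up as a JSON value in a single Pinot column after ingestion, which can then be queried with Pinot's JSON support.
```
{
  "type": "record",
  "name": "Account",
  "fields": [
    {"name": "accountId", "type": "string"},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "zip", "type": "string"}
      ]
    }}
  ]
}
```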
@1705ayush: Hi all, I am writing this to explain the *loop of problems* we are facing with an *architecture* combining *Superset* (v1.0.1), *Pinot* (latest docker image) and *Presto* (starburstdata/presto:350-e.3 docker image). Working around a problem in one framework causes a problem in the other. I do not know which community can help us solve this, hence posting it on both. So far we have successfully pushed 1 million records into a Pinot table and would like to build charts on it in Superset.

*Problem # 1* We connected Superset to Pinot successfully and were able to build SQL Lab queries, only to find out that Superset does not support exploring SQL Lab virtual data as a chart if the connected database is Apache Pinot (the "Explore" button is disabled). Please let me know if this can be solved or if we interpreted it incorrectly, as that would solve the whole problem at once. To work around it, we learned that a Superset-Presto connection would enable the Explore button, and we had planned to use Presto anyway, so we implemented Presto on top of Pinot.

*Problem # 2* We found that Presto cannot aggregate more than 50k Pinot records, throwing the error `Segment query returned '50001' rows per split, maximum allowed is '50000' rows. with query "SELECT * FROM pinot_table LIMIT 50001"`. Presto cannot even run something like:
```presto:default> select count(*) from pinot.default.pinot_table;```
Even if we increase the 50k limit in Presto's pinot.properties (`pinot.max-rows-per-split-for-segment-queries`) to 1 million, the Presto server crashes with a heap-memory-exceeded error. To work around it, we learned that we can have Pinot do the aggregations and feed the aggregated result to Presto, which in turn feeds Superset for chart visualization, by writing the aggregation logic inside a Presto sub-query like:
```presto:default> select * from pinot.default."select count(*) from pinot_table"```
This returns the expected result.

*Problem # 3* We found that, although we can have Pinot do the aggregations, we cannot use the supported transformation functions of Pinot listed
@g.kishore: Thanks Ayush. Looking into it. can you file a github issue.. this is super detailed and can be helpful for other developers
@fx19880617: For #3, @elon.azoulay do you know if it’s supported?
@elon.azoulay: Hey, I wanted to try some things out, might have a pr to fix that if I can't find a workaround
@elon.azoulay: Already have one open that I can add to:
#random
@matteo.santero: @matteo.santero has joined the channel
@ravi.maddi: @ravi.maddi has joined the channel
#troubleshooting
@humengyuk18: Hi team, when connecting to Pinot using the Python pinotdb driver, how should I route to a different tenant's broker? I have configured different tables on different tenants; should I use a different connection string for each tenant?
@fx19880617:
@fx19880617: ```conn = connect(host="localhost", port=8000, path="/query/sql", scheme="http")```
@fx19880617: You can set tenant broker as `host`
@humengyuk18: So I should set different connection for different tenant?
@fx19880617: You can also try `sqlalchemy.engine` , example is in the same python file
@fx19880617: yes
@humengyuk18: Got it, thanks
@fx19880617: basically per tenant a connection
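A minimal sketch of the one-connection-per-tenant approach with pinotdb (the broker hostnames and port below are placeholders for your tenant-specific broker endpoints):
```python
from pinotdb import connect

# One connection per tenant, each pointing at that tenant's broker (placeholder hosts/ports).
conn_tenant_a = connect(host="broker-tenant-a.example.com", port=8099, path="/query/sql", scheme="http")
conn_tenant_b = connect(host="broker-tenant-b.example.com", port=8099, path="/query/sql", scheme="http")

# Query a table served by tenant A through that tenant's broker.
cur = conn_tenant_a.cursor()
cur.execute("SELECT COUNT(*) FROM myTableOnTenantA")
print(cur.fetchall())
```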
@matteo.santero: @matteo.santero has joined the channel
@matteo.santero: Hello here, I am new to Pinot and Presto, so maybe my question has an obvious explanation I am not seeing. I have a table split into OFFLINE and REALTIME, with a defined key and time column. From the Pinot interface I am getting 1 record (the OFFLINE one), while from Presto via the DBeaver client I am getting 2 records (one from each table). Is there something I am missing in the configuration?
```
upsertConfig mode true in REALTIME
pinot 0.6.0
presto 0.247
dbeaver 21.0.0.202103021012
```
Thank you very much in advance
@g.kishore: have you turned on upsert?
@matteo.santero: I supposed that `upsertConfig mode true` means yes, but probably that is not enough
@matteo.santero: what is strange is that the Pinot web interface seems to be using the upsert setting while Presto is not
@g.kishore: I don't think upsert can support hybrid tables as of today.. @yupeng ^^
@matteo.santero: I don't know if it is something I forgot in the link between the 2 systems
@yupeng: yes, upsert is only for realtime tables now. there is an ongoing PR to address upsert tables with longer retention
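For anyone following along, a minimal sketch of what enabling upsert on a REALTIME table involves (table and column names are illustrative): the schema declares the primary key column(s), the realtime table config carries the upsert config, and the input Kafka topic must be partitioned by that primary key.
```
In the schema:                "primaryKeyColumns": ["accountId"]
In the REALTIME table config: "upsertConfig": { "mode": "FULL" }
```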
@falexvr: Guys, we have an error in our controller instances; we get this message, which we think might be preventing the Kafka consumer from working properly: ```Got unexpected instance state map: {Server_mls-pinot-server-1.mls-pinot-server-headless.production.svc.cluster.local_8098=ONLINE, Server_mls-pinot-server-2.mls-pinot-server-headless.production.svc.cluster.local_8098=ONLINE} for segment: dpt_video_event_captured_v2__0__22__20210306T1745Z``` Could you please tell me what can cause this issue?
@falexvr: This is what follows to that log entry
@falexvr: Ah... It seems I'm in some kind of weird scenario not currently being handled by the validation manager
@falexvr: All my segments are ONLINE, all in DONE state, the last segment doesn't have any CONSUMING replicas
@falexvr: So, what can we do to fix this?
@ravi.maddi: @ravi.maddi has joined the channel
@joshhighley: when ingesting large amounts of streaming data, is there any mechanism for capturing/notifying about data errors other than looking through the server log files? We're ingesting millions of records as fast as Pinot can ingest them, so looking through logs isn't really practical.
@dlavoie: Please share your use case to this issue:
@joshhighley: that seems to be at a table level -- connectivity issues or configuration issues. Is row-level monitoring applicable to that issue?
@dlavoie: It’s worth sharing, as this brings another perspective to the general “ingestion” observability topic.
@dlavoie: If one row breaks ingestion, it’s equally important to bubble up the details as it is for a table misconfiguration.
@dlavoie: I think both issues fit in the “table ingestion status” story
@g.kishore: there is a metric that you can monitor for errors during ingestion
@joshhighley: I see an error count in ServerMeter but it's just a counter. Per instructions here:
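Not an official recipe, just a sketch of one way to watch such counters without tailing logs, assuming the server's metrics are exposed over HTTP (for example via a JMX-to-Prometheus exporter); the URL and the text filter below are placeholders to adapt to the metric names your setup actually emits:
```python
import urllib.request

# Placeholder endpoint: wherever your Pinot server metrics are exported.
METRICS_URL = "http://pinot-server.example.com:8080/metrics"

body = urllib.request.urlopen(METRICS_URL, timeout=5).read().decode("utf-8")

# Surface anything that looks like an ingestion error counter; adjust the filter
# to the actual metric names emitted by your exporter.
for line in body.splitlines():
    if not line.startswith("#") and ("error" in line.lower() or "exception" in line.lower()):
        print(line)
```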
#pinot-dev
@npawar: Quickstart isn’t working for me in Intellij after pulling recent changes. Anyone else seeing this?
```
2021/03/11 20:28:36.874 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 10.357 since launch
java.lang.IllegalStateException: Failed to initialize PinotMetricsFactory. Please check if any pinot-metrics related jar is actually added to the classpath.
  at com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[guava-20.0.jar:?]
  at org.apache.pinot.common.metrics.PinotMetricUtils.initializePinotMetricsFactory(PinotMetricUtils.java:85) ~[classes/:?]
  at org.apache.pinot.common.metrics.PinotMetricUtils.init(PinotMetricUtils.java:57) ~[classes/:?]
  at org.apache.pinot.controller.ControllerStarter.initControllerMetrics(ControllerStarter.java:483) ~[classes/:?]
  at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:279) ~[classes/:?]
  at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[classes/:?]
  at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[classes/:?]
  at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[classes/:?]
  at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [classes/:?]
  at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [classes/:?]
  at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [classes/:?]
  at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [classes/:?]
  at org.apache.pinot.tools.admin.command.QuickstartRunner.startControllers(QuickstartRunner.java:124) [classes/:?]
  at org.apache.pinot.tools.admin.command.QuickstartRunner.startAll(QuickstartRunner.java:170) [classes/:?]
  at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:169) [classes/:?]
  at org.apache.pinot.tools.admin.command.QuickStartCommand.execute(QuickStartCommand.java:78) [classes/:?]
  at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [classes/:?]
  at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [classes/:?]
```
@npawar: @jlli could this be related to the metrics registry changes ?
@jlli: How do you run the `StartServiceManagerCommand`?
@npawar: i’m running PinotAdministrator with `QuickStart -type OFFLINE` arguments
@jlli: I see. It seems the actual metric module is missing. Let me create a hotfix in pinot-tools module
@npawar: thanks!
@g.kishore: Don’t we have a QuickStart test yet?
@npawar: this only happens in the IDE I think, because you don't add any plugins to the classpath
@jlli: Opened a PR for that and tested it in my IDE:
@jlli: I’ve validated all the scripts in pinot-distribution and they work fine. So adding the missing one to pinot-tools would be sufficient
@fx19880617: the script works as it has all the dependencies from the plugin directory
@anumukhe: @anumukhe has joined the channel
#segment-write-api
@npawar: @yupeng thread safety would be each implementation's decision, right? do we need something on the interface? it is acceptable for someone to write a non-thread-safe impl and use it the way they want in their application
@yupeng: thats my question :slightly_smiling_face:
@yupeng: do you recommend thread safety for all implementations?
@npawar: okay, then no :slightly_smiling_face:
@yupeng: i wonder in the connector, what implementation it’ll use
@yupeng: basically, i am thinking of how to implement this async flush
@yupeng: in the wrapper, or let the connector do it
@npawar: thinking.. async flush would mean we create a new buffer and start collecting while flush completes. if flush encounters errors, then what happens? how will the connector go back and resubmit for that segment? and what if there’s some permanent exception in flush, we’ll keep creating new buffers
@yupeng: yes
@yupeng: i think there can be two buffers, staging (collecting records) and ready (to flush)
@yupeng: if there are issues with `ready`, then staging cannot be moved to ready
@yupeng: and eventually be blocking
@yupeng: the purpose is to have some simple pipeline not to block the record collection, assuming segment creation/upload will take some time
@npawar: `if there are issues with ready, then staging cannot be moved to ready` - what about the data that was in `ready`? how will you recover that?
@yupeng: same to `staging` ?
@npawar: didnt get you
@yupeng: i meant to have more extra buffer for better parallelism, the error handling would be similar to a single buffer
@yupeng: if there is a single buffer, how do we recover?
@npawar: flink connector will be waiting for a success response from the segment build. If it fails, the connector knows the last point of flush, and will try to write again. Say the last flush was at 100, and now it is at 250. But the flush at 250 failed. The Flink connector will write again from 100. But what if the connector has moved on to writing new records, i.e. called flush at 250 and kept writing, and is now at 330 when it receives the response that the flush at 250 was a failure? Then what does the flink connector do for records 100-250?
@yupeng: it'll block and wait there?
@yupeng: btw, i think like what spark connector behaves today, an error during the segment load will fail the entire job
@yupeng: i think similar behavior will be there in flink connector
@npawar: Alright, if you think the flink connector can handle that, then I have no concerns about doing the 2 buffers design
@npawar: Do you mind getting on a call with me next week and explaining it from the flink side some more? I've never worked with flink, so I'm probably not seeing what you are
@yupeng: sure thing
@yupeng: it’s just one buffer vs two buffers
@yupeng: and it’s internal to connector users
@yupeng: users only know there is some issue that it cannot upload
@yupeng: but staging/ready is internal
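To make the staging/ready idea concrete, here is a rough, hypothetical sketch (names invented, not the actual segment writer API): records accumulate in a staging buffer; `flush()` swaps staging into a ready slot that a background worker uploads; if the previous batch has not finished uploading (or keeps failing), the swap blocks, which is the back-pressure described above.
```python
import threading
import time

class TwoBufferWriter:
    """Hypothetical sketch of the staging/ready pipeline discussed above (not a real Pinot API)."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn   # assumed callable that builds and uploads a segment from a batch
        self._staging = []            # records currently being collected
        self._ready = None            # batch handed off for upload; None means the slot is free
        self._cond = threading.Condition()
        threading.Thread(target=self._upload_loop, daemon=True).start()

    def collect(self, record):
        # Called by the connector for every record; never blocked by uploads.
        self._staging.append(record)

    def flush(self):
        # Swap staging into the ready slot; blocks while the previous batch is still uploading or failing.
        with self._cond:
            while self._ready is not None:
                self._cond.wait()
            self._ready, self._staging = self._staging, []
            self._cond.notify_all()

    def _upload_loop(self):
        while True:
            with self._cond:
                while self._ready is None:
                    self._cond.wait()
                batch = self._ready
            while True:  # retry until success; flush() callers stay blocked in the meantime
                try:
                    self._upload_fn(batch)
                    break
                except Exception:
                    time.sleep(5)
            with self._cond:
                self._ready = None
                self._cond.notify_all()
```
In this sketch the connector calls `collect()` per record and `flush()` at segment boundaries; if uploads keep failing, the next `flush()` blocks instead of creating unbounded new buffers.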
@chinmay.cerebro: how will this work with checkpointing ?
@chinmay.cerebro: the segments have to be aligned with checkpoint barriers, don't they?
@npawar: The segment writer doesn't have to worry about it right?
@chinmay.cerebro: no , but the connector will
@chinmay.cerebro: I mean the flush call will be of relevance
@chinmay.cerebro: The sink connector has to acknowledge a snapshot or checkpoint to the job manager - that's the point when everything up to that checkpoint is deemed consistent (and any failure will resume from the last checkpoint).
@chinmay.cerebro: so the sink connector needs to know whether something has been persisted to the sink or not in a reliable manner
@chinmay.cerebro: Think of how this happens in a stateful Flink operator (which is using something like RocksDB) - the operator is doing local changes to RocksDB . In the case of a checkpoint barrier - it will block any further input processing and checkpoint the rocksdb state before moving on
@chinmay.cerebro: which means the RocksDB snapshot is a blocking operation
@chinmay.cerebro: I feel the Sink connector might have to follow similar paradigm
@chinmay.cerebro: @yupeng thoughts ?
@npawar: Hmm thanks for the context and details
@npawar: This is kinda why I think that the connector should not proceed unless it gets acknowledgement that push is done
@chinmay.cerebro: yes
@chinmay.cerebro: but it's been a while since I worked with Flink - maybe there are newer things that I don't know about
@chinmay.cerebro: lets wait for Yupeng's comments
@npawar: Okay. Let's have a call on Tuesday? We've been going back and forth on this async discussion
@chinmay.cerebro: I also think - the first version shouldn't try to over optimize
@chinmay.cerebro: keep it simple
@chinmay.cerebro: sounds good @ Tuesday
@yupeng: yes, in the current proposal, checkpointing is not supported for the Pinot sink
@yupeng: so if the connector fails, the job has to be relaunched
@yupeng: as i explained in the connector design doc
@yupeng: however, it’s still possible to support checkpoint in future
@yupeng: that checkpoint shall remember the last successful segment name in the folder
@yupeng: and discard other segments whose sequence numbers are higher
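A rough sketch of that restore step (purely illustrative; the segment-name layout is borrowed from the LLC-style example earlier in this digest and may not match the segment writer's naming):
```python
# Hypothetical helper: keep segments up to the checkpointed one, discard newer sequence numbers.
def sequence_number(segment_name: str) -> int:
    # Assumes names like "myTable__0__22__20210306T1745Z" (table__partition__sequence__timestamp).
    return int(segment_name.split("__")[2])

def segments_to_discard(all_segments, last_checkpointed_segment):
    last_seq = sequence_number(last_checkpointed_segment)
    return [name for name in all_segments if sequence_number(name) > last_seq]
```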
#releases
@tanmay.movva: @tanmay.movva has joined the channel
#debug_upsert
@g.kishore: @g.kishore has joined the channel
@yupeng: @yupeng has joined the channel
@matteo.santero: @matteo.santero has joined the channel
@jackie.jxt: @jackie.jxt has joined the channel
@g.kishore: @yupeng @jackie.jxt
@matteo.santero:
@matteo.santero:
@matteo.santero:
@matteo.santero:
@matteo.santero: ```
upsertConfig mode true in REALTIME
pinot 0.6.0
presto 0.247
dbeaver 21.0.0.202103021012
```
@josefarf: @josefarf has joined the channel
@jackie.jxt: @matteo.santero For the same primary key, there should be only one server queried
@jackie.jxt: Please check if the kafka stream is properly partitioned
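For context, "properly partitioned" here means that all records with the same primary key land in the same Kafka partition. A minimal producer-side sketch (topic, key and bootstrap server are placeholders), relying on kafka-python's default key-hashing partitioner:
```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

record = {"accountId": "a-123", "balance": 42}
# Keying by the primary key ensures every update for "a-123" goes to the same partition.
producer.send("pinot_upsert_topic", key=record["accountId"], value=record)
producer.flush()
```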
@jackie.jxt: Sorry, didn't notice the other thread that this is a hybrid table. As Yupeng mentioned, upsert does not work on hybrid table
@g.kishore: Right.. do we have an eta on allowing batch updates to upsert enabled tables?
@g.kishore: We should flag this error somewhere