Apache Pinot Daily Email Digest (2022-05-12)

Pinot Slack Email Digest Thu, 12 May 2022 19:00:45 -0700

#general

@saumya2700: Is there any option to store the Pinot ingestion time in the table, so that we can tell whether there is any latency while ingesting data from Kafka?
@mayanks: Not atm, note that replicas ingest independent of each other. cc: @npawar @navi.trinity
@npawar: you could add a transformFunction `now()` on a new column. But with multiple replicas, there’s no guarantee about data consistency across replicas. This feature would be useful for you once it is ready:
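A minimal sketch of the `now()` suggestion above, assuming the table config is handled as a plain JSON document. It adds a transform that stamps a new column at ingestion time. The column name `ingestion_time_ms` and the table name are hypothetical, and the column would also need to exist in the schema. As noted in the thread, each replica evaluates `now()` on its own, so the stored values can differ between replicas.

```python
import json


def with_ingestion_time(table_config: dict, column: str = "ingestion_time_ms") -> dict:
    """Return a copy of table_config with a now() transform on `column`."""
    updated = json.loads(json.dumps(table_config))  # cheap deep copy
    ingestion = updated.setdefault("ingestionConfig", {})
    transforms = ingestion.setdefault("transformConfigs", [])
    # Avoid adding the same transform twice.
    if not any(t.get("columnName") == column for t in transforms):
        transforms.append({"columnName": column, "transformFunction": "now()"})
    return updated


if __name__ == "__main__":
    # "myTable" is a placeholder table name.
    print(json.dumps(with_ingestion_time({"tableName": "myTable"}), indent=2))
```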
@akashsaluja: @akashsaluja has joined the channel
@octchristmas: I have created a batch pipeline that stores datafiles from cloudera impala parquet table to pinot cluster. How to gracefully swap segments if the number of input files gets smaller? Like this: `segment.name.prefix : normalizedDate` `exclude.sequence.id : false` -- input data file ``` ``` -- pinot segment ``` ``` ------------------------------ If I redo the batch and the data file is reduced to two: -- input data file ``` ``` `segment.name: fixed` If I have to use the 'segment.name:fixed' setting, how can I gracefully delete the segment 'batch_2022-05-12_2022-05-12_2'?
@ken: It’s not atomic, but you could push the two new segments, and delete the third segment using the UI or REST API. Or for a real hack, create a segment with no data that has the same name as the third segment.
@npawar: if you can generate unique names every time (probably with a prefix), you may be able to use the startSegmentReplace endSegmentReplace constructs. @snlee is this somewhere we can use that?
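A rough sketch of the non-atomic approach @ken describes: after pushing the new segments, work out which old segment names were not regenerated and delete only those. The table name `myTable` is hypothetical, and the `/segments/{tableName}/{segmentName}` path is an assumption about the controller REST API that should be checked against the Swagger page of your controller. The code only builds the paths and does not send anything.

```python
from urllib.parse import quote


def stale_segments(old: list[str], new: list[str]) -> list[str]:
    """Segments that existed before the re-run but were not produced by it."""
    fresh = set(new)
    return [name for name in old if name not in fresh]


def delete_paths(table: str, segments: list[str]) -> list[str]:
    """Controller paths to call with HTTP DELETE, one per stale segment."""
    return [f"/segments/{quote(table, safe='')}/{quote(s, safe='')}" for s in segments]


# Three segments from the first run, two from the re-run.
old = [f"batch_2022-05-12_2022-05-12_{i}" for i in range(3)]
new = [f"batch_2022-05-12_2022-05-12_{i}" for i in range(2)]

for path in delete_paths("myTable", stale_segments(old, new)):
    print("DELETE", path)
```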
@dnanassy: @dnanassy has joined the channel
@filipdolinski: @filipdolinski has joined the channel
@tanmay.movva: Hello! Is there a php client available for Pinot?
@mitchellh: Hi, there is no native PHP client. However, the Apache Pinot endpoints are HTTP based. You can use the Swagger docs link in the UI to see all of the RESTful endpoints.
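To illustrate the HTTP route, here is a small sketch that builds a SQL query request for a broker using only the standard library. The same request shape can be reproduced from any language with an HTTP client. The broker address `http://localhost:8099`, the table name `myTable`, and the `/query/sql` path are assumptions to verify against your own deployment.

```python
import json
import urllib.request


def build_query_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a SQL query as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("http://localhost:8099", "SELECT COUNT(*) FROM myTable")
print(req.get_method(), req.full_url)

# To execute it against a running broker:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```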
@tanmay.movva: Got it. We are doing the same now. Just wanted to check if there is a client available. Thanks! @mitchellh
@g.kishore: If you are around and would love to know how Cisco Webex is powering End-user analytics using Pinot - you can join
@harshininair14: @harshininair14 has joined the channel

#random

@akashsaluja: @akashsaluja has joined the channel
@dnanassy: @dnanassy has joined the channel
@filipdolinski: @filipdolinski has joined the channel
@tonya: Starting in 20 minutes! :computer:
@harshininair14: @harshininair14 has joined the channel

#troubleshooting

@saumya2700: We occasionally see latency issues in Pinot, so I checked Grafana. Grafana shows very high latency, but that is not actually the case. Why is the graph showing that much latency? It gives the wrong impression. Is there anything we need to change in the Table Consuming Latency graph? I added the graphs exactly as described in the monitoring link in the Pinot documentation:
@saurabhd336: Can you share the promql query you're plotting?
@saumya2700: `avg by (table) (pinot_server_freshnessLagMs_50thPercentile{namespace="pinot-qa"})` and same for other percentiles
@akashsaluja: @akashsaluja has joined the channel
@dnanassy: @dnanassy has joined the channel
@luys8611: I'm struggling to add segments to pinot-controller using this command. `docker exec -it manual-pinot-controller bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /data/docker-job-spec.yml -exec` But it gives an error. ``` 2022/05/12 09:51:12.633 ERROR [PinotAdministrator] [main] Exception caught: picocli.CommandLine$UnmatchedArgumentException: Unknown option: '-exec'``` Can anyone help me with this?
@xiangfu0: remove `-exec`?
@saumya2700: Hi, I am struggling to update a table config. I updated the schema to add a new column and wrote a transformFunction for that column. The existing table shows the new column, but no value is copied into it. When I created a new table with the same table config, everything worked fine. I have also reloaded the segments, but it still does not work for the existing table. The record's JSON string is -> ```{"header": {"nnTransId": "9003", "qid": 1, "timestamp": 1234567890123}, "status": "N200_SUCCESS"}``` ```"ingestionConfig": { "transformConfigs": [ { "columnName": "header_js", "transformFunction": "jsonFormat(header)" }, { "columnName": "header_nnTransId", "transformFunction": "JSONPATHSTRING(header, '$.nnTransId')" } .....```
@mayanks: Did you not do both at the same time? One issue I can see is that adding a transform function on an existing column is considered backward incompatible and is not allowed. If you created the column in the schema first and later tried to add the transform function, you might run into this issue.
@saumya2700: By "both at the same time" I mean that I first created the column in the schema today and right after that updated the table. Without adding the column to the schema, it won't allow updating the column in the table config.
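As a sanity check on what the two transforms are expected to produce for the sample record, here is a plain-Python stand-in. It is not Pinot's implementation, and the exact string that `jsonFormat` emits (spacing, key order) may differ. It only shows the values one would expect in `header_js` and `header_nnTransId` for newly ingested rows.

```python
import json

# The sample record quoted in the message above.
record = json.loads(
    '{"header": {"nnTransId": "9003", "qid": 1, "timestamp": 1234567890123},'
    ' "status": "N200_SUCCESS"}'
)


def expected_columns(rec: dict) -> dict:
    """Plain-Python stand-in for the two transforms in the table config."""
    header = rec["header"]
    return {
        # jsonFormat(header): the header object serialized as a JSON string
        "header_js": json.dumps(header, separators=(",", ":")),
        # JSONPATHSTRING(header, '$.nnTransId'): one field, read as a string
        "header_nnTransId": str(header["nnTransId"]),
    }


print(expected_columns(record))
```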
@mayanks: Are you able to update the table config with transform, and confirm that it is accepted?
@mayanks: You can check by querying table config to see if your transform function shows up
@saumya2700: yes it says updated successfully
@saumya2700: I can see the new column in the table when querying, and the query runs, but the value in that column is always null. There are no errors in the logs. When I created a new table with the same config, the value shows up, so the transformation function itself is not the issue.
@mayanks: Hmm that doesn’t make sense. If table config shows transform function when you check in Pinot UI, then it should be applied
@kharekartik: Hi, so were there any records in the table before the schema and table config were updated? If yes, can you check if the query result contains older records or new records
@filipdolinski: @filipdolinski has joined the channel
@luisfernandez: hello my friends it’s me again, does anyone know what would be the reason of zookeeper crashing while we are ingesting data with the job yaml? we are running some migrations and it seems like zookeeper is just keeping on being sad crashing a lot, also what’s your recommended sizing for zookeeper, we are just using the default in the helm chart we may be hitting some roof
@mayanks: Too many segments (tens of thousands or more)?
@luisfernandez: i see 2k segments after the migration was done but at the end is def more than that
@luisfernandez: how can i tell?
@mayanks: Pinot UI should show
@mayanks: 2k is small
@luisfernandez: the thing is that i don’t know if all the segments that were supposed to be migrated were migrated
@luisfernandez: we have a 512mb with 256 heap zk
@luisfernandez: 3 replicas
@mayanks: @dlavoie any comments
@dlavoie: What is the crash error?
@luisfernandez: i’m trying to find that out :smile:
@dlavoie: `kubectl logs pinot-zookeeper-1 --previous`
@luisfernandez: ```
2022-05-12 10:15:32,603 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:32,604 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:32,614 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:32,615 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:32,816 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 400
2022-05-12 10:15:32,816 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:32,817 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:32,817 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:33,217 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 800
2022-05-12 10:15:33,221 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:33,221 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:33,221 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:34,022 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 1600
2022-05-12 10:15:34,023 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:34,023 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:34,023 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:35,624 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 3200
2022-05-12 10:15:35,624 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:35,624 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:35,625 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:38,825 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 6400
2022-05-12 10:15:38,826 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:38,826 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:38,826 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:45,227 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 12800
2022-05-12 10:15:45,228 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@430] - Have smaller server identifier, so dropping the connection: (3, 2)
2022-05-12 10:15:45,228 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:45,233 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@679] - Notification: 2 (message format version), 3 (n.leader), 0x110001d0cf (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x12 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2022-05-12 10:15:56,491 [myid:2] - INFO [NIOWorkerThread-1:FourLetterCommands@234] - The list of known four letter word commands is : [{1936881266=srvr, 1937006964=stat, 2003003491=wchc, 1685417328=dump, 1668445044=crst, 1936880500=srst, 1701738089=envi, 1668247142=conf, -720899=telnet close, 2003003507=wchs, 2003003504=wchp, 1684632179=dirs, 1668247155=cons, 1835955314=mntr, 1769173615=isro, 1920298859=ruok, 1735683435=gtmk, 1937010027=stmk}]
2022-05-12 10:15:56,491 [myid:2] - INFO [NIOWorkerThread-1:FourLetterCommands@235] - The list of enabled four letter word commands is : [[wchs, stat, wchp, dirs, stmk, conf, ruok, mntr, srvr, wchc, envi, srst, isro, dump, gtmk, telnet close, crst, cons]]
2022-05-12 10:15:56,491 [myid:2] - INFO [NIOWorkerThread-1:NIOServerCnxn@518] - Processing ruok command from /127.0.0.1:56218
2022-05-12 10:15:57,298 [myid:2] - INFO [NIOWorkerThread-2:NIOServerCnxn@518] - Processing srvr command from /127.0.0.1:56224
```
@dlavoie: I see no shutdown command.
@dlavoie: `kubectl describe pod pinot-zookeeper-1` Should give an exit status for previous container
@luisfernandez: ```
Last State: Terminated
Reason: Error
Exit Code: 143
```
@dlavoie: We get this error when the liveness probe stops returning exit code 0
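For context on the exit code: by the usual shell convention, a value above 128 means the process was ended by signal number `code - 128`. So 143 corresponds to signal 15 (SIGTERM), which fits a container being stopped after a failed liveness probe, while a kernel OOM kill would normally surface as 137 (SIGKILL). A small helper to decode such codes, written as a sketch rather than anything Kubernetes itself provides:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal on this platform)"
    return "exited normally" if code == 0 else f"application error {code}"


print(describe_exit_code(143))
```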
@mayanks: Perhaps too low heap is a potential issue here?
@dlavoie: For 2K segments, that’s a reasonable assumption
@dlavoie: But I would expect some OOM Killed or JVM OOM stacktrace
@luisfernandez: ops people did tell me that those pods seem to be using most of the memory available
@dlavoie: Do you have any trace of an OOMKilled in the pod describe output?
@luisfernandez: like w this one? `kubectl describe pod pinot-zookeeper-1`
@dlavoie: yes
@dlavoie: `kubectl describe pod pinot-zookeeper-1 | grep OOM`
@luisfernandez: nada
@dlavoie: Ok, then maybe ZK gracefully reports error status on mntr command when memory is running out. So, I would recommend bumping memory
@luisfernandez: this is our config
@luisfernandez: ```
resources: {
  requests: {
    cpu: '500m',
    memory: '1Gi',
  },
  limits: {
    cpu: '500m',
    memory: '1Gi',
  },
},
```
@luisfernandez: ```ZK_HEAP_SIZE: '256M',```
@dlavoie: I would 2x or 4x all of these values
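Spelled out, the 2x and 4x suggestion applied to the values quoted above looks like this. The helper is only illustrative arithmetic on the quantity strings (it keeps the unit suffix and multiplies the number) and is not a Kubernetes or Helm API.

```python
import re


def scale_quantity(quantity: str, factor: int) -> str:
    """Multiply the numeric part of a quantity like '500m' or '1Gi'."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, unit = match.groups()
    return f"{int(number) * factor}{unit}"


# Values taken from the config shared in the thread.
current = {"cpu": "500m", "memory": "1Gi", "ZK_HEAP_SIZE": "256M"}

for factor in (2, 4):
    print(factor, {k: scale_quantity(v, factor) for k, v in current.items()})
```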
@luisfernandez: AHA
@luisfernandez: ```
2022-05-06 15:55:56,927 [myid:3] - WARN [LearnerHandler-/10.24.10.144:59094:LearnerHandler@928] - Ignoring unexpected exception
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:926)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:647)
2022-05-06 15:55:56,927 [myid:3] - WARN [QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled):ZooKeeperThread@55] - Exception occurred from thread QuorumPeer[myid=3](plain=/0.0.0.0:2181)(secure=disabled)
java.lang.OutOfMemoryError: Java heap space
2022-05-06 15:55:56,927 [myid:3] - WARN [LearnerHandler-/10.24.7.181:38132:LearnerHandler@928] - Ignoring unexpected exception
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:926)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:647)
2022-05-06 15:55:56,928 [myid:3] - INFO [main:QuorumPeerMain@104] - Exiting normally
```
@luisfernandez: in `pinot-zookeeper-2`
@dlavoie: there you go :slightly_smiling_face: Smoking gun
@luisfernandez: do you also recommend 2x or 4x for cpu?
@dlavoie: 2x is going to be fine, but my recommendation would be to instrument and measure resource usage
@dlavoie: and adjust
@luisfernandez: :pray: will do so
@luisfernandez: now, in terms of the job spec yaml, if zookeeper dies what are the implications for the job
@luisfernandez: i do see a lot of exceptions in the controller where we ran the job
@dlavoie: @mayanks ^^
@luisfernandez: and i imagine that zookeeper dying would just take all the controllers to a sad state and then mess up with the import
@mayanks: Yes, the exceptions are probably an effect rather than the cause
@luisfernandez: yea, so we have beefed up those machines
@luisfernandez: also another question, is there anything we have to do to the helm config in order for us to get grafana metrics for zk?
@dlavoie: Yeah current helm chart is broken
@luisfernandez: it seems like that feature is also recent
@dlavoie: latest ZK embeds a Prometheus metrics provider
@dlavoie: We implemented it in our Cloud operator. You only need to add a ZK property. The chart needs that default property and the removal of the sidecar containers
@dlavoie: If you are interested in contributing to the project, that’s a nice one to look at.
@alihaydar.atil: Hello everyone, Is it possible to generate fixed segment names with sequenceId appended to them? I have an input folder with multiple csv files in it. I want to run an ingestion job to import them. I am trying to use backfill data feature to truncate my table. I want to replace those segments with another data set in the future and data doesn't have a time column actually. It seems that segment names are playing a role in replacing segments. That's why i am asking about fixed name plus sequenceId. Thanks in advance :pray:
@mayanks: I think setting the table as REFRESH table will do that
@alihaydar.atil: Thank you @mayanks it did the work. Would it be possible to do this on a table with timeColumn? If I wanted to discard all the old segments and import fresh data.
@mayanks: Hmm do you want to do it on a regular basis or once in a while ?
@mayanks: For an append table, you can still achieve this if your input folders are date partitioned. Then the segment name for a day will be deterministic
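To illustrate why date-partitioned input gives deterministic names: if a segment name is derived only from a fixed prefix, the day, and a sequence id, re-running the job for the same day yields the same names, so the new segments replace the old ones. The pattern below mirrors the name `batch_2022-05-12_2022-05-12_2` quoted earlier in this digest. It is a hypothetical helper, not Pinot's own segment name generator.

```python
from datetime import date


def segment_name(prefix: str, day: date, sequence_id: int) -> str:
    """Name of the form <prefix>_<start>_<end>_<seq> for a one-day segment."""
    stamp = day.isoformat()
    return f"{prefix}_{stamp}_{stamp}_{sequence_id}"


print(segment_name("batch", date(2022, 5, 12), 2))
```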
@alihaydar.atil: it would be nice to do it on a regular basis. I would like to keep daily data in my table.
@stuart.millholland: Hi. I'm trying to set up GCS as the data bucket for our Pinot controller in our GKE dev environments only. I've set things up in the extra.configs in the controller section and I'm getting this error: `Local temporary directory is not configured, cannot use remote data directory`
@stuart.millholland: Ah I may have answered my own question, Looks like my extra configs were overwritten
@mayanks: Also for reference:
@stuart.millholland: Thanks!
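A quick pre-flight check along the lines of that error message, as a sketch: when the data directory points at a remote store, a local temp directory must be configured as well. The property names `controller.data.dir` and `controller.local.temp.dir` are assumptions about the controller configuration keys, and the bucket name is hypothetical. This is not Pinot's own validation code.

```python
from urllib.parse import urlparse


def check_controller_dirs(conf: dict) -> list[str]:
    """Return problems found in a controller config given as a flat dict."""
    problems = []
    data_dir = conf.get("controller.data.dir", "")
    scheme = urlparse(data_dir).scheme
    # Anything with a non-file scheme (gs://, s3://, ...) counts as remote.
    is_remote = scheme not in ("", "file")
    if is_remote and not conf.get("controller.local.temp.dir"):
        problems.append(
            f"controller.data.dir uses '{scheme}://' but "
            "controller.local.temp.dir is not set"
        )
    return problems


print(check_controller_dirs({"controller.data.dir": "gs://my-bucket/pinot"}))
```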
@harshininair14: @harshininair14 has joined the channel

#pinot-dev

@noon: @noon has joined the channel

#getting-started

@akashsaluja: @akashsaluja has joined the channel
@dnanassy: @dnanassy has joined the channel
@filipdolinski: @filipdolinski has joined the channel
@madhumitamantri: @madhumitamantri has joined the channel
@harshininair14: @harshininair14 has joined the channel

#jobs

@madhumitamantri: @madhumitamantri has joined the channel

#introductions

#linen_dev

@kam: @xiangfu0 btw if you have the export history I can upload it to Linen to have the conversations show up
@xiangfu0: sure, let me retry this first