#general


@jayeshchoudhary619: @jayeshchoudhary619 has joined the channel
@xuhongkun1103: Hi, @slack1 Do you mind spending some time reviewing this PR? Thanks in advance!
  @slack1: Hi Kevin, yes - I’ll have a look today. Thank you for your contribution. This is a valuable addition!
  @kennybastani: Amazing!
  @xuhongkun1103: @slack1 Thanks for your time and attention to this feature.
@matheus.felisberto: @matheus.felisberto has joined the channel

#random


@jayeshchoudhary619: @jayeshchoudhary619 has joined the channel
@matheus.felisberto: @matheus.felisberto has joined the channel

#troubleshooting


@saumya2700: Hi All, what is the way to retain data in a realtime table with no time limit? I configured the tables with no retention time limit, but after 3-4 weeks I can see from the query console that totalDocs is very low. What is the config property if we don’t want to purge any data from the table?
  @mayanks: Not configuring any retention is the correct way. If you are seeing less data, check the debug endpoint to see if there are segments that are offline for some reason. Also, if you have infinite retention, you should ensure you have enough capacity to handle it.
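  For reference, a minimal sketch of checking the debug endpoint, assuming the controller is reachable at localhost:9000 and the table is named `myTable` (the exact path and parameters may vary by Pinot version):
```
# ask the controller to report segment/server-level problems for the table
curl "http://localhost:9000/debug/tables/myTable?type=REALTIME&verbosity=1"
```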
  @saumya2700: The debug endpoint is showing all is well. Just for my knowledge: if, let’s say, it doesn’t have enough capacity, how does Pinot behave? Will it raise an alert, or just stop ingesting data?
  @mayanks: Yes, Pinot has metrics you can monitor and set alerts on.
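  As an illustration, a hypothetical Prometheus alert rule for stalled consumption; the metric name `pinot_server_realtimeRowsConsumed_Count` is an assumption that depends entirely on your JMX-exporter mapping:
```
- alert: PinotRealtimeIngestionStalled
  # fires if no rows were consumed for 15 minutes (hypothetical metric name)
  expr: rate(pinot_server_realtimeRowsConsumed_Count[10m]) == 0
  for: 15m
```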
  @saumya2700: After checking the server logs, I found this error. Can the exception below cause messages to be skipped? What is it about? In my table config I am using a flattened structure with fieldsToUnnest, and the messages have multiple nested JSON objects. Data is still coming into the table from the topic sometimes.
```
Sending request: to controller: pinot-controller-0.pinot-controller-headless.pinot.svc.cluster.local, version: Unknown
Exception while in work
java.lang.RuntimeException: shaded.com.fasterxml.jackson.databind.JsonMappingException: Infinite recursion (StackOverflowError) (through reference chain: org.apache.pinot.spi.data.readers.GenericRow["fieldToValueMap"]->java.util.Collections$UnmodifiableMap["$MULTIPLE_RECORDS_KEY$"]->java.util.ArrayList[0]
```
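  For context, unnesting is configured through `complexTypeConfig` in the table’s `ingestionConfig`. A minimal sketch, with a hypothetical field name:
```
"ingestionConfig": {
  "complexTypeConfig": {
    "fieldsToUnnest": ["records"]
  }
}
```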
  @mayanks: @francois how did you solve this issue?
  @francois: Bumping to the 0.10.0 version removed it, but it reappears when using filter functions :/ Filed a ticket with some info. I will provide more tomorrow, as asked in the ticket.
  @mayanks: Thanks.
@jayeshchoudhary619: @jayeshchoudhary619 has joined the channel
@saumya2700: Hi All, I am facing an issue many times where data is lost and segments keep increasing. How does Pinot decide to create new segments? From the console I can see all segments are in Good state. How can we identify why data is getting lost intermittently, which is very frequent now? Which logs should we look at: controller, server, or broker? Getting this message from the controller logs: *Invalid retention time: null null for table: table1_REALTIME, skip*
  @mayanks: From your other threads it seems that it could be because rows are not consumed due to the error you pasted. Check that thread for the reply.
@kishorenaidu712: Hi everyone, I was trying to ingest JSON data through Kafka. One of the columns is an array of nested JSON, and I have marked it as the JSON data type in the schema. When I publish the data to the topic, I get an error in the Pinot server: "Caused by: java.lang.IllegalStateException: Cannot read single-value from Collection". A sample record would be:
```
"fieldToValueMap" : {
  "Agent_phone_number" : 2807536641,
  "Call_end_time" : "2021-09-20 19:41:41",
  "Calling_number" : "4025165405",
  "Call_start_time" : "2021-09-20 19:38:19",
  "Account_number" : "4T1QUDSKPI",
  "Customer_name" : "Dan",
  "Queue" : {
    "qdetails" : [
      { "queue_duration" : 229, "qname" : "q2" },
      { "queue_duration" : 90, "qname" : "q3" }
    ]
  },
  "Agent_id" : "K3GDP9"
},
"nullValueFields" : [ ]
```
Where am I going wrong? I have attached the schema and configuration files in the thread.
  @kishorenaidu712: These are my schema and configuration files.
  @kharekartik: Hi, in the attached schema there is no field declared as JSON. Also, the sample record shared doesn't appear to be valid JSON. Can you send the full JSON for it?
  @kishorenaidu712: Sorry, I had shared a different schema. Here's a record from my JSON file:
```
{"Calling_number":9486855381,"Customer_name":"Changed","Account_number":"8H4GORV05Q","Agent_id":"FHG5Z1","Agent_phone_number":4470000588,"Call_start_time":"2021-07-31 01:45:15","Call_end_time":"2021-07-31 01:48:02","Queue":{"qdetails":[{"qname":"q3","queue_duration":150},{"qname":"q2","queue_duration":157}]}}
```
  @kharekartik: Thanks. Can you also paste the complete stack trace for `"Caused by: java.lang.IllegalStateException: Cannot read single-value from Collection"`? The line should also contain the column name.
  @kishorenaidu712: It's for column 'Queue'
  @kishorenaidu712:
```
java.lang.RuntimeException: Caught exception while transforming data type for column: Queue
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:95) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:83) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:532) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:420) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:598) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalStateException: Cannot read single-value from Collection: [111, q2] for column: Queue
    at .google.common.base.Preconditions.checkState(Preconditions.java:721) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:176) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:119) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:132) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:159) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:119) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:63) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
```
  @kharekartik: Hi, can you try setting the dataType of the `Queue` column to `STRING` in the schema? Also, change the `noDictionaryColumns` to `jsonIndexColumns` in the table config.
  @kishorenaidu712: It still throws an error: `java.lang.RuntimeException: Caught exception while transforming data type for column: Queue`
  @kharekartik: Hi, it works with the `JSON` datatype on the master branch. This seems to be a bug. If you are stuck with the 0.10 release, I suggest exploring
  @npawar: another easier option if you have to use 0.10.0: you can keep everything as STRING,
```
{
  "name": "QueueName",
  "dataType": "STRING"
},
{
  "name": "QueueJson",
  "dataType": "STRING"
},
```
and then use JSONFORMAT for QueueJson and JSONPATHSTRING for the other extractions
```
"ingestionConfig": {
  "transformConfigs": [{
    "columnName": "QueueName",
    "transformFunction": "JSONPATHSTRING(Queue,'$.qdetails[0].qname','null')"
  }, {
    "columnName": "QueueJson",
    "transformFunction": "JSONFORMAT(Queue)"
  }]
}
```
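  With that workaround in place, nested values can also be pulled out of the stringified column at query time; a hypothetical query (the table name `calls` is assumed):
```
SELECT JSON_EXTRACT_SCALAR(QueueJson, '$.qdetails[0].queue_duration', 'INT', -1) AS first_queue_duration
FROM calls
LIMIT 10
```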
@luisfernandez: hey friends, i’m seeing our space in zookeeper almost getting to max usage in terms of disk. we have 5gb of disk space, and we currently have it set up with
```
- name: ZK_PURGE_INTERVAL
  value: "1"
- name: ZK_SNAP_RETAIN_COUNT
  value: "3"
```
in the logs i can see things getting set:
```
2022-04-15 16:14:35,914 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2022-04-15 16:14:35,915 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2022-04-15 16:14:35,979 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2022-04-15 16:14:35,980 [myid:1] - INFO [main:ManagedUtil@46] - Log4j found with jmx enabled.
2022-04-15 16:14:35,988 [myid:1] - INFO [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
```
however i don’t see any more cleanup logs after that; is there a reason for this? also, i can see that the space is being chugged by the logs. does anyone know why things may not be getting cleaned up? thank you, would appreciate your help.
  @mayanks: @xiaoman ^^
  @luisfernandez: i did an ls -lh `/data/log/version-2`
  @luisfernandez: and things are getting filled up there
  @luisfernandez:
```
$ ls -lh
total 3.2G
-rw-rw-r-- 1 zookeeper zookeeper 768M Apr  5 00:45 log.700000001
-rw-rw-r-- 1 zookeeper zookeeper 1.0G Apr 11 17:51 log.700014b0e
-rw-rw-r-- 1 zookeeper zookeeper 1.1G Apr 15 16:01 log.8000001e6
-rw-r--r-- 1 zookeeper zookeeper 448M Apr 18 20:11 log.80000ea05
```
  @xiaoman: A bit off topic, but I don’t think this is related to the “zookeeper node size too big” issue; this looks very zookeeper-specific. cc @mayanks. I will try to have a quick look at zookeeper, but we may need some zookeeper expertise here.
  @mayanks: What’s filling the logs @luisfernandez? Also, for prod ZK, you probably want to have persistent storage for snapshots.
  @luisfernandez: whatever gets stored in this path ```/data/log/version-2```
  @luisfernandez: they do not look like logs tho lol they look like our configs
  @mayanks: Transaction logs?
  @luisfernandez: example
  @luisfernandez:
```
ZKLG ?5
"id" : "WorkflowContext",
"simpleFields" : {
  "LAST_PURGE_TIME" : "1646699352774",
  "NAME" : "TaskQueue_RealtimeToOfflineSegmentsTask",
  "START_TIME" : "1646234218213",
  "STATE" : "IN_PROGRESS"
},
"mapFields" : {
  "JOB_STATES" : {
    "TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1646698682995" : "COMPLETED"
  },
  "StartTime" : {
    "TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1646698682995" : "1646698697926"
  }
},
"listFields" : { }
}
"id" : "etsyads_metrics_dev__0__5__20220307T2037Z",
"simpleFields" : {
```
  @mayanks: Yeah, I am saying it is the ZK transaction logs
  @luisfernandez: yes so this is filling up
  @luisfernandez: any way to purge them?
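  For reference, ZooKeeper ships a cleanup utility for exactly this; a minimal sketch, assuming you exec into the ZK container and use the directories from this thread:
```
# keep the 3 most recent snapshots (and their txn logs), purge the rest
bin/zkCleanup.sh -n 3
# or invoke the underlying class directly with explicit directories
java -cp "lib/*" org.apache.zookeeper.server.PurgeTxnLog /data/log /data -n 3
```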
  @mayanks: You should mount snapshots on EVS, otherwise you are running the risk of data loss
  @xiaoman: Not sure if it is helpful but I searched and found this:
  @mayanks: cc: @dlavoie
  @luisfernandez: aren’t snapshots here? ```/data/version-2```
  @luisfernandez: content:
```
total 3.5M
-rw-r--r-- 1 zookeeper zookeeper    1 Apr 15 19:06 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper    1 Apr 15 19:06 currentEpoch
-rw-rw-r-- 1 zookeeper zookeeper 481K Mar  8 19:58 snapshot.500a20f00
-rw-rw-r-- 1 zookeeper zookeeper 1.3M Apr  5 00:45 snapshot.700014b0d
-rw-rw-r-- 1 zookeeper zookeeper 1.8M Apr 11 17:53 snapshot.8000001e5
```
  @luisfernandez: also noob question what’s EVS?
  @luisfernandez: @xiaoman we are using `apache-zookeeper-3.5.5`
  @mayanks: Typo EBS - persistent storage
  @luisfernandez: we are using whatever is there in the helm image
  @dlavoie: I think the ZK env variables have changed between 3.5 and 3.7
  @dlavoie: For snapshot configuration.
  @dlavoie: can you share the content of `/conf/zoo.cfg` from within the ZK container?
  @luisfernandez: yes
  @luisfernandez:
```
clientPort=2181
dataDir=/data
dataLogDir=/data/log
tickTime=2000
initLimit=10
syncLimit=10
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=*
server.1=pinot-zookeeper-0.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
server.2=pinot-zookeeper-1.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
server.3=pinot-zookeeper-2.pinot-zookeeper-headless.pinot-dev.svc.cluster.local:2888:3888
```
  @dlavoie: Auto purge seems to be configured as expected.
  @luisfernandez: is it that it still hasn’t hit those parameters whatever they are?
  @dlavoie: I see that both dataDir and dataLogDir are within the same directory. I’ve personally encountered issues when these two are nested.
  @dlavoie: Is that the default config that got you that directory configuration?
  @luisfernandez: yes that was default
  @luisfernandez: from the helm chart
  @luisfernandez:
```
ZK_DATA_DIR=${ZK_DATA_DIR:-"/data"}
ZK_DATA_LOG_DIR=${ZK_DATA_LOG_DIR:-"/data/log"}
```
  @luisfernandez: i’m not super knowledgeable about zookeeper but how are these 2 directories related?
  @dlavoie: the log dir stores the snapshots of your ZK state.
  @dlavoie: sorry
  @dlavoie: Data Dir stores the snapshots of your ZK state. That’s a complete copy of the state.
  @dlavoie: The Data Log Dir holds the transaction logs. That’s the critical data.
  @dlavoie: When a snapshot is made, ZK can purge the transaction logs to reduce their size.
  @luisfernandez: probably don't wanna lose that transaction log yes?
  @dlavoie: You can lose the transactions if you have a fresh snapshot.
  @dlavoie: But you would want to let ZK do its thing.
  @dlavoie: I have reason to believe you could be impacted by a bug in the helm chart configuration. Can you manually edit the PVC to increase its size? If you are on AWS, it can be done without a restart.
  @luisfernandez: i will have to ask my team but i think we can increase it yes
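  For reference, a minimal sketch of the PVC resize, assuming the claim is named `data-pinot-zookeeper-0` (the actual name depends on the chart) and the StorageClass allows volume expansion:
```
kubectl -n pinot-dev patch pvc data-pinot-zookeeper-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
```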
  @luisfernandez: `/data/log/version-2`
```
total 3.2G
-rw-rw-r-- 1 zookeeper zookeeper 768M Apr  5 00:45 log.700000001
-rw-rw-r-- 1 zookeeper zookeeper 1.0G Apr 11 17:51 log.700014b0e
-rw-rw-r-- 1 zookeeper zookeeper 1.1G Apr 15 16:01 log.8000001e6
-rw-r--r-- 1 zookeeper zookeeper 448M Apr 18 20:37 log.80000ea05
```
`/data/version-2`
```
total 3.5M
-rw-r--r-- 1 zookeeper zookeeper    1 Apr 15 19:06 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper    1 Apr 15 19:06 currentEpoch
-rw-rw-r-- 1 zookeeper zookeeper 481K Mar  8 19:58 snapshot.500a20f00
-rw-rw-r-- 1 zookeeper zookeeper 1.3M Apr  5 00:45 snapshot.700014b0d
-rw-rw-r-- 1 zookeeper zookeeper 1.8M Apr 11 17:53 snapshot.8000001e5
```
  @dlavoie: Snapshots are happening but the log isn’t cleaned.
  @dlavoie: I observed that behaviour when dataLogDir and dataDir are nested in the same folder.
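  As an illustration, un-nesting the two would look something like this in `zoo.cfg`, assuming a separate mount point is available (the `/datalog` path is hypothetical):
```
dataDir=/data
dataLogDir=/datalog
```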
  @dlavoie: I’ll do some testing tomorrow and see if we can get a recipe to fix this.
  @luisfernandez: oh shoot D-:
  @mayanks: Thanks @dlavoie for jumping on this.
@matheus.felisberto: @matheus.felisberto has joined the channel

#thirdeye-pinot


@shounakmk219: @shounakmk219 has joined the channel

#getting-started


@jayeshchoudhary619: @jayeshchoudhary619 has joined the channel
@matheus.felisberto: @matheus.felisberto has joined the channel