#general


@dprotsenko.repos: @dprotsenko.repos has joined the channel
@harish.bohara: Hi, I have a setup with the apachepinot/pinot:0.10.0-SNAPSHOT-df1c2681fd-20220207-jdk11 Docker image to run my Pinot servers. I am running 4 pinot-servers on 4 m5a.large instances; it is the only container running on each box (I have given the JVM 6GB via JVM args). What I expect: the ingestion rate should stay constant. What I observed: when I start this cluster, ingestion is > 5000 RPS. However, it drops below 100 RPS after about a day of continuous running. If I restart all the pinot-servers, the ingestion rate goes back to 5K+ RPS. Am I missing some setting that is causing this degradation?
  @kharekartik: Hi Harish, can you share the tableConfig and schema for this?
  @harish.bohara: ```
{
  "schemaName": "aggregate_v1",
  "dimensionFieldSpecs": [
    { "name": "ext_event_type", "dataType": "STRING" },
    { "name": "dim__channel", "dataType": "STRING" },
    { "name": "dim__pipeline", "dataType": "STRING" },
    { "name": "dim__internal", "dataType": "STRING" },
    { "name": "provider", "dataType": "STRING" },
    { "name": "status", "dataType": "STRING" },
    { "name": "year", "dataType": "INT" },
    { "name": "month", "dataType": "INT" },
    { "name": "day", "dataType": "INT" },
    { "name": "hour", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "eventTime",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
{
  "tableName": "aggregate_v1",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "aggregate_v1",
    "replication": "2",
    "timeColumnName": "eventTime",
    "allowNullTimeValue": false,
    "replicasPerPartition": "2",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {}
  },
  "tableIndexConfig": {
    "noDictionaryColumns": [],
    "invertedIndexColumns": [],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "MY TOPIC",
      "stream.kafka.broker.list": "MY BROKER",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.segment.size": "100M"
    },
    "rangeIndexColumns": [],
    "rangeIndexVersion": 1,
    "autoGeneratedInvertedIndex": false,
    "createInvertedIndexDuringSegmentGeneration": false,
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "loadMode": "MMAP",
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "enableDynamicStarTreeCreation": false,
    "aggregateMetrics": false,
    "nullHandlingEnabled": true,
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [ "ext_event_type", "dim__channel", "status" ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [ "COUNT__*" ],
        "maxLeafRecords": 1
      }
    ]
  },
  "metadata": {},
  "quota": {},
  "routing": {},
  "query": {},
  "ingestionConfig": {
    "transformConfigs": [
      { "columnName": "year", "transformFunction": "year(eventTime, 'Asia/Kolkata')" },
      { "columnName": "month", "transformFunction": "month(eventTime, 'Asia/Kolkata')" },
      { "columnName": "day", "transformFunction": "day(eventTime, 'Asia/Kolkata')" },
      { "columnName": "hour", "transformFunction": "hour(eventTime, 'Asia/Kolkata')" }
    ]
  },
  "isDimTable": false,
  "upsertConfig": {}
}
```
  @kharekartik: @npawar can you help here? The flush configs seem to be normal as well.
  @navina: @harish.bohara how do you restart your servers? And have you checked whether the segments are rolling over? Asking because I notice 2 things: 1. flush threshold rows = 0 (not sure what the expected behavior is, probably to ignore the row threshold), while the flush threshold time is 24h. 2. The stream's offset reset criterion is set to `smallest`. I am guessing that when you restart, you don't have completed segments that were flushed to disk. So, before the restart, can you check whether the table has completed segments?
  @npawar: Thresholds look right (0 means use size-based and ignore row-based, like Navina said). My hunch is also the same, so just adding on top of what Navina said: when you set `smallest`, Pinot consumes from the earliest offset in Kafka, hence the high rate of ingestion in the beginning, which then stabilizes. And when you restart, Pinot will not flush what was in memory; instead it re-consumes it when it comes back up, hence the high rate of ingestion right after a restart. It's not a cause for concern, unless you are noticing lag in event arrival.
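  If you want to check both of those quickly, here is a minimal sketch using the controller and broker REST APIs; `CONTROLLER_HOST`, `BROKER_HOST` and the ports (9000/8099) are placeholders for whatever your deployment exposes:
```
# List segments for the realtime table; completed (flushed) segments show up here
# alongside the currently consuming ones.
curl -s "http://CONTROLLER_HOST:9000/segments/aggregate_v1?type=REALTIME"

# Rough check for event-arrival lag: compare the newest ingested eventTime
# (epoch millis) against the current wall-clock time.
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"sql":"SELECT MAX(eventTime) FROM aggregate_v1"}' \
  "http://BROKER_HOST:8099/query/sql"
```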
@sbang: @sbang has joined the channel

#random


@dprotsenko.repos: @dprotsenko.repos has joined the channel
@sbang: @sbang has joined the channel

#troubleshooting


@dprotsenko.repos: @dprotsenko.repos has joined the channel
@sbang: @sbang has joined the channel
@ysuo: Hi, one Pinot server shows dead status. Is there any way to make it recover from this status? :rolling_on_the_floor_laughing:
  @npawar: Restart is the only way I can think of to recover here. But capture the logs before so we can check why it went dead
@diogo.baeder: Hi folks, I'm having issues running batch ingestion in my local experiments with Pinot using docker-compose. More in this thread.
  @diogo.baeder: Here's what I get, with the command I run at the top:
```
$ docker-compose -f docker-compose-databases.yml exec pinot-controller bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /config/ingestion/weights.yaml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.10.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.10.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
```
and here are the ingestion job files visible inside the controller:
```
$ docker-compose -f docker-compose-databases.yml exec pinot-controller ls -lh /config/ingestion/
total 12K
-rw-r--r-- 1 1000 1000 956 Apr 15 21:34 brands_metrics.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:37 filters.yaml
-rw-r--r-- 1 1000 1000 935 Apr 15 21:31 weights.yaml
```
Any ideas what's going on?
  @xiaoman: The warning itself is not fatal, so it doesn't tell us what is happening. Can you try getting a shell into that pinot-controller container and look at the Pinot logs?
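  For reference, one way to do that from the host, assuming the stock apachepinot/pinot image layout where logs land under /opt/pinot/logs (adjust the path and file name to whatever you actually find there):
```
# See which log files the controller is writing
docker-compose -f docker-compose-databases.yml exec pinot-controller ls -lh /opt/pinot/logs

# Then tail the relevant file (the name depends on the log4j2 config in the image)
docker-compose -f docker-compose-databases.yml exec pinot-controller tail -n 200 /opt/pinot/logs/<controller-log-file>
```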
  @diogo.baeder: Sure! Hold on...
  @diogo.baeder: These are the only non-INFO log lines I found when trying to run the batch jobs:
```
2022/04/15 22:02:59.975 INFO [ControllerPeriodicTask] [pool-7-thread-7] Processing 3 tables in task: OfflineSegmentIntervalChecker
2022/04/15 22:02:59.981 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/filters_OFFLINE does not exist
2022/04/15 22:02:59.991 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/brands_metrics_OFFLINE does not exist
2022/04/15 22:02:59.994 WARN [ZKMetadataProvider] [pool-7-thread-7] Path: /SEGMENTS/weights_OFFLINE does not exist
```
but I'm not sure what's going on - shouldn't Pinot just create the segments automatically, when running those jobs?
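  Those WARNs are usually benign: they typically just mean the OFFLINE tables have no segments in ZooKeeper yet, which the periodic checker notices until a segment push lands. A quick way to confirm, assuming the controller API is reachable on localhost:9000 from wherever you run curl:
```
# List the tables the controller knows about
curl -s http://localhost:9000/tables

# List segments for one of the offline tables; an empty list means no push has landed yet
curl -s "http://localhost:9000/segments/weights?type=OFFLINE"
```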
  @diogo.baeder: Here's my job spec for `weights`:
```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
outputDirURI: '/opt/pinot/data'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'weights'
  schemaURI: ''
pinotClusterSpecs:
  - controllerURI: ''
```
and the data I have in that directory is in .json files, each containing a list of JSON dictionaries/maps/objects (not JSONL, just a plain JSON file whose top-level element is a list/array)
  @xiaoman: We can try looking through the log lines and see how far the job has progressed. For example, is the log line from the linked code seen?
  @xiaoman: The next checkpoint could be
  @diogo.baeder: Here are the full Controller logs for the time when I try to run the load job:
  @xiaoman: Right now, based on the code, it seems that if the directory/path points to a place where there are no input files, Pinot does not print a warning log. If that's the case, it could be a good chance for you to submit a code change to Pinot too.
  @xiaoman: The log seems to be cut off short.
  @xiaoman: It's very unlikely, but is the Pinot process alive? Could someone else have killed the process?
  @diogo.baeder: But here are the relevant configs for the location of the input files:
```
inputDirURI: '/sensitive-data/outputs/'
includeFileNamePattern: 'glob:weights/**/*.json'
```
and here are the files, for example:
```
root@d7ae8b63b8e1:/opt/pinot/logs# find /sensitive-data/outputs/weights/ -name *2013041*.json
/sensitive-data/outputs/weights/br/20130415.json
/sensitive-data/outputs/weights/br/20130418.json
/sensitive-data/outputs/weights/br/20130416.json
/sensitive-data/outputs/weights/br/20130419.json
/sensitive-data/outputs/weights/br/20130417.json
/sensitive-data/outputs/weights/br/20130414.json
/sensitive-data/outputs/weights/br/20130413.json
```
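  One thing worth double-checking (an assumption on my part, not a confirmed diagnosis): depending on how the include pattern is matched against the file paths, the relative pattern `glob:weights/**/*.json` may match nothing, in which case the job finds zero input files and finishes without doing any work or logging a warning. Two cheap experiments: point `inputDirURI` at `/sensitive-data/outputs/weights/` with `includeFileNamePattern: 'glob:**/*.json'`, and confirm the files are visible from inside the container:
```
# Count the candidate input files from inside the controller container
docker-compose -f docker-compose-databases.yml exec pinot-controller \
  find /sensitive-data/outputs/weights -name '*.json' | wc -l
```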
  @diogo.baeder: Lemme check
  @diogo.baeder: All instances alive, yes
  @xiaoman: The Pinot log stopped being produced (if you copied the file properly); it has only 86 lines.
  @xiaoman: If even logs are not being written, something is wrong; maybe Pinot has not even started yet
  @diogo.baeder: Yeah, I noticed the truncation there, but that's what shows up in my terminal, not sure why
  @diogo.baeder: I'll check the RAM usage
  @xiaoman: In my past experience, such truncation of logs meant the process was killed, which is why I brought it up. Another possibility is that the Pinot process is stuck in an infinite loop, which is very unlikely. Or maybe the error log is redirected somewhere else? Maybe check the log4j2.xml file?
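  A quick way to inspect that config, assuming the default location inside the stock image (the path may differ if your compose file mounts a custom one):
```
# Check which loggers/appenders the controller's log4j2 config defines and where they write
docker-compose -f docker-compose-databases.yml exec pinot-controller cat /opt/pinot/conf/log4j2.xml
```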
  @diogo.baeder: Using ~16G, I have 32 total. So not that much.
  @diogo.baeder: Let me try taking a look at those, hold on.
  @diogo.baeder: The process hasn't been killed, it's been running for more than 1h. Dunno if a thread has been killed though.
  @diogo.baeder: Is it normal for the Controller to spawn hundreds of threads?
  @diogo.baeder: Alright, I truncated the whole log file, checked the log4j configs and they seem just fine, and I ran the batch jobs again; same issue, the logs are somehow truncated. Let me restart Pinot from scratch with everything clean and try to run the jobs again. I did notice a stupid bug in my ingestable data that basically invalidates much of the data I have, but at least some of it should have been processed.
  @diogo.baeder: I started from scratch, cleaned up all directories related to those instances, created the tables, truncated the log file so that I could only have the stuff for the ingestion job, and I still get truncated logs after running it, and no useful lines telling me what went wrong
  @diogo.baeder: Anyway, I'll fix my issue with the dates I'm generating, maybe that by itself fixes the issue, but I have low confidence that it will.
  @diogo.baeder: Thanks for the help!

#getting-started


@dprotsenko.repos: @dprotsenko.repos has joined the channel
@sbang: @sbang has joined the channel