#general
@abdulquddus: @abdulquddus has joined the channel
@mnk666yura: @mnk666yura has joined the channel
#random
@abdulquddus: @abdulquddus has joined the channel
@mnk666yura: @mnk666yura has joined the channel
#troubleshooting
@josefarf: Hi, I have a basic Pinot deploy with a realtime table. Everything was working ok for the first 2 days, but now I am getting an error with this query: "SELECT player_nr, processTime, id FROM transaction_line_REALTIME LIMIT 214748364"
@josefarf: This is the error
@josefarf: but if I do the query with another value in limit, everything is ok
@josefarf: by example" SELECT player_nr, processTime, id FROM transaction_line_REALTIME LIMIT 21474836"
@g.kishore: You are pulling a lot of data from Pinot
@g.kishore: Pinot is not meant to be used to pull all the data out
@josefarf: Hi, there are only 3000 rows now in the table
@josefarf: it was working for a few days, then it stopped working
@g.kishore: That should have worked... what’s the jvm memory
@josefarf: I have updated now to this configuration: - JAVA_OPTS=-Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-server.log
@josefarf: for the broker, server and controller
@josefarf: if I do the query: select * from transaction_line limit 2147483647, it is not working
@josefarf: but: select * from transaction_line limit 2147483645 is working
@ken: I have to assume somewhere in the code is an `int val = <sql query limit> + 1`, or equivalent, that overflows when you pass Integer.MAX_VALUE as the limit. But why would you want to use such a huge limit? :slightly_smiling_face:
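(For illustration only, not Pinot's actual code: a minimal Java sketch of the kind of overflow ken is guessing at, with hypothetical variable names.)
```
public class LimitOverflowSketch {
    public static void main(String[] args) {
        // Hypothetical: if the engine adds 1 to the requested limit somewhere,
        // Integer.MAX_VALUE wraps around to a negative value.
        int limit = 2147483647;        // Integer.MAX_VALUE, the LIMIT Presto pushes down
        int adjusted = limit + 1;      // overflows to -2147483648
        System.out.println(adjusted);  // a negative "limit" that downstream code may reject
    }
}
```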
@josefarf: Hi
@josefarf: The problem is with this query in presto
@josefarf: ```
with lastValue as (
  select player_nr, rank() OVER (PARTITION BY player_nr, id ORDER BY processTime DESC) AS rnk
  FROM pinot."default".transaction_line
)
select count(*) from lastvalue where rnk = 1 limit 100000;
```
@josefarf: then I am getting this message
@josefarf: SQL Error [84213860]: Query failed (#20210213_035642_00049_23fka): Error when hitting host Server_172.19.0.7_8098 with pinot query "SELECT player_nr, processTime, id FROM transaction_line_REALTIME LIMIT 2147483647"
@josefarf: I am trying to update the parameter pinot.broker.enable.query.limit.override to a smaller value, but it is not having the effect that I want
@josefarf: I have used the api to update: ```
{
  "allowParticipantAutoJoin": "true",
  "enable.case.insensitive": "false",
  "pinot.broker.enable.query.limit.override": "10000000",
  "default.hyperloglog.log2m": "8"
}
```
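(Side note for context, not from the thread: the payload above is a cluster-level config map; a hedged sketch of how it might be posted to the controller, assuming the controller listens on localhost:9000 and exposes POST /cluster/configs.)
```
curl -X POST "http://localhost:9000/cluster/configs" \
  -H "Content-Type: application/json" \
  -d '{
        "allowParticipantAutoJoin": "true",
        "enable.case.insensitive": "false",
        "pinot.broker.enable.query.limit.override": "10000000",
        "default.hyperloglog.log2m": "8"
      }'
```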
@ken: I haven’t tried using the API to update broker settings, sorry - but changing in the config has worked for me (then restart the broker processes)
@g.kishore: Ah, if you are using it with Presto then use the streaming API
@g.kishore: @fx19880617 will be able to point to the instructions to enable that
@g.kishore: That will allow full scan in parallel
@josefarf: Hi @ken, I made the change and created a broker.conf file with these values
@josefarf: ```
pinot.broker.enable.query.limit.override = true
pinot.broker.query.response.limit = 10000000
```
@josefarf: now the query console of pinot is working with this query: SELECT player_nr, processTime, id FROM transaction_line_REALTIME LIMIT 2147483647
@josefarf: but, when I use prestodb, I get the same error
@josefarf: hi, in the end I fixed it: I changed the Presto configuration for Pinot, and added this line to the pinot file in the catalog folder
@josefarf: pinot.forbid-broker-queries:true
@josefarf: And now it is working,
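(A minimal sketch of what that Presto catalog file might look like after the change; the file location, controller address, and the other property names are assumptions based on the Presto Pinot connector, not something confirmed in the thread.)
```
# etc/catalog/pinot.properties (location and controller host are assumptions)
connector.name=pinot
pinot.controller-urls=pinot-controller:9000
# route segment scans to the Pinot servers instead of going through the broker
pinot.forbid-broker-queries=true
```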
@josefarf: Thx Kishore, Ken for your help
@josefarf: have a nice weekend
@g.kishore: You too.. btw you don’t want to forbid broker queries for everything
@g.kishore: We will help you next week with the right config
@josefarf: thx
@fx19880617: to enable the streaming connector in Presto, please configure: ```pinot.use-streaming-for-segment-queries=true```
@fx19880617: in pinot you need to configure: ```
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
```
@fx19880617: also you need to make the Pinot change first, then the Presto changes
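(Putting the three messages above together, a hedged sketch of where each setting might live; the file names are assumptions.)
```
# 1) Pinot server config (e.g. pinot-server.conf) - change this and restart the servers first:
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090

# 2) Presto catalog file (e.g. etc/catalog/pinot.properties) - change after the Pinot side is up:
pinot.use-streaming-for-segment-queries=true
```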
@tamas.nadudvari: Hi, we have a realtime table that consumes a Kafka topic and creates new segments every hour (`realtime.segment.flush.threshold.time: "1h"`). Its replication is set to 2, and when I query the number of documents on a not-too-recent interval, I can see two different numbers alternating. I understand that the Kafka offsets of the two servers consuming the same partition can drift. But when I select an interval that's a couple of hours before the current time, so presumably querying a finished/closed segment, I'm still facing the same issue. According to the docs, shouldn't the replica that consumed fewer records acquire the segment with more documents in it after it's closed?
@g.kishore: What’s the query? Can you check if the partialResponse flag is true in the response
@tamas.nadudvari: The query’s like: ```select count(*) from mytable_REALTIME where startedAt < 1613220000000 and startedAt > 1613218000000 limit 10``` and according to the Query Console the `partialResponse` is `-` .
@g.kishore: that does not make sense unless you still have data flowing in for that time range
@tamas.nadudvari: That’s highly unlikely. Maybe there’s some misconfiguration in our table?
@tamas.nadudvari: Our configuration (the indices are removed): ```
{
  "tableName": "mytable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "mytable",
    "timeColumnName": "startedAt",
    "timeType": "MILLISECONDS",
    "replicasPerPartition": "2",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "1",
    "completionMode": "DOWNLOAD"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "createInvertedIndexDuringSegmentGeneration": false,
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "starTreeIndexConfigs": [],
    "noDictionaryColumns": [],
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "loadMode": "HEAP",
    "columnMinMaxValueGeneratorMode": "ALL",
    "nullHandlingEnabled": false,
    "aggregateMetrics": true
  },
  "ingestionConfig": {
    "streamIngestionConfig": {
      "streamConfigMaps": [{
        "realtime.segment.flush.threshold.rows": "0",
        "realtime.segment.flush.threshold.time": "1h",
        "realtime.segment.flush.threshold.segment.size": "100M",
        "stream.kafka.broker.list": "${KAFKA_BOOTSTRAP_SERVERS}",
        "stream.kafka.consumer.prop.auto.offset.reset": "largest",
        "stream.kafka.consumer.type": "lowlevel",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": "${KAFKA_JAAS_CONFIG}",
        "isolation.level": "read_committed",
        "stream.kafka.topic.name": "pinot",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "streamType": "kafka"
      }]
    },
    "batchIngestionConfig": {
      "segmentPushType": "APPEND",
      "segmentPushFrequency": "HOURLY"
    }
  },
  "task": {
    "taskTypeConfigsMap": {
      "RealtimeToOfflineSegmentsTask": {
        "bucketTimePeriod": "4h",
        "bufferTimePeriod": "5m",
        "collectorType": "rollup",
        "length.aggregationType": "sum",
        "endPosition.aggregationType": "sum",
        "maxNumRecordsPerSegment": "10000000"
      }
    }
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "metadata": {}
}
```
@fx19880617: I see this is a rollup config; is it possible that the rollup happens and impacts the count? Have you tried to query sum(met) to see if the results are the same?
@fx19880617: also, does restarting the broker help?
@g.kishore: The issue was that one of the committed segments had a different number of rows across the replicas.. my guess is that the Kafka brokers have inconsistent data and it is getting reflected in Pinot..
@g.kishore: Basically both segments have the same start/end offsets but a different number of rows
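(A hedged diagnostic sketch for narrowing down which segment differs; it assumes your Pinot version exposes the built-in virtual columns $hostName and $segmentName. Running it a few times lets the routing alternate between replicas, so a per-segment count that changes between runs points at the inconsistent segment.)
```
select $hostName, $segmentName, count(*)
from mytable_REALTIME
where startedAt < 1613220000000 and startedAt > 1613218000000
group by $hostName, $segmentName
limit 100
```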
