#general
@ratish1992: @ratish1992 has joined the channel
@adilsonbna: @adilsonbna has joined the channel
@tsjagan: @tsjagan has joined the channel
@1705ayush: @1705ayush has joined the channel
@juraj.komericki: @juraj.komericki has joined the channel
@1705ayush:
@fx19880617: how do you run pinot in docker?
@fx19880617: I think you need to start with a parameter? like
```
docker run \
  --network=pinot-demo \
  --name thirdeye \
  -p 1426:1426 \
  -p 1427:1427 \
  -d apachepinot/thirdeye:latest pinot-quickstart
```
@fx19880617: @pyne.suvodeep might have better idea
@1705ayush: I start Pinot with my own data and cluster configs, like this:
```
# zookeeper
docker run \
  --network=pinot-demo \
  --name pinot-zookeeper \
  --restart always \
  -p 2181:2181 \
  -d zookeeper:latest
```
```
# controller
docker run -ti \
  --network=pinot-demo \
  --name pinot-controller \
  -p 9000:9000 \
  -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-controller.log" \
  -d apachepinot/pinot:latest StartController \
  -zkAddress pinot-zookeeper:2181
```
```
# broker
docker run -ti \
  --network=pinot-demo \
  --name pinot-broker \
  -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-broker.log" \
  -d apachepinot/pinot:latest StartBroker \
  -zkAddress pinot-zookeeper:2181
```
```
# server
docker run -ti \
  --network=pinot-demo \
  --name pinot-server \
  -e JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-server.log" \
  -d apachepinot/pinot:latest StartServer \
  -zkAddress pinot-zookeeper:2181
```
@pyne.suvodeep: @1705ayush You can use helm to fire up everything in one shot
@pyne.suvodeep: For example, something like this should work for Pinot. ```helm install pinot ./incubator-pinot/kubernetes/helm/pinot --set replicas=1 ```
@1705ayush: @pyne.suvodeep You mean running Pinot via helm instead of docker might fix the Docker startup of ThirdEye? I'll try that and let you know.
@pyne.suvodeep: @1705ayush For thirdeye, it should be something similar. Check out the `kubernetes` directory inside `incubator-pinot` ```./helmw install thirdeye . ```
@pyne.suvodeep: @1705ayush So, helm is a package manager for kubernetes. You are basically running pinot on kubernetes at that point. The pods are built off docker containers
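For reference, a minimal sketch of that helm route (assumptions: a working Kubernetes cluster, Helm 3, the `incubator-pinot` repo cloned locally, and `pinot-quickstart` used here only as an example namespace name):
```
kubectl create namespace pinot-quickstart
helm install pinot ./incubator-pinot/kubernetes/helm/pinot \
  --namespace pinot-quickstart \
  --set replicas=1
# verify the zookeeper/controller/broker/server pods come up
kubectl get pods -n pinot-quickstart
```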
@vmadhira: @vmadhira has joined the channel
#random
@ratish1992: @ratish1992 has joined the channel
@adilsonbna: @adilsonbna has joined the channel
@tsjagan: @tsjagan has joined the channel
@1705ayush: @1705ayush has joined the channel
@juraj.komericki: @juraj.komericki has joined the channel
@vmadhira: @vmadhira has joined the channel
#troubleshooting
@ratish1992: @ratish1992 has joined the channel
@adilsonbna: @adilsonbna has joined the channel
@jai.patel856: I’m having some trouble with upserts where a query through the Pinot UI will sometimes return the latest row, sometimes it’ll return all rows. Query:
```
select * from enriched_customer_orders_jp_upsert_realtime_streaming_v1
where normalized_order_id='62:1221247'
  and ofo_slug='fofo'
  and store_id='73f6975b-07e8-407a-97a1-580043094a68'
limit 10
```
Table Spec:
```
{
  "REALTIME": {
    "tableName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
      "timeColumnName": "updated_at_seconds",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "30",
      "segmentPushType": "APPEND",
      "replicasPerPartition": "3",
      "schemaName": "enriched_customer_orders_jp_upsert_realtime_streaming_v1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "createInvertedIndexDuringSegmentGeneration": true,
      "bloomFilterColumns": [
        "Filter1",
        "Filter2"
      ],
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "LowLevel",
        "stream.kafka.topic.name": "topic-topic-topic-topic-topic",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "kafka-host:9092",
        "realtime.segment.flush.threshold.size": "1000",
        "realtime.segment.flush.threshold.rows": "1000",
        "realtime.segment.flush.threshold.time": "6h",
        "realtime.segment.flush.desired.size": "200M",
        "isolation.level": "read_committed",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "stream.kafka.consumer.prop.group.id": "enriched_customer_orders_jp_upsert_realtime_streaming_v1_8F6C7BAF-EEA7-441F-ABE3-50BF5F2C4F0A",
        "stream.kafka.consumer.prop.client.id": "v1_732F3C29-4CDA-45AA-85F1-740A0176C6A5",
        "stream.kafka.decoder.prop.schema.registry.rest.url": "
```
@g.kishore: @jackie.jxt @yupeng ^^
@yupeng: add `$segmentName` to your select clause, and check the values for the duplicate records
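For illustration, the query from the thread with the built-in `$segmentName` virtual column added to the select list (a sketch; any other columns can be selected alongside it):
```
select $segmentName, updated_at_seconds
from enriched_customer_orders_jp_upsert_realtime_streaming_v1
where normalized_order_id='62:1221247'
  and ofo_slug='fofo'
  and store_id='73f6975b-07e8-407a-97a1-580043094a68'
limit 10
```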
@jai.patel856: ```
enriched_customer_orders_jp_upsert_realtime_streaming_v1__10__57__20210220T2243Z
enriched_customer_orders_jp_upsert_realtime_streaming_v1__10__61__20210221T0000Z
enriched_customer_orders_jp_upsert_realtime_streaming_v1__10__1__20210220T0807Z
enriched_customer_orders_jp_upsert_realtime_streaming_v1__10__70__20210221T0315Z
```
@jai.patel856: I’m querying based on the 3 key columns and so it seems to be matching.
@yupeng: can you check what are the servers of the segments?
@yupeng: you can go to UI for this
@yupeng: btw, are you using master branch?
@jai.patel856: *All 4 segments are on the same Replica Set:*
• Server_pinot-us-central1-server-0.pinot-us-central1-server-headless.pinot.svc.cluster.local_8098
• Server_pinot-us-central1-server-1.pinot-us-central1-server-headless.pinot.svc.cluster.local_8098
• Server_pinot-us-central1-server-2.pinot-us-central1-server-headless.pinot.svc.cluster.local_8098
@yupeng: k. and if you use `from enriched_customer_orders_jp_upsert_realtime_streaming_v1 option( skipUpsert=true)`
@yupeng: do you see difference?
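Spelled out against the query from the thread, that would be something like the sketch below (the option can also be placed right after the table name, as in the message above; `skipUpsert=true` returns all raw rows instead of the de-duplicated view):
```
select *
from enriched_customer_orders_jp_upsert_realtime_streaming_v1
where normalized_order_id='62:1221247'
  and ofo_slug='fofo'
  and store_id='73f6975b-07e8-407a-97a1-580043094a68'
limit 10
option(skipUpsert=true)
```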
@jai.patel856: Yeah. With skipUpsert I saw all 4 copies 100% of the time over 20 successive calls. Without it I saw 4 copies about 50% of the time over 20 calls.
@jai.patel856: Is it skipUpsert or disableUpsert?
@jai.patel856: skipUpsert consistently returns all 4 rows. disableUpsert doesn’t seem to work (I still see the 50/50 behavior)
@yupeng: skipUpsert
@yupeng: `disableUpsert` was the previous deprecated name
@yupeng: i need to fix the doc
@yupeng: hmm 50% of 4 and 50% of 1?
@yupeng: how many brokers do you have
@jai.patel856: I ran with skipUpsert 20 times. Of those 20 calls all of them return 4 rows.
@jai.patel856: I ran without skipUpsert 20 times. Of those 20 calls, 11 returned one row, 9 returned 4 rows. So I seem to be getting inconsistent deduplication with the table.
@jai.patel856: We have 2 tenants, 3 controllers, 3 brokers, 6 servers
@yupeng: hmm this 50% rate is suspicious, i feel it’s some inconsistent configs of broker/server
@yupeng: or it’s either 1 or 4 copies?
@jai.patel856: I get either 1 row back or 4 rows back. When I get 4 rows back the rows share the same key set, but the rest of the rows are different.
@jai.patel856: When I get one row back it is the one with the highest time column value
@yupeng: @jackie.jxt any thoughts?
@jai.patel856: Is there a way to target the query at a specific broker in the UI?
@jackie.jxt: When you get 4 rows, can you check the number of servers queried within the response metadata?
@jai.patel856: Can I get the response metadata in the UI?
@jackie.jxt: Yes, you should be able to see the response metadata through the query console
@jackie.jxt: Or choose to show the raw response json
@jai.patel856: "exceptions": [], "numServersQueried": 1, "numServersResponded": 1, "numSegmentsQueried": 3072, "numSegmentsProcessed": 2786, "numSegmentsMatched": 1, "numConsumingSegmentsQueried": 12, "numDocsScanned": 1, "numEntriesScannedInFilter": 315, "numEntriesScannedPostFilter": 112, "numGroupsLimitReached": false, "totalDocs": 254020,
@jai.patel856: (used json output)
@jackie.jxt: `"numDocsScanned": 1,`
@jackie.jxt: There is only 1 row returned right?
@jai.patel856: this time, yeah… let me see what it looks like when i get 4
@jai.patel856: (noticed that too)
@jai.patel856: When I see 4 rows returned:
```
"exceptions": [],
"numServersQueried": 1,
"numServersResponded": 1,
"numSegmentsQueried": 3079,
"numSegmentsProcessed": 2796,
"numSegmentsMatched": 4,
"numConsumingSegmentsQueried": 12,
"numDocsScanned": 4,
"numEntriesScannedInFilter": 315,
"numEntriesScannedPostFilter": 448,
"numGroupsLimitReached": false,
"totalDocs": 254640,
"timeUsedMs": 719,
"segmentStatistics": [],
```
@jackie.jxt: So only 1 server is queried, but the records are not dedupped correctly
@jackie.jxt: Did you enable upsert in the table config on the fly (by modifying the table config), or did you directly create the table with upsert enabled?
@jai.patel856: directly created using swagger
@jai.patel856: realtime, only (as specced), started streaming content from our topic.
@jai.patel856: We changed our kafka key to match the pinot keyset and truncated the old messages off the topic. Confirmed that by catting the first kafka message from the topic and checking the key.
@jai.patel856: So the kafka key matches what was in the original message.
@jackie.jxt: Hmm.. Can you try restarting the servers and see if it fixes the issue?
@g.kishore: lets try to find out the root cause before restarting
@jai.patel856: This has been a bit consistent. I’ve created 2 or 3 tables in very similar ways and I’ve consistently seen this behavior.
@jackie.jxt: All the tables are created with upsert enabled, and never updated?
@jackie.jxt: Which version of Pinot are you running?
@g.kishore: @jai.patel856 what is the kafka topic partitioned on
@g.kishore: ``` "ofo_slug", "store_id", "normalized_order_id"```
@jai.patel856: store_id::ofo_slug::normalized_order_id << fixed sequence
@jai.patel856: we concatenate the strings separated by double colons
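For the example row being queried earlier in the thread, that key would look like this (illustrative only; values taken from the query above):
```
73f6975b-07e8-407a-97a1-580043094a68::fofo::62:1221247
```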
@g.kishore: ok
@jai.patel856: and all the rows returned have identical values for those 3 columns and none are null.
@jai.patel856: CC: @elon.azoulay
@jai.patel856: we’re using 0.6.0
@jackie.jxt: This could cause a problem: ``` "aggregateMetrics": true,```
@jackie.jxt: Basically `aggregateMetrics` cannot be configured together with `upsert` . Can you please check the server log and see if it encounters exceptions when ingesting the records?
@jackie.jxt: Hmm, actually it will be turned off automatically because the metric fields are dictionary encoded
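For reference, the upsert-related pieces of such a setup (not visible in the truncated table spec above) look roughly like this in 0.6.0; a sketch, with the key columns taken from the messages above:
```
// in the table config (aggregateMetrics should stay off/absent for upsert tables):
"upsertConfig": {
  "mode": "FULL"
}

// in the schema:
"primaryKeyColumns": ["ofo_slug", "store_id", "normalized_order_id"]
```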
@jackie.jxt: @jai.patel856 Do you have some time for a zoom call to further debug the issue?
@jai.patel856: Errors are falling into buckets:
1. Schema name does not match raw table name
2. Please reduce the rate of create, update, and delete requests
3. Could not move segment
4. Failed to move segment
5. Failed to find local segment file for segment
6. already exists. Replace it with segment:
There were some GCS issues. But those look unrelated.
@jai.patel856: @jackie.jxt yes i have time. But should I try recreating the table without aggregateMetrics on to see if it repros under that condition?
@jackie.jxt: Let's check some states first
@jackie.jxt:
@jackie.jxt: @jai.patel856
@jai.patel856: Thanks Jackie for the call.
@jai.patel856: @g.kishore atm, the best theory is that I created an upsert table on top of a partially deleted non-upsert table (there were a number of errors in our logs about the delete during the GCS operations)
@jai.patel856: if restarting the server resolves the issue, it would lend support to that theory.
@jai.patel856: restarting the server resolved the issue
@chundong.wang: @chundong.wang has joined the channel
@pabraham.usa: Hello, I have 3 Pinot servers with 4 cores and 48Gi each, using a realtime table. I noticed that when the load/flow increases there is a lag in the search results (inverted index). Once the load is reduced, Pinot catches up. CPU and MEM usage all look normal. Wondering why this is happening. Are there any settings to make the Pinot servers process faster?
@ssubrama: @pabraham.usa trying to understand your question. Are you saying that when the input stream increases in volume, the query latency increases? That is sort of intuitive because the same CPU is being used for consumption as well as query processing. Increasing the number of cores may help.
@pabraham.usa: @ssubrama Actually it is not query latency, it is the data time lag. In the normal scenario, data pushed into Kafka is available in Pinot almost at the same time; Pinot consumes the data from Kafka in near real time and makes it available for search. But when the volume increases, Pinot takes more time to ingest, which causes the time lag, and it is not near real time anymore. I would like Pinot to ingest the data as soon as it is available in the stream.
@ssubrama: I can think of a few scenarios, but it will be useful to check what you are bottlenecked by: network, CPU, I/O or memory? I am guessing it is CPU. Maybe we are not able to consume fast enough. We start one consuming thread per partition per replica. How many partitions does your topic have? How many replicas have you configured? Does the total number of cores equal `numPartitions * numReplicas`? If not, then it is likely that the threads consuming some of the partitions are delayed because other threads are not giving up the cores.
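For example (hypothetical numbers): a topic with 8 partitions and replication 3 means 8 * 3 = 24 consuming threads across the cluster, i.e. 8 per server on the 3 servers above, which is already double the 4 cores each server has, so consumption and query processing end up competing for CPU.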
@pabraham.usa: Good to know that 1 core is required per partition per replica. So I assume that could be where the issue is. Let me change my instances and see. Thanks
@ssubrama: Correction. One core is not "required". Just that one core can be kept busy if the pipeline is full.
@ssubrama: We consume doing something like this (roughly):
```
while (true) {
  pullMsgsFromKafka();
  if (there are no msgs) {
    sleep a little
  }
}
```
@ssubrama: It is also worthwhile to check if you have a network bottleneck (unlikely, but if you do, it speaks of Pinot's efficiency :slightly_smiling_face:)
@ssubrama: You may also be blocking due to I/O (possibly paging) if you have memory mapped setting for your consuming segments.
@pabraham.usa: Good suggestions, let me check io and disk as well.
@tsjagan: @tsjagan has joined the channel
@1705ayush: @1705ayush has joined the channel
@juraj.komericki: @juraj.komericki has joined the channel
@bowlesns: Trying to run a fairly simple query, borrowing from the docs, and anytime I try to do any sort of grouping on the date field I get an error. Grouping on other fields works. Thanks in advance!
```
SELECT COUNT(*) FROM mytable
GROUP BY DATETIMECONVERT(item_date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:WEEKS:EPOCH', '1:WEEKS')
```
error
```
"errorCode": 200,
"message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Caught exception while initializing transform function: datetimeconvert\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:207)\n\tat
```
table date config
```
"dateTimeFieldSpecs": [
  {
    "name": "item_date",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
    "granularity": "1:DAYS"
  }
]
```
@fx19880617: ```DATETIMECONVERT``` doesn’t support week,
@fx19880617: use dateTrunc
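For illustration, a weekly grouping along those lines might look like the sketch below. It assumes the `fromDateTime` scalar function is available in your Pinot version to turn the `yyyy-MM-dd` string into epoch millis first, since `dateTrunc` operates on numeric epoch values:
```
SELECT dateTrunc('week', fromDateTime(item_date, 'yyyy-MM-dd'), 'MILLISECONDS') AS week_start, COUNT(*)
FROM mytable
GROUP BY dateTrunc('week', fromDateTime(item_date, 'yyyy-MM-dd'), 'MILLISECONDS')
```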
@fx19880617:
@bowlesns: That’s got a week example? will make a note to update that
@fx19880617: I think it’s wrong :disappointed:
@bowlesns: thanks for the quick response!
@fx19880617: I will fix the doc
@bowlesns: I can do that, going to go through and do a couple of things
@fx19880617: sure
@fx19880617: thanks!
@bowlesns: np thanks for your help
@bowlesns: I also tried to put the column name in quotes like `"item_date"`
@vmadhira: @vmadhira has joined the channel
#pinot-dev
@ratish1992: @ratish1992 has joined the channel
#getting-started
@ratish1992: @ratish1992 has joined the channel
@1705ayush: @1705ayush has joined the channel
@1705ayush: Hi people, I am facing an issue with starting ThirdEye on top of Pinot. I have got Pinot successfully set up and running. Now I am trying to run ThirdEye on top of this Pinot using the docker apachepinot/thirdeye image. After running the following docker command, I get an error stating `Database may be already in use`. Please find the attached log file. Any help is appreciated!
```
docker run \
  --network=pinot-demo \
  --name thirdeye \
  -p 1426:1426 \
  -p 1427:1427 \
  -d apachepinot/thirdeye:latest
```
#segment-write-api
@npawar: can you share the doc with me @fx19880617?
@npawar: thanks
@npawar: could one of you brief me on the conclusion of the discussion?
@fx19880617: So we want to create a SegmentWriter interface with simple APIs:
```
interface SegmentWriter {
  init(Configuration configs);
  index(GenericRow row);
  index(GenericRow[] rows);
  flush();
}
```
@fx19880617: From Flink side, we can create a Pinot Sink
@fx19880617: ```
public class PinotSinkFunction<T> extends RichSinkFunction<T> implements CheckpointedFunction {
  open() {...}
  invoke() {...}
  close() {...}
  ...
}
```
@fx19880617: so flink/spark side can just use SegmentWriter api for data sink and just upgrade pinot-core lib version if needed
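To make the fit concrete, a rough sketch of how the sink skeleton above could call the proposed SegmentWriter (everything here is a draft against the interface sketched earlier in the thread; `SegmentWriterImpl` and `PinotRowConverter` are hypothetical names, not an existing API, and checkpointing/exactly-once is left to the separate connector doc):
```
public class PinotSinkFunction<T> extends RichSinkFunction<T> {
  private final Configuration segmentWriterConfigs;   // table name, schema, output/upload location, ...
  private final PinotRowConverter<T> converter;       // hypothetical: maps the Flink record type T to GenericRow
  private transient SegmentWriter segmentWriter;      // the proposed interface from above

  public PinotSinkFunction(Configuration configs, PinotRowConverter<T> converter) {
    this.segmentWriterConfigs = configs;
    this.converter = converter;
  }

  @Override
  public void open(org.apache.flink.configuration.Configuration parameters) {
    segmentWriter = new SegmentWriterImpl();            // hypothetical implementation of SegmentWriter
    segmentWriter.init(segmentWriterConfigs);
  }

  @Override
  public void invoke(T value, Context context) {
    segmentWriter.index(converter.toGenericRow(value)); // buffer the row into the in-progress segment
  }

  @Override
  public void close() {
    segmentWriter.flush();                              // seal and hand off the segment
  }
}
```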
@npawar: great, sounds good
@yupeng: Yes that sounds good
@npawar: since i’ll be working on it @fx19880617, I can take up adding more things to that doc and polishing out the details
@yupeng: Thanks Neha. I can help you from flink connector side since I will create the poc
@yupeng: There will be several configurations for us to agree on
@npawar: sounds good Yupeng
@fx19880617: right, also about the exactly-once semantics
@yupeng: @fx19880617 i think we can leave flink exactly once semantics in another doc of flink/pinot connector (i’ll create one)
@yupeng: i hope the segment writer can be generic enough so that different compute frameworks can all use it
@yupeng: and also make it portable
@fx19880617: make sense
