Apache Pinot Daily Email Digest (2021-03-19)

Pinot Slack Email Digest Fri, 19 Mar 2021 19:00:28 -0700

#general

@lam010210: @lam010210 has joined the channel
@kc2005au: @kc2005au has joined the channel
@shvetadogra: @shvetadogra has joined the channel
@saurabh: @saurabh has joined the channel
@mike.davis: hello, I see that `0.7.0` was released, congrats! But there does not appear to be a corresponding `0.7.0-jdk11` image available via docker hub, only SNAPSHOT versions. Any chance that can get published?
@ken: Interesting, also doesn’t show `0.7.0`. Also thinking the 0.1.0 through 0.5.0 downloads could be removed from that page…
@g.kishore: It’s not yet officially released.. there was a mix up in ASF process.. please stay tuned. We will update soon
@ken: OK - but it’s in Maven Central :slightly_smiling_face: Should we avoid upgrading to that version?
@g.kishore: Yes.. please wait. We accidentally pushed it before getting the approval from ASF. We need official confirmation
@aaron: Does Pinot's batch insert have any way to avoid inserting duplicate data? Say that every day I want to batch-insert the previous day of data, and I have multiple batches of data per day (say each batch of data corresponds to data from a different ice cream flavor). If I'm generating + batch inserting yesterday's data for each ice cream flavor in parallel, and the "strawberry" job fails, so I rerun it, how do I make sure I'm not batch-inserting "strawberry" data that was already inserted?
@g.kishore: Segment name is unique across the table. As long as you keep the segment names idempotent across multiple runs, it will be fine
@g.kishore: So in your case, make sure you encode value of the flavor in segment name
@g.kishore: So even if you push the same data again, it will be overridden
@aaron: Ok, super cool. So I just need to make sure I set the segment name correctly -- like in `segmentNameGeneratorSpec`?
@g.kishore: Right
@g.kishore: We typically use the date and some kind of partition id as the convention
@aaron: Awesome, thank you
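To illustrate the naming convention discussed above, here is a minimal Python sketch of a deterministic segment name built from the table, the date, and a partition value such as the flavor. The function name, the table name `icecream`, and the exact name layout are assumptions for illustration, not Pinot's own generator; in a real job the name would come from `segmentNameGeneratorSpec`.

```python
from datetime import date

def segment_name(table: str, day: date, partition: str) -> str:
    """Build a name that depends only on its inputs, so a rerun reproduces it."""
    # Hypothetical layout: <table>_<yyyy-MM-dd>_<partition>
    return f"{table}_{day.isoformat()}_{partition}"

# The failed "strawberry" job and its rerun produce the same segment name,
# so the second push replaces the first instead of adding a duplicate.
first_run = segment_name("icecream", date(2021, 3, 18), "strawberry")
rerun = segment_name("icecream", date(2021, 3, 18), "strawberry")
print(first_run)
print(first_run == rerun)
```

Because the flavor is part of the name, parallel jobs for other flavors on the same day get distinct names and do not overwrite each other.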
@aaron: Also just for my understanding -- is there any point in time where partially complete segments or partially overwritten segments are visible to consumers?
@g.kishore: Is this hybrid or batch only table
@aaron: I'm curious about the answer for both!
@g.kishore: with batch only, it's visible as soon as a segment is pushed
@g.kishore: in hybrid, it's only visible after the time boundary moves from one day to another.
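A rough sketch of the hybrid-table behavior described above. It assumes daily pushes and a time boundary equal to the latest offline segment end day minus one day; that rule and both function names are assumptions for illustration, not taken from this thread.

```python
from datetime import date, timedelta

def time_boundary(offline_segment_end_days):
    # Assumption: boundary = latest offline end day minus one day (daily push).
    return max(offline_segment_end_days) - timedelta(days=1)

def source_for(day, boundary):
    # Days up to and including the boundary are answered from offline segments,
    # later days from realtime segments.
    return "OFFLINE" if day <= boundary else "REALTIME"

ends = [date(2021, 3, 16), date(2021, 3, 17)]
boundary = time_boundary(ends)
# Offline data for 2021-03-17 exists, but it is not yet served from offline.
print(boundary, source_for(date(2021, 3, 17), boundary))

# Pushing the next day's offline segment moves the boundary forward.
boundary_after_push = time_boundary(ends + [date(2021, 3, 18)])
print(boundary_after_push, source_for(date(2021, 3, 17), boundary_after_push))
```

Under these assumptions, a freshly pushed offline day is only queried from offline segments once a later day has been pushed.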
@sunxiaohui.bj: @sunxiaohui.bj has joined the channel
@savannahjenglish: @savannahjenglish has joined the channel
@ken: My ops guy is trying to validate JMX metrics, and he asked me how to trigger NUM_MISSING_SEGMENTS. Any suggestions?
@g.kishore: NUM_MISSING_SEGMENT?
@ken: NUM_MISSING_SEGMENTS, I think -
@ken: It looks like this can happen in the window between when a segment is removed from the server, and a broker sees the ExternalView change. Maybe this is too challenging to manually trigger, so I should tell ops not to worry about trying to validate?
@g.kishore: yeah, you can skip this for now
@g.kishore: some of these were probably added when there was a bug, to monitor how often it occurred. It's probably not useful anymore

#random

@lam010210: @lam010210 has joined the channel
@kc2005au: @kc2005au has joined the channel
@shvetadogra: @shvetadogra has joined the channel
@saurabh: @saurabh has joined the channel
@sunxiaohui.bj: @sunxiaohui.bj has joined the channel
@savannahjenglish: @savannahjenglish has joined the channel

#troubleshooting

@ravi.maddi: Hi all, I am trying to ingest data through Kafka from a JSON file by running this command: ```bin/kafka-console-producer.sh --broker-list localhost:19092 --topic mytopic < $PDATA_HOME/opt_flatten_json.json``` _But I am getting an error:_
```
Exception while executing a state transition task mystats__0__0__20210319T0430Z
java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:695) ~[?:1.8.0_282]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_282]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_282]
    at org.apache.pinot.core.segment.memory.PinotByteBuffer.allocateDirect(PinotByteBuffer.java:39) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.segment.memory.PinotDataBuffer.allocateDirect(PinotDataBuffer.java:116) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.io.writer.impl.DirectMemoryManager.allocateInternal(DirectMemoryManager.java:53) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.io.readerwriter.RealtimeIndexOffHeapMemoryManager.allocate(RealtimeIndexOffHeapMemoryManager.java:79) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.realtime.impl.forward.FixedByteMVMutableForwardIndex.addDataBuffer(FixedByteMVMutableForwardIndex.java:162) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.realtime.impl.forward.FixedByteMVMutableForwardIndex.<init>(FixedByteMVMutableForwardIndex.java:137) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:307) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1270) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:324) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:88) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-d87755899eccba3554e9cc39a1439d5ecb53aaac]
    ... 12 more
Default rollback method invoked on error. Error Code: ERROR
Message execution failed. msgId: eed5b297-ea20-437e-a0b5-ad4d0be75c3c, errorMsg: java.lang.reflect.InvocationTargetException
Skip internal error. errCode: ERROR, errMsg: null
Event bcbad381_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
Event c910d226_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
Event d194950f_DEFAULT : Unable to find a next state for resource: mystats_REALTIME partition: mystats__0__0__20210319T0430Z from stateModelDefinitionclass org.apache.helix.model.StateModelDefinition from:ERROR to:CONSUMING
```
Need help :slightly_smiling_face:
@fx19880617: ```Caused by: java.lang.OutOfMemoryError: Direct buffer memory``` try to give larger memory
@fx19880617: increase JVM and your VM or container memory setting
@ravi.maddi: Thanks, I increased the size and that resolved the issue
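The diagnosis above came from reading the `Caused by:` line rather than the top-level `InvocationTargetException`. A minimal Python sketch of pulling those lines out of a long trace, assuming one stack frame per line; the function name is hypothetical and the sample is a shortened excerpt of the trace in this thread.

```python
def root_causes(log_text: str) -> list:
    """Return the text after each 'Caused by: ' marker, outermost cause first."""
    marker = "Caused by: "
    causes = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            causes.append(stripped[len(marker):])
    return causes

sample = """java.lang.reflect.InvocationTargetException: null
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:695) ~[?:1.8.0_282]
"""
print(root_causes(sample))
```

The last entry in the returned list is the innermost cause, which is usually the one worth acting on.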
@lam010210: @lam010210 has joined the channel
@ravi.maddi: Hi team, *data is not appearing in the Pinot Query Console.* I am pushing data to Pinot through Kafka with the command `bin/kafka-console-producer.sh --broker-list localhost:19092 --topic mytopic < $PDATA_HOME/data.json` I checked all the logs and there are no exceptions, but my data is not appearing in the query tool. *Need help* :slightly_smiling_face: My schema looks like this: ```{ "schemaName": "eventflowstats", "dimensionFieldSpecs": [ { "name": "_index", "dataType": "STRING" }, { "name": "_type", "dataType": "STRING", "maxLength": 5 }, { "name": "_id", "dataType": "STRING" }, { "name": "_source.aExpIds", "dataType": "INT", "singleValueField": false } ] "dateTimeFieldSpecs": [ { "name": "_source.sDate", "dataType": "LONG", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:SECONDS:SIMPLE_DATE_FORMAT", "granularity": "1:DAYS" } ] }``` My table config looks like this: ```{ "tableName": "mytable", "tableType": "REALTIME", "tenants": {}, "segmentsConfig": { "timeColumnName": "_source.sDate", "timeType": "MILLISECONDS", "segmentPushType": "APPEND", "replicasPerPartition": "1", "retentionTimeUnit": "DAYS", "retentionTimeValue": "1" }, "tableIndexConfig": { "loadMode": "MMAP", "streamConfigs": { "streamType": "kafka", "stream.kafka.consumer.type": "lowLevel", "stream.kafka.topic.name": "mytopic", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder", "stream.kafka.hlc.zk.connect.string": "localhost:2191/kafka", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.zk.broker.url": "localhost:2191/kafka", "stream.kafka.broker.list": "localhost:19092" } }, "metadata": { "customConfigs": {} } }``` And the data looks like this: ```{"_index":"dhfkdfkdsjfk","_type":"_doc","_id":"68767677989hjhjkhkjh","_source.aExpIds":[815850,815857,821331],"_source.sDate":"2021-01-04 00:00:00"}``` I checked all the logs and did not find any exceptions, but the data is not appearing in the Pinot controller portal.
@fx19880617: The Kafka topic `event-count-stats-topic` you are producing to and the topic in your table config are not the same: `"stream.kafka.topic.name": "mytopic",`
@ravi.maddi: Sorry, I am using the same topic in both places; in this post I forgot to change both of them to mytopic.
@fx19880617: the schema format is wrong here ```"dateTimeFieldSpecs": [ { "name": "_source.sDate", "dataType": "LONG", "format": "1:SECONDS:SIMPLE_DATE_FORMAT:SECONDS:SIMPLE_DATE_FORMAT", "granularity": "1:DAYS" } ]```
@ravi.maddi: Can you correct it for me, please?
@fx19880617: ```"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",```
@ravi.maddi: I think `"dataType": "LONG"` is also wrong? It should be STRING, am I right?
@fx19880617: right
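Putting the two corrections from this exchange together, here is a small Python sketch of the corrected field spec and a quick check that the sample value parses with the stated pattern. The `strptime` pattern is a hand translation of the Java pattern `yyyy-MM-dd HH:mm:ss` and is an assumption; Pinot itself uses the Java pattern, and the function name is hypothetical.

```python
from datetime import datetime

# Field spec with both fixes from the thread applied: STRING type and an explicit pattern.
corrected_spec = {
    "name": "_source.sDate",
    "dataType": "STRING",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS",
}

# Assumed Python equivalent of the Java pattern "yyyy-MM-dd HH:mm:ss".
PY_PATTERN = "%Y-%m-%d %H:%M:%S"

def matches_pattern(value: str) -> bool:
    """Return True if the value parses with the assumed pattern."""
    try:
        datetime.strptime(value, PY_PATTERN)
        return True
    except ValueError:
        return False

print(matches_pattern("2021-01-04 00:00:00"))  # the sample value from the posted record
print(matches_pattern("1609718400"))           # an epoch-style value does not fit the pattern
```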
@kc2005au: @kc2005au has joined the channel
@ravi.maddi: *Can you help me* -- how can I check that sample data is valid against the defined schema?
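The question above did not get an answer in this digest. One rough way to sanity-check a sample record locally is sketched here in Python. This is not a Pinot tool; the type mapping and function name are assumptions, and the check is stricter than ingestion may be, since Pinot may convert some types itself.

```python
# Rough mapping from Pinot data type names to Python types (assumption, not exhaustive).
PY_TYPES = {"STRING": str, "INT": int, "LONG": int}

def check_record(schema: dict, record: dict) -> list:
    """Return a list of problems; an empty list means no mismatch was found."""
    problems = []
    specs = schema.get("dimensionFieldSpecs", []) + schema.get("dateTimeFieldSpecs", [])
    for spec in specs:
        name, data_type = spec["name"], spec["dataType"]
        if name not in record:
            problems.append(f"{name}: missing from record")
            continue
        value = record[name]
        multi_value = spec.get("singleValueField", True) is False
        if multi_value != isinstance(value, list):
            problems.append(f"{name}: single/multi-value mismatch")
            continue
        for item in (value if multi_value else [value]):
            if not isinstance(item, PY_TYPES[data_type]):
                problems.append(f"{name}: expected {data_type}, got {type(item).__name__}")
                break
    return problems

# Schema and record as posted in the thread (with LONG still set on the date column).
schema = {
    "dimensionFieldSpecs": [
        {"name": "_index", "dataType": "STRING"},
        {"name": "_type", "dataType": "STRING", "maxLength": 5},
        {"name": "_id", "dataType": "STRING"},
        {"name": "_source.aExpIds", "dataType": "INT", "singleValueField": False},
    ],
    "dateTimeFieldSpecs": [{"name": "_source.sDate", "dataType": "LONG"}],
}
record = {
    "_index": "dhfkdfkdsjfk",
    "_type": "_doc",
    "_id": "68767677989hjhjkhkjh",
    "_source.aExpIds": [815850, 815857, 821331],
    "_source.sDate": "2021-01-04 00:00:00",
}
print(check_record(schema, record))
```

With the schema as originally posted, the only problem reported is the date column, which matches the diagnosis given in the thread.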
@shvetadogra: @shvetadogra has joined the channel
@1705ayush: Hi everyone, I am facing an issue while ingesting batch data into Pinot. The command to ingest the data executes successfully, ```$ pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /home/ayush/ayush_workspace/iVoyant/analytics/data/hospital_data/job-spec.yml ..... Pushing segment: hospital to location: for table hospital Sending request: to controller: 4cb684aaf215, version: Unknown Response for pushing table hospital segment hospital to location - 200: {"status":"Successfully uploaded segment: hospital of table: hospital"}``` But the *table status* on the UI turns *BAD*. Here is the error logged in pinot-server: `2021/03/19 15:03:24.082 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: hospital_OFFLINE, partition: hospital` `java.lang.IllegalStateException: Key separator not found: APR, segment: /tmp/pinotServerData/hospital_OFFLINE/hospital/v3` `at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)` Any idea what could be wrong here? I have attached the error log. Any help is appreciated!
@1705ayush: I did not realize that I was dealing with column names that have spaces in them. Removing the spaces from the column names fixed it
@fx19880617: which column has space? in schema?
@1705ayush: Most of the column names had spaces in them, and so did the column names in the schema. The column names were exactly the same in both the csv file and schema.json
@fx19880617: oic, then we should try to prevent creating such a schema in Pinot :stuck_out_tongue:
@fx19880617: and give the error msg
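Until such a check exists on the Pinot side, a schema file can be screened before upload. A minimal Python sketch, assuming the usual `dimensionFieldSpecs` / `metricFieldSpecs` / `dateTimeFieldSpecs` layout; the function name and the example column names are hypothetical.

```python
def names_with_whitespace(schema: dict) -> list:
    """Return every column name in the schema that contains whitespace."""
    bad = []
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        for spec in schema.get(key, []):
            if any(ch.isspace() for ch in spec["name"]):
                bad.append(spec["name"])
    return bad

# Hypothetical schema fragment with one offending column name.
example_schema = {
    "schemaName": "hospital",
    "dimensionFieldSpecs": [
        {"name": "Hospital Name", "dataType": "STRING"},
        {"name": "city", "dataType": "STRING"},
    ],
}
print(names_with_whitespace(example_schema))
```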
@saurabh: @saurabh has joined the channel
@sunxiaohui.bj: @sunxiaohui.bj has joined the channel
@savannahjenglish: @savannahjenglish has joined the channel
@tisantos: @tisantos has joined the channel
@pabraham.usa: Hello, just wondering, is it normal for MMAP usage to go very high? Also, does this mean I need to have ~1.5TB of free space to hold the MMAP?
@mayanks: The servers memory map the indexes. So this should reflect the size of segments you have on the server. Is that not the case?
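One way to check the statement above is to total the size of the segment files on a server and compare it with the reported MMAP figure. A minimal Python sketch; the function name is hypothetical, and the demo uses a temporary directory as a stand-in for the server's actual data directory, which is not given in this thread.

```python
import os
import tempfile

def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total

# Demo on a throwaway directory; point the function at the server data directory instead.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "segment_a"))
    with open(os.path.join(tmp, "segment_a", "columns.psf"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(tmp, "metadata.properties"), "wb") as f:
        f.write(b"y" * 5)
    demo_total = directory_size_bytes(tmp)
print(demo_total)
```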

#getting-started

@brianolsen87: @brianolsen87 has joined the channel
@kc2005au: @kc2005au has joined the channel

#pinot-rack-awareness

@xulinnankai: @xulinnankai has joined the channel
@xulinnankai: @xulinnankai set the channel purpose: Server Rack Metadata Retrieval and Persistence on Azure Environment
@ssubrama: @ssubrama has joined the channel
@rkanumul: @rkanumul has joined the channel
@docchial: @docchial has joined the channel
@g.kishore: @g.kishore has joined the channel
@fx19880617: @fx19880617 has joined the channel
@dlavoie: @dlavoie has joined the channel
@pabraham.usa: @pabraham.usa has joined the channel
@ssubrama: Thanks for creating the channel, Lin. Can we rename this channel to (say) pinot-rack-awareness?
@xulinnankai: Sure. Will do. I will invite Jay once he joins Pinot oss slack.
@xulinnankai: @xulinnankai has renamed the channel from "issue-6532" to "pinot-rack-awareness"