#general


@mohamed.sultan: @mohamed.sultan has joined the channel
@sukypanesar: @sukypanesar has joined the channel
@pugal.selvan: @pugal.selvan has joined the channel
@chxing: Hi all, when Pinot is consuming from Kafka, how do I parse nested JSON like `{ "data": { "name": "cc", "age": 3 } }`? I just need `name` and `age` in the table
  @tanmay.movva: You have to define `name` and `age` in the schema config. After that, you need to use the built-in JSON functions in the transform config to extract those fields.
  @tanmay.movva: ```"transformConfigs": [ { "columnName": "merchant_id", "transformFunction": "jsonPathString(data, '$.name')" }]``` This is an example to extract name from the field data.
  @tanmay.movva: Related documentation.
  @chxing: And then I need to use it during query time?
  @tanmay.movva: You won’t have to. Once you have defined transformConfigs during ingestion, you would be able to select the fields directly. ```select name from your_table```
  @tanmay.movva: You can check this
  @chxing: Ok, thx. Will this method have worse performance compared to flattened JSON?
  @chxing: Like the wiki says
  @chxing:
  @tanmay.movva: If you are ingesting it as a string you can use those functions during query time. Otherwise, you can extract them during ingestion itself.
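  For example, if `data` were kept as a raw JSON string column, a query-time extraction could look like this (a sketch; `JSONEXTRACTSCALAR` takes the column, a JSON path, and a result type): ```select JSONEXTRACTSCALAR(data, '$.name', 'STRING') AS name from your_table```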
@sosyalmedya.oguzhan: Hi, is there any article about Kubernetes production experience for Pinot? We want to learn things like optimal server count, number of segments per server, optimal resources for realtime and offline servers, etc. I've found a few articles, but I want to know if there are others
  @mayanks: Some of it is tribal knowledge atm, but yes, it would be good to document those. What are your specific questions? Maybe I can help
  @sosyalmedya.oguzhan: I actually don't know what the memory size should be for each server for a roughly 1 TB table with high QPS. For example, we are using 128 GB memory per node in Druid for all tables. But in Pinot, we want to use a different tenant for each big table
  @sosyalmedya.oguzhan: Also, we expect 4 segments per day and want to keep the last 2 years of data for a table. Each segment can be 100 MB to 300 MB. I really don't know the number of servers, memory size, etc.
  @mayanks: What's the read throughput/latency requirement?
  @mayanks: For high throughput (thousands of reads/sec) and low latency (< 200ms) you do want at least 64 GB RAM and 32 cores, and to load each server with a few hundred GBs of data on SSD.
  @mayanks: If the data can be partitioned, that really helps a lot with scaling for throughput
  @mayanks: Say you load 200 GB per server: 1 TB of data means 5 server nodes, x 3 for replication = 15 servers total.
  @mayanks: This will leave you with good headroom
  @mayanks: If you want to discuss further, we can hop on a zoom call if that is more efficient.
@karinwolok1:
@wolfram: @wolfram has joined the channel
@everton.santana: @everton.santana has joined the channel
@chxing: Hi all, if my Kafka topic has 32 partitions, can we control the number of Pinot consuming threads ourselves?
  @mayanks: What do you mean by controlling the consuming threads?
  @chxing: @mayanks I mean, when a Kafka topic has 32 partitions, Pinot will start 32 threads to consume this topic. Can we control the number of consuming threads, e.g. configure it to 12?
  @mayanks: Not at the moment. You could play with the number of servers though. For example, if you add 4 servers then each will read 8 partitions, and if you add 8 then each will read 4 partitions.
  @mayanks: Curious what's the intention here? Is it because you have fewer cores and also don't want to add more servers?
  @chxing: Yeah, I think if one topic has 32 partitions but doesn't have much traffic, we could configure fewer threads to consume this topic
  @mayanks: Any chance you can reduce the number of partitions? At any rate, this does seem like a legit ask. Mind filing an issue (with the reasoning above) so we can see if it makes sense to support in the future?
  @chxing: Thx @mayanks. Another question: can we delete or truncate one day's data in Pinot with commands like MySQL does?
  @mayanks: Deletion is supported at the segment level, not at the record or time-boundary level. If you can find the segments for the day, there are APIs to delete them (check the Swagger API)
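  A sketch of what such a deletion could look like against the controller REST API (host, port and names are illustrative; verify the exact endpoint in your cluster's Swagger UI): ```curl -X DELETE "http://<controller-host>:9000/segments/myTable_REALTIME/<segmentName>"```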
  @chxing: ok, thx for your reply :slightly_smiling_face:

#random


@mohamed.sultan: @mohamed.sultan has joined the channel
@sukypanesar: @sukypanesar has joined the channel
@pugal.selvan: @pugal.selvan has joined the channel
@wolfram: @wolfram has joined the channel
@everton.santana: @everton.santana has joined the channel

#troubleshooting


@mohamed.sultan: @mohamed.sultan has joined the channel
@sukypanesar: @sukypanesar has joined the channel
@mohamed.sultan: Hi. I'm new to Apache Pinot. I have a use case where I need to ingest data through Kafka (external, with SSL certs configured) into Pinot, which is installed in a Kubernetes environment. How can I connect an external Kafka with SSL certs configured to Pinot in a Kubernetes environment? It would be great if you could point me to the relevant documentation.
  @g.kishore: Use Kafka Partition(Low) Level Consumer with SSL
  @mohamed.sultan: Hi @g.kishore, thanks for sharing. Can you please point me to where I'm supposed to add the JSON snippet so that the config is reflected in Pinot?
  @tanmay.movva: > How can I connect an external Kafka with SSL certs configured to Pinot in a Kubernetes environment? You can reference environment variables in the table config. That should be enough to connect with SSL.
  @tanmay.movva:
  @tanmay.movva: ^This is what our streamConfigs for Kafka look like. The secrets and the truststore/keystore location generation are handled using the init script of the StatefulSet.
  @tanmay.movva: More on how to use Env vars -
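  For reference, a minimal sketch of what a streamConfigs block with SSL could look like (property names follow the standard Kafka consumer SSL settings, the `${...}` values assume Pinot's environment-variable substitution in table config, and all paths, hosts and topic names are illustrative):
  ```
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.topic.name": "my-topic",
    "stream.kafka.broker.list": "kafka-broker:9093",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
    "security.protocol": "SSL",
    "ssl.truststore.location": "/path/to/truststore.jks",
    "ssl.truststore.password": "${TRUSTSTORE_PASSWORD}",
    "ssl.keystore.location": "/path/to/keystore.jks",
    "ssl.keystore.password": "${KEYSTORE_PASSWORD}"
  }
  ```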
  @mohamed.sultan: @tanmay.movva so we can add the streamConfigs JSON in an OFFLINE table, right?
  @mohamed.sultan: There are some tables already created. I have checked the table configs and I see streamConfigs there. If I change that, will the particular table start ingesting from the external Kafka? Am I right?
  @mohamed.sultan: @tanmay.movva the truststore and keystore locations should be the mounted paths in the pinot-zookeeper pods, right? Since I'm using a Kubernetes environment. Please confirm this.
  @tanmay.movva: In the Pinot servers, controllers, and brokers (the brokers might not be required).
  @mohamed.sultan: I just want to ask where to mount the keystore and truststore
  @mohamed.sultan: since we need to reference them in the table config
@mohamed.sultan: Need help!!
@pugal.selvan: @pugal.selvan has joined the channel
@chxing: Hi all, when I am writing data at very high speed (190k/s) into Pinot I get the following error. Where can I check the reason?
  @g.kishore: Click on the segment and check the logs in that server
@valentin: Hello, I have some issues on a realtime table: my segments are moved to OFFLINE because of a `java.lang.NullPointerException`: ```2021/03/25 09:50:52.405 ERROR [LLRealtimeSegmentDataManager_datasource_605b02ec0eb00003003bfc41__0__18__20210325T0855Z] [datasource_605b02ec0eb00003003bfc41__0__18__20210325T0855Z] Exception while in work
java.lang.NullPointerException: null
	at org.apache.pinot.core.data.manager.realtime.SegmentBuildTimeLeaseExtender.addSegment(SegmentBuildTimeLeaseExtender.java:100) ~[pinot-all-0.7.0-jar-with-dependencies.jar:0.7.0-695ca390a8080409b3b76f250f2315b81b86b362]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:698) ~[pinot-all-0.7.0-jar-with-dependencies.jar:0.7.0-695ca390a8080409b3b76f250f2315b81b86b362]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:610) [pinot-all-0.7.0-jar-with-dependencies.jar:0.7.0-695ca390a8080409b3b76f250f2315b81b86b362]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]``` Do you have any idea why this is happening? Thank you
  @jackie.jxt: Hi Valentin, did you delete a real-time table colocated on this server? There was a bug when a real-time table is deleted, which is fixed in this PR:
  @jackie.jxt: Upgrading to the latest master version or restarting the server can solve the issue
  @valentin: it works, thank you
@laxman: Hello, we are on 0.6 and we are facing an issue with realtime table consumption from Kafka. Consumption from Kafka stops all of a sudden. There are no errors/exceptions in the logs, and from thread dumps we don't see the consumption thread. From this I assume we hit some uncaught exception in the consumption thread. We are using the Kafka low level consumer. At Pinot bootstrap, should we register a *default uncaught exception handler* which just logs them with the full stack trace? This would help in debugging several unhandled corner cases.
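A minimal sketch of the handler being proposed here, assuming slf4j logging (the class and method names are illustrative, not existing Pinot code):
```
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class UncaughtExceptionLogger {
  private static final Logger LOGGER = LoggerFactory.getLogger(UncaughtExceptionLogger.class);

  private UncaughtExceptionLogger() {
  }

  // Register once during bootstrap, before consumer threads are spawned; any thread
  // that later dies from an unhandled throwable then leaves a full stack trace in the logs.
  public static void register() {
    Thread.setDefaultUncaughtExceptionHandler((thread, throwable) ->
        LOGGER.error("Uncaught exception in thread: {}", thread.getName(), throwable));
  }
}
```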
  @jackie.jxt: The consuming threads have a global catch which will log the exception if anything goes wrong. Can you please check the external view of the table and see if the segment runs into ERROR state.
  @laxman: yes. checking
  @jackie.jxt: Also search for the `ERROR` log with logger name `LLRealtimeSegmentDataManager_<segmentName>`
  @laxman: Nope. There are no error logs for this either in controller or server
  @fx19880617: hmm, have you checked the log file inside the server container?
  @laxman: yes
  @fx19880617: hmm, can you try to set the log level to info or debug on the server so it may print out the stack trace? In any case, there should be some output. Also, for testing purposes, you can make the logger synchronous
@wolfram: @wolfram has joined the channel
@everton.santana: @everton.santana has joined the channel
@harold: Hi, we have a Pinot cluster with around 100 realtime tables. Around 28 tables went into a bad state. We have 2 sets of servers (3 each) with different tags (e.g., realtime and offline). Our tables are configured (using the tagOverrideConfig) so that once a consuming segment is completed, it is moved immediately to servers with the offline tag. On the tables that went into a bad state, we noticed from the UI that the segment is still assigned to the "realtime" server. We do see the segment get completed and uploaded to the deep store. Also, we noticed that in ZooKeeper, under pinot -> instances -> server -> messages, there are lots of messages. Does that mean that messages are not getting consumed by the server? I assume this is how the controller/server communicate through Helix (?).
  @fx19880617: @npawar
  @npawar: when you use tagOverrideConfig, the consuming segment completes and still remains on the realtime servers. There’s an hourly periodic task that will move the completed segments from the realtime-tagged to the offline-tagged servers.
  @npawar: so, it is normal to see some completed segments still on the realtime servers
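  For reference, a sketch of what such a tagOverrideConfig looks like in the table config (tenant names here are illustrative):
  ```
  "tenants": {
    "broker": "DefaultTenant",
    "server": "realtime",
    "tagOverrideConfig": {
      "realtimeConsuming": "realtime_REALTIME",
      "realtimeCompleted": "offline_OFFLINE"
    }
  }
  ```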
  @harold: I think the issue is that the tables are in bad state
  @harold: and the segment completed 12+ hours ago
  @harold: So for example, we started the workload at midnight. 0_0 got completed, and then the 0_1 segment is now consuming (both on realtime server 0), and no new data is getting consumed
  @harold: I do see this message in pinot-controller (every hour): ```2021/03/25 09:02:16.763 ERROR [SegmentRelocator] [restapi-multiget-thread-789] Relocation failed for table: comp0_horizontal_REALTIME```
  @npawar: and any stack trace with it? afaik, relocation will not happen if the segments are in error state
  @harold: External View in ZK: ```{
  "id": "comp0_horizontal_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "BUCKET_SIZE": "0",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "comp0_horizontal_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "2",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "comp0_horizontal__0__0__20210324T2221Z": {
      "Server_pinot-server-realtime-1.pinot-server-realtime-headless.svc.cluster.local_8098": "CONSUMING"
    }
  },
  "listFields": {}
}``` Ideal state: ```{
  "id": "comp0_horizontal_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "comp0_horizontal_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "2",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "comp0_horizontal__0__0__20210324T2221Z": {
      "Server_pinot-server-realtime-1.pinot-server-realtime-headless.svc.cluster.local_8098": "ONLINE"
    },
    "comp0_horizontal__0__1__20210325T0755Z": {
      "Server_pinot-server-realtime-1.pinot-server-realtime-headless.svc.cluster.local_8098": "CONSUMING"
    }
  },
  "listFields": {}
}```
  @harold: Stack trace before: ```2021/03/25 09:02:16.763 WARN [TableRebalancer] [restapi-multiget-thread-789] Caught exception while waiting for ExternalView to converge for table: comp0_horizontal_REALTIME, aborting the rebalance
java.util.concurrent.TimeoutException: Timeout while waiting for ExternalView to converge
	at org.apache.pinot.controller.helix.core.rebalance.TableRebalancer.waitForExternalViewToConverge(TableRebalancer.java:504) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-27b61fe6a338b1363efb64a7fed87d95cc793f8a]
	at org.apache.pinot.controller.helix.core.rebalance.TableRebalancer.rebalance(TableRebalancer.java:351) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-27b61fe6a338b1363efb64a7fed87d95cc793f8a]
	at org.apache.pinot.controller.helix.core.relocation.SegmentRelocator.lambda$processTable$0(SegmentRelocator.java:96) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-27b61fe6a338b1363efb64a7fed87d95cc793f8a]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_282]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]```
  @jackie.jxt: Seems the problem is that the consuming segment is not able to turn online. Can you please check the server log and see if there is any exception log
  @jackie.jxt: This issue might be related:
  @harold: I don't see any NullPointerException in the log. Any particular thing I need to look for?
  @jackie.jxt: Any `ERROR` log in your server log?
  @jackie.jxt: Based on the external view and ideal state, the server is stuck at the consuming->online segment transition
  @harold: I don't see any other error in SERVER log related to this particular table

#segment-write-api


@npawar: File based SegmentWriter: