#general


@rohithuppala: @rohithuppala has joined the channel
@pedro.cls93: Hello, I've been reading the Pinot documentation and I'm a bit confused regarding the data that Controller & Server are responsible for respectively. My understanding is that Server instances store actual data segments/partitions of a table. Controllers store only a mapping of which servers store which segments for a given table. If this is the case, what does it mean when a segment is uploaded to a Controller? As mentioned in:
  @pedro.cls93: My use-case is, I want to create a realtime table with upsert capabilities that consumes from a partitioned kafka topic. As I understand it: • The Pinot Server instances are responsible for consuming from the kafka partitions into local segments. • The Pinot Controller instances contain a mapping of which servers contain certain segments. • Once the segments are completed (whether by space or time requirements) they are uploaded to a distributed file system.
  @mayanks: Servers store a local copy of the data for faster query serving. The Controller maintains a mapping of segment to server, but also stores a golden copy of segments in the plugged-in storage (HDFS/S3/etc.)
  @mayanks: The distributed file system you are referring to is hooked up to the controller
  @pedro.cls93: What is the golden copy for?
  @pedro.cls93: Don't the servers upload their segments to the plugged storage?
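  For context, the "golden copy" lives in whatever deep store the controller is configured with. A minimal sketch of the relevant controller properties, assuming S3 (the bucket, region, and paths below are placeholders, not from this thread):
```properties
# Deep store for completed segments (the "golden copy"); bucket/paths are hypothetical
controller.data.dir=s3://my-bucket/pinot/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```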
@arthurnamiasdecrasto: @arthurnamiasdecrasto has joined the channel
@patidar.rahul8392: @patidar.rahul8392 has joined the channel
@patidar.rahul8392: Hello everyone, I have recently started using Apache Pinot and I am able to integrate my Kafka with Pinot. I have done the setup locally, and once I started Pinot using bin/quick-start-batch.sh I am able to see all the Pinot details on localhost:9000. I want to add a user-authentication feature here, so that when someone uses localhost:9000 it asks for credentials first and then goes to the Pinot home page. I checked multiple documents and YouTube videos but could not find any reference for this. Kindly suggest/guide me on how I can implement it.
  @mayanks: @slack1
  @patidar.rahul8392: Team, someone please help.
  @mayanks: I know that @slack1 has added support for authentication. Let's give him a minute. In the meanwhile I can try to find the PRs he added.
  @patidar.rahul8392: Okay thanks @mayanks
  @mayanks: You can refer to this PR for now: @patidar.rahul8392
  @patidar.rahul8392: @mayanks Thanks for sharing this link. I checked it, but there it's implemented through Java. In my case I am not using any programming language; I already have data in a Kafka topic, so I put all the Kafka topic and server details in realtime-table-config.json and the table and schema details in table-schema.json, then execute admin.sh with the AddTable property, passing the schema file and config file. Checking on localhost:9000 I am able to see the data directly, without any user authentication. Instead of jumping directly to the Pinot home page, I want it to ask for a username and password.
  @slack1: Hi RK - the PR above actually achieves the scenario you describe. It’ll officially become available in Pinot 0.8.0+, but you can already use the snapshot to try out the feature. There’s already a configuration reference here:
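  For readers landing here later: the basic-auth feature referenced above is driven by configuration. A sketch of the controller side, assuming Pinot 0.8.0+ (the principal names and passwords are examples only):
```properties
# controller.conf — enable HTTP basic auth on the controller UI and REST API
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
controller.admin.access.control.principals=admin,user
controller.admin.access.control.principals.admin.password=verysecret
controller.admin.access.control.principals.user.password=secret
controller.admin.access.control.principals.user.tables=myTable
controller.admin.access.control.principals.user.permissions=READ
```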
@richhickey: @richhickey has joined the channel
@patidar.rahul8392: What is the process to use HDFS as Pinot deep storage?
  @chinmay.cerebro: @tingchen ^^ looks like we don't have a good doc. Mind updating it?
  @tingchen: ok.
  @patidar.rahul8392: Thanks @chinmay.cerebro @tingchen
  @tingchen:
  @tingchen: have you read the above tutorial?
  @tingchen: The HDFS setup is similar, except the storage is now HDFS instead of S3
  @patidar.rahul8392: Ok, thanks @tingchen. Will read this and try; will connect again in case of any issue.
  @chinmay.cerebro: @tingchen might be useful to copy that and modify with a working example
  @chinmay.cerebro: cause I'm sure others will have similar questions
  @tingchen: Sure, will do that. It's just that the S3 tutorial looks very close to the HDFS setup too.
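  Until the doc is updated, a rough sketch of the HDFS variant of that S3 tutorial, controller side (the namenode host/port and paths are placeholders):
```properties
# controller.conf — HDFS as deep store; start the controller with -Dplugins.include=pinot-hdfs
controller.data.dir=hdfs://namenode:9000/pinot/controller-data
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/opt/hadoop/etc/hadoop
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
  The server and broker need the analogous `pinot.server.*` storage-factory and segment-fetcher settings.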
@gaurav.eca: @gaurav.eca has joined the channel
@guido.schmutz: @guido.schmutz has joined the channel

#random


@rohithuppala: @rohithuppala has joined the channel
@arthurnamiasdecrasto: @arthurnamiasdecrasto has joined the channel
@patidar.rahul8392: @patidar.rahul8392 has joined the channel
@richhickey: @richhickey has joined the channel
@gaurav.eca: @gaurav.eca has joined the channel
@guido.schmutz: @guido.schmutz has joined the channel

#troubleshooting


@rohithuppala: @rohithuppala has joined the channel
@chethanu.tech: I was trying to do `mvn install` on the Pinot repo, and the Pinot Spark Connector is failing to build [is there any doc on building the Spark Connector?]
```
[INFO] Pinot Connectors ................................... SUCCESS [  0.229 s]
[INFO] Pinot Spark Connector .............................. FAILURE [  7.478 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:23 min (Wall Clock)
[INFO] Finished at: 2021-05-06T13:36:03+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project pinot-spark-connector: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240 (Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :pinot-spark-connector
~/Work/Github/pinot/incubator-pinot helm-sec-update mvnd clean install -DskipTests -Dfast
```
@fx19880617: Can you check the Scala version?
@arthurnamiasdecrasto: @arthurnamiasdecrasto has joined the channel
@patidar.rahul8392: @patidar.rahul8392 has joined the channel
@richhickey: @richhickey has joined the channel
@gaurav.eca: @gaurav.eca has joined the channel
@pedro.cls93: When creating a table definition in the UI (using v0.7.1), I'm getting an HTTP 500 error from the /tables POST endpoint with the following message:
```
code: 500
error: "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
```
Does anyone know what this means?
  @pedro.cls93: Here are some associated screenshots
  @pedro.cls93: The table definition is accepted by the validate endpoint, so I'm assuming the spec is OK.
  @ken: Hi @pedro.cls93 - I sometimes have to look in the logs (controller, broker(s), servers) to get more details when an error comes back via the REST API. Though others on Slack might instinctively know what causes that specific problem.
  @pedro.cls93: Thank you for the input. I found that `org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory` is not in the classpath:
```
2021/05/06 15:48:21.790 ERROR [PinotTableRestletResource] [grizzly-http-server-1] org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory
java.lang.ClassNotFoundException: org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_282]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_282]
```
  @pedro.cls93: I thought that this particular class always comes by default, isn't that the case?
  @ken: I had thought so too. Normally the beginning of the log file will have info from when the service started up, which includes a list of loaded plugins. So I’d look for that next.
  @dlavoie: The default plugin dir on K8s loads all plugins; if you customized it as the doc instructs, you will not load the Kafka and other plugins by default
  @pedro.cls93: That was exactly it, thank you Daniel. As an improvement suggestion, there should be a way to append plugins instead of replacing them.
  @dlavoie: Plugin scanning is recursive, so you only need to adapt your plugin dir accordingly
  @pedro.cls93: Understood, thank you very much Daniel. I misinterpreted the documentation, thinking we had to override the Java property with our plugins
  @dlavoie: Tar-based Pinot needs to be configured with the plugin dir, but the k8s deployment auto-configures it. And the doc is not contextual to k8s. I've been through the same troubleshooting :slightly_smiling_face:
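  Since plugin scanning is recursive, a low-risk pattern for the tar-based deployment is to drop custom plugin jars into a subdirectory of the default plugin dir rather than replacing it. The JVM properties involved, sketched with hypothetical paths and values:
```
# JVM options at startup; paths/values are illustrative
-Dplugins.dir=/opt/pinot/plugins            # keep the default dir so bundled plugins (kafka, s3, ...) still load
-Dplugins.include=pinot-s3,pinot-kafka-2.0  # optional allow-list; omit it to load everything found
```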
@1705ayush: Hi all, we want to assign a storageClass to Pinot Zookeeper in the helm chart, so that whenever a pinot-zookeeper pod gets created it automatically claims a PV. But I see there is no direct way to assign it using values.yaml. We tried installing incubator/zookeeper using the below command
```
helm -n my-pinot-kube install pinot-zookeeper incubator/zookeeper --set replicaCount=1
```
and disabled the zookeeper in helm/pinot/values.yaml by setting zookeeper.enabled to false. But we face an error from `_helpers.tpl` indicating it is not able to fetch the `zookeeper.url`, related to `configurationOverride`:
```
Error: template: pinot/templates/server/statefulset.yml:63:27: executing "pinot/templates/server/statefulset.yml" at <include "zookeeper.url" .>: error calling include: template: pinot/templates/_helpers.tpl:79:33: executing "zookeeper.url" at <index .Values "configurationOverrides" "zookeeper.connect">: error calling index: index of nil pointer
```
Any help is appreciated!
  @fx19880617: You can find it under zookeeper helmChart :stuck_out_tongue: in short, here are the options under the `zookeeper` section in the `values.yaml` file:
```yaml
persistence:
  enabled: true
  ## zookeeper data Persistent Volume Storage Class
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ##   set, choosing the default provisioner. (gp2 on AWS, standard on
  ##   GKE, AWS & OpenStack)
  ##
  # storageClass: "-"
  accessMode: ReadWriteOnce
  size: 5Gi
```
  @fx19880617: if you already have ZK, then disabling ZK should be good. I will check the `_helpers.tpl` script
  @fx19880617: can you try to define `zookeeper.url` in the `values.yml` file? That should provide the overridden ZK url
  @1705ayush: Thank you @fx19880617 for the help! I tried mentioning just the below values in values.yaml for the zookeeper section
```yaml
zookeeper:
  enabled: false
  url: <ip>
  port: <pod-exposed-port>
```
When I install pinot using `helm -n my-pinot-kube install pinot .` it throws the error:
```
Error: template: pinot/templates/server/statefulset.yml:63:27: executing "pinot/templates/server/statefulset.yml" at <include "zookeeper.url" .>: error calling include: template: pinot/templates/_helpers.tpl:79:33: executing "zookeeper.url" at <index .Values "configurationOverrides" "zookeeper.connect">: error calling index: index of nil pointer
```
Line 79 in _helpers.tpl is
```
{{- $zookeeperConnectOverride := index .Values "configurationOverrides" "zookeeper.connect" }}
```
  @fx19880617: I see
  @fx19880617: can you add the below: ```zookeeper.connect=<ip>:2181```
  @fx19880617:
```yaml
zookeeper:
  enabled: false
  url: <ip>
  port: <pod-exposed-port>
  connect: <ip>:<pod-exposed-port>
```
  @1705ayush: I tried:
```yaml
zookeeper:
  enabled: false
  url: <ip>
  port: <pod-exposed-port>
  connect: 192.168.49.2:30760
  zookeeper.connect: 192.168.49.2:30760
```
Still:
```
Error: template: pinot/templates/server/statefulset.yml:63:27: executing "pinot/templates/server/statefulset.yml" at <include "zookeeper.url" .>: error calling include: template: pinot/templates/_helpers.tpl:79:33: executing "zookeeper.url" at <index .Values "configurationOverrides" "zookeeper.connect">: error calling index: index of nil pointer
```
Does it have an issue with `index .Values "configurationOverrides"`?
  @fx19880617: I feel so, let’s remove that line in _helpers.tpl
  @fx19880617: since we only need zk url
  @fx19880617:
  @fx19880617: Fix this in the PR
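  Putting the workaround above in one place: a sketch of the `values.yaml` needed for an external ZK (IP/port taken from the example above; whether `configurationOverrides` is still required depends on the chart version and the `_helpers.tpl` fix):
```yaml
zookeeper:
  enabled: false
  url: 192.168.49.2
  port: 30760
# satisfies the "zookeeper.connect" lookup in _helpers.tpl, if that line is still present
configurationOverrides:
  "zookeeper.connect": 192.168.49.2:30760
```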
@jaydesai.jd: @fx19880617 Can u approve the workflow again for PR : Thanks in advance :slightly_smiling_face:
@guido.schmutz: @guido.schmutz has joined the channel
@avasudevan: Hi… When trying to `Upload the schema and Table Config` - I am getting the following error…
```
Sending request: to controller: ea8d7bfc16ea, version: Unknown
{"code":500,"error":"org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata"}
```
However, I am able to log in to my Kafka docker container, and describe works fine…
```
bash-4.4# bin/kafka-topics.sh --bootstrap-server kafka:9092 --topic transcript-topic --describe
Topic: transcript-topic	PartitionCount: 1	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: transcript-topic	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
```
Any idea what I am missing?
  @dlavoie: `Timeout expired while fetching topic metadata` is typically a sign that Pinot was able to connect to your Kafka instance but failed to load the topic. Double-check the topic name
  @dlavoie: potentially the kafka hostname might also be the problem
  @avasudevan: hmm… When I try to describe the topic specifying the hostname, it works…
```
incubator-pinot (master) ✗ docker exec \
  -t kafka \
  /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka:9092 \
  --topic transcript-topic --describe
Topic: transcript-topic	PartitionCount: 1	ReplicationFactor: 1	Configs: segment.bytes=1073741824
	Topic: transcript-topic	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
```
Doesn't this mean it's already able to recognize the host?
  @avasudevan: Pinot UI is also able to list out the topics….
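  One more thing worth checking: the broker list inside the table config's `streamConfigs`, since that is what the controller/server actually uses to fetch topic metadata. A sketch using the topic and hostname from this thread (the remaining keys are common defaults, not taken from the poster's actual config):
```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "transcript-topic",
  "stream.kafka.broker.list": "kafka:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
}
```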
  @avasudevan: Apologies for some newbie questions here… In the Streaming Example, I see that it says it starts the Pinot deployment by starting `Zookeeper`, `Kafka`… But I don't see any other docker containers started other than `pinot-quickstart`…
  @g.kishore: quickstart starts everything in one process
  @fx19880617: you can follow this doc for running all components in different docker containers:
  @avasudevan: Awesome this is what i was looking for. Thanks! :point_up:
  @avasudevan: Where should the `schemaFile` and the `tableConfigFile` be placed? Tried placing them locally as well as in the controller… didn't work…
```
docker run \
  --network=pinot-demo \
  --name pinot-streaming-table-creation \
  0e536a319df3 AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -exec
```
  @avasudevan: Steps until here :point_up::
  • As of now running all the components in different docker containers…
  • Created a kafka topic
  • Facing `FileNotFoundException` while creating table and schema
  @avasudevan: Tried it with placing it in local as well as in controller..didn’t work..
  @fx19880617: You need to use -v to mount your local disk into the docker container
  @fx19880617: Then you can use those files in the docker container
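  Concretely, the earlier `docker run` with a bind mount added (assuming the JSON files sit in `/tmp/pinot-quick-start` on the host; the image id is reused from the thread):
```
docker run \
  --network=pinot-demo \
  -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
  --name pinot-streaming-table-creation \
  0e536a319df3 AddTable \
  -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
  -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -exec
```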

#pinot-dev


@gaurav.eca: @gaurav.eca has joined the channel

#getting-started


@gaurav.eca: @gaurav.eca has joined the channel

#minion-improvements


@laxman: Congratulations guys on your company launch !!! Wish you all success.