#general
@shish: I have Parquet data in S3 under a prefix such as
@shish: AWS Athena:
@mayanks: Currently, you need to push data to Pinot to be able to query it:
@shish: Yes, I am able to push Parquet files, but during the push I want to create partitions based on the S3 prefix (the data is already partitioned in S3 and I want to take advantage of that):
@shish: In Athena, during table creation we can pass "partitioned by" and it will handle it. Please check scenario 1 in the doc below. I am looking for a way to do this in Pinot.
@nanda.yugandhar: This is the general path format supported by Spark; it won't store the folder-path values (partition values like year, month, day, country) in the Parquet file itself.
@nanda.yugandhar: This also applies to CSV, JSON, text, etc., not just Parquet.
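To illustrate the point about path-style partitioning: the partition values exist only in the directory names, so anything reading the files has to recover them from the path. A minimal sketch in Python (the bucket, prefix, and column names are invented for illustration):

```python
# Recover partition values from a Spark-style partitioned path.
# The values (year, month, day, country) live only in the folder
# names, not inside the Parquet/CSV/JSON file itself.
def partition_values(path: str) -> dict:
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values

# Hypothetical bucket/prefix:
path = "s3://my-bucket/events/year=2021/month=09/day=14/country=US/part-0000.parquet"
print(partition_values(path))
# {'year': '2021', 'month': '09', 'day': '14', 'country': 'US'}
```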
@prabha.cloud: Does Pinot support the ARM architecture, so it can leverage the AWS EC2 Graviton 2 processor?
@mayanks: Pinot is built using Java and runs on the JVM, so it should work as long as you have that.
@prabha.cloud: Will try in a few mins
@prabha.cloud: The Docker image needs to be available for arm64 along with amd64
@mayanks: I see. @xiangfu0 ^^
@prabha.cloud: Something like this: ```docker pull trinodb/trino:362-arm64```
@prabha.cloud: Quickstart works fine. Will evaluate and let you know if there are any issues. Thank you @mayanks
@mayanks: You mean on arm64?
@prabha.cloud: yes EC2 with Graviton 2
@xiangfu0: This is interesting. You can also build the Docker image yourself from the Docker script:
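For reference, a multi-arch build of that image can be sketched with `docker buildx` (the Dockerfile location in the apache/pinot checkout and the tag below are assumptions; check the docker scripts shipped with your Pinot version):

```shell
# Build a multi-arch (amd64 + arm64) Pinot image with docker buildx.
# Run from a checkout of apache/pinot; in recent releases the
# Dockerfile lives under docker/images/pinot (verify for your version).
docker buildx create --use --name pinot-builder
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myrepo/pinot:latest-multiarch \
  --push \
  docker/images/pinot
```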
@iamluckysharma.0910: @iamluckysharma.0910 has joined the channel
@nanda.yugandhar: @nanda.yugandhar has joined the channel
@son.nguyen.nam: @son.nguyen.nam has joined the channel
@dadelcas: I'm looking through the code to see if I can load config properties from env vars instead of the files. It doesn't seem like this is supported at the moment. Can someone confirm? This is particularly useful for credentials.
@mayanks: I think @xiangfu0 added that a while back?
@xiangfu0: for ingestion jobs, you can do that
@xiangfu0: but not for pinot instances
@xiangfu0:
@dadelcas: Yup, I was referring to server configuration. I should raise a GitHub issue; it may be worth discussing.
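For the ingestion-job case that is supported, the job spec file allows `${...}` templating, so secrets can be injected at launch time instead of living in the file. A sketch of the relevant fragment (the region and variable names are invented; check the batch ingestion docs for how template values are resolved in your version):

```yaml
# Fragment of a batch ingestion job spec using templated values.
# ${AWS_ACCESS_KEY} / ${AWS_SECRET_KEY} are resolved when the job is
# launched, e.g. from environment variables or values passed to
# pinot-admin.sh LaunchDataIngestionJob.
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: us-west-2
      accessKey: ${AWS_ACCESS_KEY}
      secretKey: ${AWS_SECRET_KEY}
```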
#random
@iamluckysharma.0910: @iamluckysharma.0910 has joined the channel
@nanda.yugandhar: @nanda.yugandhar has joined the channel
@son.nguyen.nam: @son.nguyen.nam has joined the channel
#troubleshooting
@amol: Hi Pinot team, my Pinot cluster is running inside Docker containers. I want to monitor the cluster with Prometheus, and for that I tried to configure the Prometheus JMX exporter in pinot-controller.conf, pinot-broker.conf, and pinot-server.conf respectively, like:
```
controller.jvmOpts= "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
broker.jvmOpts= "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
server.jvmOpts= "-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
```
But I am unable to get the metrics. What should I do? Kindly help. @mayanks
@mayanks: Hey @amol does this doc help:
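One thing worth checking in the config @amol posted: all three components bind the exporter to port 8008, which clashes if controller, broker, and server run on the same host or share a network namespace, and the space after `jvmOpts=` plus the surrounding quotes can end up inside the property value depending on how the .conf file is parsed. A sketch with distinct ports per component (same jar and rules-file paths as in the original message; verify they exist inside your containers, and note that some Pinot Docker images expect JVM options via the JAVA_OPTS environment variable instead):

```
controller.jvmOpts=-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G
broker.jvmOpts=-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8009:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G
server.jvmOpts=-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8010:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G
```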
@iamluckysharma.0910: @iamluckysharma.0910 has joined the channel
@nanda.yugandhar: @nanda.yugandhar has joined the channel
@son.nguyen.nam: @son.nguyen.nam has joined the channel
@gabuglc: Hello, is there an optimal table config for upserts? I'm able to consume my whole Kafka topic without the upsert config (93M+ messages). However, when I put the upsert config on my table, it stops consuming at a certain offset (around 23M+ messages).
@gabuglc:
@qianbo.wang: Hi Pinot team, I’m getting this error `Catalog 'pinot' does not support table property 'time_field'` when creating a table with this query: ```CREATE IF NOT EXIST ... WITH ( pinot_table_name = 'enriched_invoices', time_field = 'created_at', offline_replication = 3, offline_retention = 365, index_inverted = ARRAY['licensee_id','facility_id'], index_bloom_filter = ARRAY['licensee_id','facility_id'], index_sorted = 'created_at', index_aggregate_metrics = true, index_create_during_segment_generation = true, index_auto_generated_inverted = false, index_enable_default_star_tree = false);```
@qianbo.wang: I need to double-check, but I think it worked in 0.6.x and not now in 0.8.x. Were there any changes that could cause this?
@mayanks: Can you EXPLAIN the query to see what the Pinot-side query is?
@qianbo.wang: Hi, sorry, no worries. It is actually caused by an infra change on our side.
#feat-geo-spatial-index
@kchavda: @kchavda has joined the channel
@kchavda: Hi all, I'm working on creating a schema for a realtime table (using Kafka) and have a geo column which is already formatted in the Kafka topic: ```"location_st_point":{"wkb":"AQEAACDmEAAArS5MS1GXXcD0lychov5AQA==","srid":4326}``` Do I need to do a transform on this in the schema? ```{ "dataType": "BYTES", "name": "location_st_point", "transformFunction": "toSphericalGeography(point)" },```
@kchavda: Since there are not a lot of people in here, I hope it's okay for
@yupeng: If you do not transform it during ingestion, you can transform it at query time
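To make the query-time option concrete, here is a sketch assuming the BYTES column ends up holding raw WKB (note the Kafka payload above nests the WKB as base64 inside a JSON object, so some decoding at ingestion may still be needed; the table name and the reference point/radius are made up, and the function names follow Pinot's geospatial docs, so verify them against your version):

```sql
-- Query-time transform: decode the stored WKB and filter by distance.
-- toSphericalGeography / ST_GeomFromWKB / ST_Point / ST_Distance are
-- Pinot geospatial functions; the 5000 m radius is illustrative.
SELECT ST_AsText(ST_GeomFromWKB(location_st_point))
FROM my_geo_table
WHERE ST_Distance(
        toSphericalGeography(ST_GeomFromWKB(location_st_point)),
        ST_Point(-122.41, 37.77, 1)  -- lon, lat, isGeography = 1
      ) < 5000
```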
#pinot-dev
@son.nguyen.nam: @son.nguyen.nam has joined the channel
#getting-started
@son.nguyen.nam: @son.nguyen.nam has joined the channel
@tiger: Is there a way to view the number of nodes that are generated for a star-tree? (I'm exploring various indexing configs and was wondering how different setups affect storage and performance.)
@tiger: Also, is there any documentation that goes more in depth on how the star-tree index works and exactly what data is generated?
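For background while exploring configs: the size of the generated star-tree depends on the dimension split order, the cardinalities of those dimensions, and `maxLeafRecords`. A hedged example of the table-config fragment that controls this, per the star-tree docs (column names invented for illustration):

```json
{
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["country", "browser", "locale"],
      "skipStarNodeCreationForDimensions": [],
      "functionColumnPairs": ["SUM__impressions", "COUNT__*"],
      "maxLeafRecords": 10000
    }
  ]
}
```

Lowering `maxLeafRecords` or adding dimensions to the split order generally grows the tree (more pre-aggregated nodes, more storage) in exchange for faster aggregations.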
