#general
@srikanth: @srikanth has joined the channel
@zsolt: In the docs at
@mayanks: You are right, seems like the default value was switched to true in this PR.
@mayanks: This was just merged a couple of days ago. The doc is still as per the prior official release.
@mayanks: cc @jackie.jxt
@zsolt: I mean I grepped through the history and couldn't find a reference actually reading the config value, so I don't think it was ever actually wired up. It was weird because we hadn't set it and adding fields still worked.
@mayanks: Are you saying it works with the 0.7.1 release?
@zsolt: yes
@jackie.jxt: @zsolt The next consuming segment will always get the updated schema. The feature is about whether the current consuming segment can add the new columns on the fly via reload
@zsolt: I see, I've found that the usage is through a different constant
@zsolt: There's a comment: > // Whether to reload consuming segment on schema update. Will change default behavior to true when this feature is stabilized. I assume this has happened already
@jackie.jxt: Yes, it happened in less than a week :wink:
@specsek: @specsek has joined the channel
@hsaini: @hsaini has joined the channel
@vlum: @vlum has joined the channel
@mercyshans: hi, does Pinot provide any benefits for use cases where no aggregation will be applied in queries (meaning there are no metric columns, all are dimension columns)? And is that the reason why metric columns don't allow other types like String?
@mayanks: Do you mean you want to just fetch a bunch of rows from Pinot without any aggregation or group-by?
@mercyshans: @mayanks yes. Or a combined usage (like a table serving both aggregation queries and also few-rows-fetch queries)
@mayanks: Yes, you can do so. Especially if you have both aggregation and selection queries, both will work fine with Pinot
@mayanks: What you want to avoid is cases where you just have select * queries that are simply fetching millions of records per query, and there are no aggregation queries. In that case, you are not really utilizing the power of Pinot
@mercyshans: Makes sense. What about string type support for metric columns, why is this not supported? What if I just want to aggregate the count of this metric? String type should be reasonable then, right?
@mayanks: You can do count(*) anytime.
@mayanks: Metric columns are usually ones where you would do something like `sum(metric)`
@mercyshans: Ok, so I need to make it a dimension column, and avoid indexing by adding it to `noDictionaryColumns`, correct?
@mayanks: No
@mayanks: You just do something like `select count(<col>) from myTable where <col> != null`
@mayanks: Note, Pinot does not support nulls natively yet, so null values are replaced by a default value, which you will have to filter out
@mayanks: You don't need to do anything special here for just getting count
@mercyshans: Yeah, but I do not want to create an index on this column since I will not filter or aggregate on it. How do I avoid the indexing?
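[Editor's note, hedged and not from the thread: in Pinot, skipping dictionary encoding for a column is configured in the table config's `tableIndexConfig.noDictionaryColumns`. A minimal sketch, assuming a hypothetical column named `description`:]

```json
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["description"]
  }
}
```

[Also worth noting: as far as I know, Pinot builds only a dictionary by default; inverted indexes are created only for columns explicitly listed in `invertedIndexColumns`, so an unlisted column is not inverted-indexed to begin with.]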
@karinwolok1: Hi all! Would love for you to join us tomorrow for this event! We have some great speakers from LinkedIn, DoorDash, Confluent, Microsoft, Decodable, Stripe and more!
@qianbo.wang: Hi, having a question on
@mayanks: Looking at the code, Lookup is implemented as a TransformFunction, which can be applied to a group-by.
@qianbo.wang: thanks
#random
@srikanth: @srikanth has joined the channel
@specsek: @specsek has joined the channel
@hsaini: @hsaini has joined the channel
@vlum: @vlum has joined the channel
#troubleshooting
@srikanth: @srikanth has joined the channel
@jmeyer: Hello! Is it possible to generate segment names following the input file names? Say I generate 10 files for 10 "ids"; I'd want segments to contain these ids, so that they can be replaced later by generating another segment with the same name, e.g. `ID1.parquet -> prefix_ID1.segment`. Any way to make this work using `segmentNameGeneratorSpec.type`? Maybe using a particular file structure like `data/ID/file.parquet`? Thanks!
@mayanks: The default naming scheme already generates names friendly to overwrite at a later point in time, right?
@mayanks: For example, <tableName>_<minTime>_<maxTime>_<id>
@mayanks: If you regenerate data for a date partitioned folder, you will get consistent names, as long as the number of files is unchanged.
@jmeyer: What if I don't have a time column ? :smile:
@jmeyer: Can I use an "id"-partitioned folder then?
@mayanks: I think for refresh use case the convention is <tableName>_<id>
@jmeyer: So using a structure like
```
basedir/id1/file.parquet
basedir/id2/file.parquet
```
would generate segments with names
```
<table_name>_id1.segment
<table_name>_id2.segment
```
? :slightly_smiling_face:
@jmeyer: Meaning that regenerating those files would easily replace previous segments for the same "ids"
@mayanks: No, the id I am referring to is just a sequence number generated on the fly.
@mayanks: Just curious, why don't you have a time column? Is it a pure refresh use case?
@jmeyer: Ah, I see. Is there any way to make it easy to replace segments based on a user-provided id?
@jmeyer: > Just curious, why don't you have a time column? Is it a pure refresh use case? It is a table I'm using with `IN_SUBQUERY` :slightly_smiling_face:
@jmeyer: So it's more of a dimension table
@jmeyer: Hence why no time column
@jmeyer: I guess I can cheat and map ids -> time, but that sounds kind of hacky ^^
@mayanks: I'll have to check the code. But in the worst case, the name generator is a very simple interface, and your use case seems like one others might need, so it might be good to implement, if not already supported.
@mayanks:
@mayanks: Care to take a look?
@jmeyer: Looks pretty simple indeed. Can't seem to find a "suitable" strategy for my use case though
@mayanks: Ah, the interface does not provide a way to specify input file name.
@mayanks: Might be worth discussing in a broader forum via a github issue.
@mayanks: I do see your use case to be a good one to support.
@jmeyer: Interesting, I'll do that
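[Editor's note: the naming logic discussed above could be sketched as follows. This is a hypothetical standalone helper illustrating the idea (derive a stable segment name from the input file's parent directory), not Pinot's actual `SegmentNameGenerator` interface:]

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: map an input file like basedir/id1/file.parquet to a
// stable segment name "<tableName>_id1", so regenerating the file for the same
// id would produce the same segment name and overwrite the old segment.
public class FileBasedSegmentName {
    public static String generate(String tableName, Path inputFile) {
        // basedir/id1/file.parquet -> parent dir name "id1"
        String id = inputFile.getParent().getFileName().toString();
        return tableName + "_" + id;
    }

    public static void main(String[] args) {
        System.out.println(generate("myTable", Paths.get("basedir/id1/file.parquet")));
        // -> myTable_id1
    }
}
```

[A real implementation would plug this logic into Pinot's segment name generator extension point, which, as noted above, does not currently receive the input file name.]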
@specsek: @specsek has joined the channel
@specsek: Greetings! Is there a stable version of the helm chart to run? I installed the latest (0.7.1) but all the components crash with messages like the following:
```
Unrecognized VM option 'PrintGCDateStamps'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
```
@mayanks: @xiangfu0 ^^
@xiangfu0: oh, are you using k8s ?
@xiangfu0: can you try removing the `PrintGCDateStamps` flag from the javaOpts in the `values.yaml` file?
@xiangfu0: I mean, that's because we upgraded to Java 11 for those configs
@xiangfu0: You can also try to use image tag: `0.7.1-jdk11`
@specsek: I had to remove all the PrintGC statements to get it to run, and there are other errors:
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See
```
@specsek: and yes, I am using k8s :+1:
@xiangfu0: i see, is this on 0.7.1 image or 0.7.1-jdk11 image?
@specsek: 0.7.1-jdk11
@xiangfu0: ok, i’ll take a look
@specsek: thanks!:pray:
@xiangfu0: meanwhile you can try to use jdk8 image:
@xiangfu0: with the old jvmOpts
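[Editor's note for background: the `PrintGC*` flags (`PrintGCDateStamps`, `PrintGCDetails`, etc.) were removed in JDK 9 in favor of unified GC logging via `-Xlog:gc`, which is why they fail on a JDK 11 image. A hedged sketch of what JDK 11-compatible `jvmOpts` in the chart's `values.yaml` could look like; heap sizes and the log path are placeholders, not the chart's actual defaults:]

```yaml
server:
  jvmOpts: "-Xms512M -Xmx1G -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log"
```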
@hsaini: @hsaini has joined the channel
@vlum: @vlum has joined the channel
@sheetalarun.kadam2: Hello! I am using Presto Pinot python connector to query Pinot. I have a requirement for a regex type predicate on one of the dimensions. I created text index on the dimension. Will this help in the performance? Will it be able to use TEXT_MATCH to query?
@mayanks: You can run EXPLAIN on the Presto query; that should show the generated Pinot query. You can check whether it is using text match.
@sheetalarun.kadam2: Ohh, sorry if this seems dumb, I am new to Presto-Pinot. I did try `explain <query>` but the output does not specify any query plan details. How do I check it?
#pinot-dev
@s.azimigehraz: @s.azimigehraz has joined the channel
@syedakram93: Is it possible to provide a snapshot tar with the current code, above 0.7.1?
@dlavoie: `mvn clean install -DskipTests -Pbin-dist`
@dlavoie:
@xiangfu0: For a snapshot, you can use the docker images we published, or you need to build and publish them yourself
@xiangfu0: also, i know that
@syedakram93: or a 0.8.0 tar
@evan.galpin: @evan.galpin has joined the channel
#getting-started
@s.azimigehraz: @s.azimigehraz has joined the channel
