#general


@fcolopera89: @fcolopera89 has joined the channel
@teo: @teo has joined the channel
@sleepythread: I am trying to start Pinot with HDFS as deep storage but am getting an error while starting the server ```bin/start-server.sh -zkAddress pinot1.plan:2181,pinot2.plan:2181,pinot3.plan:2181 -configFileName conf/server.conf``` and the server configs are ```pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
hadoop.conf.path=/local/hadoop/etc/hadoop/
pinot.server.storage.factory.hdfs.hadoop.conf.path=/local/hadoop/etc/hadoop/
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.instance.dataDir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/data/PinotServer/index
pinot.server.instance.segmentTarDir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/data/PinotServer/segmentTar```
  @dlavoie: Hi, can you move this conversation to <#C011C9JHN7R|troubleshooting>? Also, error logs would be helpful to understand what is wrong.
  @sleepythread: Sorry, I didn't know there was such a channel. Will move it there, thanks
@sleepythread: In the UI documentation it's written ```-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs```
  @dlavoie: The documentation does mention that, but the `-Dplugins.include=pinot-hdfs` flag will deactivate all other plugins. Just configuring `-Dplugins.dir=/opt/pinot/plugins` will auto-scan all available plugins, including HDFS
  @sleepythread: ```[akashmis...@pinot1.mlan apache-pinot-incubating-0.6.0-bin]$ bin/start-server.sh -zkAddress pinot1.mlan:2181,pinot2.mlan:2181,pinot3.mlan:2181 -configFileName /home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/conf/server.conf -Dplugins.dir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/plugins
2021/04/13 14:53:14.235 ERROR [PinotAdministrator] [main] Error: "-Dplugins.dir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/plugins" is not a valid option``` When I add it on the command line, I get the error above.
  @dlavoie: The `-D` JVM flags must be passed through the `JAVA_OPTS` env variable
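For example, a minimal sketch using the paths from the commands above (not verbatim from the thread; adjust to your install):
```
export JAVA_OPTS="-Dplugins.dir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/plugins"
bin/start-server.sh -zkAddress pinot1.mlan:2181,pinot2.mlan:2181,pinot3.mlan:2181 -configFileName conf/server.conf
```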
@g.kishore: We're happy to see Pinot now listed on the ThoughtWorks Technology Radar as one of the top platforms to assess. This is a big accomplishment for the entire Pinot community. Thanks to everyone who helped us get there!
@gabuglc: @gabuglc has joined the channel
@sosyalmedya.oguzhan: Hello! We've tried to use AliCloud OSS (like S3 in Amazon) as Pinot deep storage. There is no pinot-oss deep storage plugin right now, but we were able to use OSS as Pinot deep storage using the Pinot HDFS file system plugin. We created documentation for that:
  @g.kishore: Thanks a lot for this contribution
@toasifmohammed: @toasifmohammed has joined the channel
@aaron: If I understand right, when I batch ingest a set of Parquet files, the job will create a segment for each Parquet file and then upload them all to Pinot. Is that right? If so, are there any guidelines about picking segment sizes for optimal query performance?
  @mayanks: Yes, all data is internally stored in Pinot’s columnar indexed format.
  @mayanks: You want to avoid a large number of tiny segments. If your data allows, a few hundred MB per segment is a good size
@aaron: Also, when I run the batch ingestion job I see some debug output about dictionary-encoding the columns, including numeric metric columns. Does that mean it's dictionary-encoding the data in Pinot's internal format? Say I'd like to compute averages and quantiles of these metrics grouped by different dimensions -- is dictionary encoding best for that, or should I disable it? Or is what I'm seeing not relevant to query performance?
  @mayanks: By default most columns are dictionary encoded, and that works well. It helps to disable it in certain cases, like strings with really high cardinality. For your case you can assume it works fine, unless you are seeing issues.
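For reference, disabling the dictionary for a column is a table-config change; a minimal sketch with a hypothetical high-cardinality column name:
```
{
  "tableIndexConfig": {
    "noDictionaryColumns": ["userUuid"]
  }
}
```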
@karinwolok1: :mega: Just a reminder! :tada: *If anyone is interested in presenting at the Apache Pinot event series, please submit today!* Presentations will be scheduled in May, June, and July. Topics can be a variety of use cases, how-to's, your experiences working with Pinot, "getting started with X in Pinot", features, and connectors. There's really no limit on types of topics, and it can be a work in progress! Even if your use case isn't fully built out, many people might be interested to see what you are working on, what made you think of Pinot, what you were doing before, what led you here, what works for you and what doesn't, how you compared your options, comparisons of Pinot and other solutions, etc. *Feel free to reach out to me if you have questions!*
@tingchen: @jackie.jxt @npawar do you know whether *JSONPATHARRAY*(jsonField, 'jsonPath') can be used in a WHERE clause to find out if the array contains a certain value?
  @jackie.jxt: I think you need to use `JsonExtractScalar` with an array type to extract a MV field
  @tingchen: is there an example or syntax manual for this?
  @jackie.jxt: E.g. `where jsonExtractScalar(json, '$.a', 'STRING_ARRAY') = 'abc'`
  @jackie.jxt:
  @tingchen: I suppose the above feature cannot utilize the JSON index, right?
  @tingchen: probably good for medium or small use cases
  @tingchen: `where jsonExtractScalar(json, '$.a', 'STRING_ARRAY') = 'abc'` — does the expression mean the list contains the value `abc`?
  @jackie.jxt: Yes
  @jackie.jxt: The JSON index can be used to solve this problem
  @jackie.jxt:
  @tingchen: I am still a bit confused about which one to use in the WHERE clause: `jsonExtractScalar` or `JSON_MATCH`
  @jackie.jxt: If you have json index generated for the column, `JSON_MATCH` should be much faster
  @tingchen: got it. thanks.
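For reference, a side-by-side sketch on a hypothetical table `mytable` (the `JSON_MATCH` filter syntax follows the Pinot docs; note the doubled single quotes):
```
-- Evaluated per row, no index used:
SELECT count(*) FROM mytable
WHERE jsonExtractScalar(json, '$.a', 'STRING_ARRAY') = 'abc'

-- Can leverage the JSON index when one is configured on the column:
SELECT count(*) FROM mytable
WHERE JSON_MATCH(json, '"$.a"=''abc''')
```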
@aaron: If I've already created a table and batch ingested data, can I add a star-tree index after the fact or do I need to start from scratch?
  @g.kishore: You can add Star tree index later.. all indexes can be added dynamically
  @aaron: Thanks -- do I do that by updating the table config?
  @g.kishore: right
  @g.kishore: update the table config and invoke the reload segments API
  @jackie.jxt: You can refer to this doc:
  @jackie.jxt: Remember to set `enableDynamicStarTreeCreation` if you want to add a star-tree on the fly
  @aaron: Thanks!
  @aaron: In this case what does it mean to compute the star-tree on the fly?
  @aaron: Do I need to set `enableDynamicStarTreeCreation` in order to be able to update the table config and reload segments like Kishore said, or is this something different?
  @jackie.jxt: Yes, you need to set `enableDynamicStarTreeCreation` then server will generate the star-tree index configured in the table config
  @aaron: Thanks!
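Putting the two answers together, a rough sketch (column names and function pairs are hypothetical; the controller is assumed at localhost:9000):
```
"tableIndexConfig": {
  "enableDynamicStarTreeCreation": true,
  "starTreeIndexConfigs": [
    {
      "dimensionsSplitOrder": ["country", "browser"],
      "functionColumnPairs": ["SUM__clicks", "COUNT__*"],
      "maxLeafRecords": 10000
    }
  ]
}
```
Then trigger the reload:
```
curl -X POST "http://localhost:9000/segments/myTable_OFFLINE/reload"
```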
  @aaron: It looks like the reloadsegments API finished instantly -- should I expect it to take a while to reindex?
  @g.kishore: Yes.. there is a status API you can invoke to check the status
  @aaron: The table state API?
  @aaron: I see ```{ "state": "enabled" }```
  @aaron: Ok I think I got this working. At first I used the "reload" API which didn't seem to do anything. Then I tried the "reset" API and it did
@aaron: If I have SUM and COUNT in the star tree index's `functionColumnPairs`, will `AVG` implicitly be able to use the star tree index or do I need to put `AVG` in that list too?
  @jackie.jxt: You'll need to explicitly put `AVG` in that list
  @jackie.jxt: But very good point that we should be able to get the `AVG` with `SUM/COUNT`. Can you please submit a github issue for this?
  @aaron:
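In other words, until that optimization exists, with a hypothetical metric column `clicks` the pair list needs all three entries spelled out:
```
"functionColumnPairs": ["SUM__clicks", "COUNT__*", "AVG__clicks"]
```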
@karinwolok1: Welcome new :wine_glass: Pinot members! :wave: Tell us about yourselves! How'd you find the community? What are you working on? @toasifmohammed @gabuglc @sg @hochuen.wong @fcolopera89 @teo @ankitsultana @karthikbvnet @social.kangaroo.hop @alicelyu @ilchernenko @ravishankar.nair @raahulgupta07 @gaurav.madaan @shyam.m @vaibhav.sinha @rymurr @omkar.halikar14 @ricardo.bernardino @kulbir.nijjer @xysmiracle @sunilkumar.tc89 @wuwenw
@yupeng: hey, is there a plan to add a table creation module in the cluster management UI?
  @g.kishore: that's already supported, right?
  @yupeng: hmm, i did not find it on the UI...
  @g.kishore:
  @yupeng: oh.. under table.. thanks..
  @yupeng: is there a way to import schema from avro or json?
  @g.kishore: there is an admin tool but it's not hooked up in the UI
  @g.kishore: ```AvroSchemaToPinotSchema```
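Roughly, a sketch of invoking it from pinot-admin (flag names may differ by version; check the command's `-help` output before relying on them):
```
bin/pinot-admin.sh AvroSchemaToPinotSchema \
  -avroSchemaFile /path/to/events.avsc \
  -outputDir /tmp/pinotSchema \
  -pinotSchemaName events
```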
  @yupeng: got it. yeah, it'll be a cool feature to integrate it into the UI
  @g.kishore: the problem is we don't want to bring in dependencies on Avro, Thrift, Protobuf, Parquet, etc.. it took a long time to clean that up
  @g.kishore: it's hard to do it in a generic way without depending on them explicitly
  @yupeng: I see. Then how does the command work?
  @yupeng: How does it get the dependency?
  @g.kishore: the command is in a different module, pinot-tools
  @yupeng: I see. Then it does need a separate web server for the UI to get around this issue
  @g.kishore: Or model it as an SPI

#random


@marta: Nice to see Pinot coming through in the ThoughtWorks radar!
@fcolopera89: @fcolopera89 has joined the channel
@teo: @teo has joined the channel
@gabuglc: @gabuglc has joined the channel
@toasifmohammed: @toasifmohammed has joined the channel

#feat-presto-connector


@teo: @teo has joined the channel

#feat-upsert


@teo: @teo has joined the channel

#troubleshooting


@fcolopera89: @fcolopera89 has joined the channel
@teo: @teo has joined the channel
@sleepythread: @sleepythread has joined the channel
@sleepythread: I am trying to add HDFS as deep storage and running following command ```bin/start-server.sh -zkAddress pinot1.plan:2181,pinot2.plan:2181,pinot3.plan:2181 -configFileName conf/server.conf``` Server configs are ```pinot.server.instance.enable.split.commit=true pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS hadoop.conf.path=/local/hadoop/etc/hadoop/ pinot.server.storage.factory.hdfs.hadoop.conf.path=/local/hadoop/etc/hadoop/ pinot.server.segment.fetcher.protocols=file,http,hdfs pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher pinot.server.instance.dataDir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/data/PinotServer/index pinot.server.instance.segmentTarDir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/data/PinotServer/segmentTar``` I am getting following error ```2021/04/13 15:00:49.231 INFO [MBeanRegistrar] [Start a Pinot [SERVER]] MBean HelixCallback:Change=MESSAGES_CONTROLLER,Key=PinotCluster.Server_10.10.211.27_8098,Type=PARTICIPANT has been registered. 2021/04/13 15:00:49.232 INFO [MBeanRegistrar] [Start a Pinot [SERVER]] MBean HelixCallback:Change=HEALTH,Key=PinotCluster.Server_10.10.211.27_8098,Type=PARTICIPANT has been registered. 2021/04/13 15:00:49.598 INFO [Reflections] [Start a Pinot [SERVER]] Reflections took 313 ms to scan 1 urls, producing 5 keys and 151 values 2021/04/13 15:00:49.645 ERROR [PinotFSFactory] [Start a Pinot [SERVER]] Could not instantiate file system for class org.apache.pinot.plugin.filesystem.HadoopPinotFS with scheme hdfs java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.HadoopPinotFS at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_275] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_275] at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:268) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:239) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:220) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:53) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:316) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at 
org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] 2021/04/13 15:00:49.650 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 0.513 since launch java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.HadoopPinotFS at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:316) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.HadoopPinotFS at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_275] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_275] at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) 
~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:268) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:239) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:220) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:53) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 9 more```
  @dlavoie: You can pass the plugin dir config through the `JAVA_OPTS` env variable
  @sleepythread: Thank, when i tried to add -Dplugin.dir in start-server.sh script then i got the following error. ```2021/04/13 15:05:48.160 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 1.305 since launch java.lang.RuntimeException: java.lang.RuntimeException: Could not initialize HadoopPinotFS at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:316) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] Caused by: java.lang.RuntimeException: Could not initialize HadoopPinotFS at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:71) ~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 
9 more Caused by: java.io.IOException: No FileSystem for scheme: hdfs at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:67) ~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 9 more```
  @sleepythread: Looks like Pinot is not able to load the HDFS configs. ```hadoop.conf.path=/local/hadoop/etc/hadoop/
pinot.server.storage.factory.hdfs.hadoop.conf.path=/local/hadoop/etc/hadoop/```
  @sleepythread: Is there any issue with these configurations?
  @sosyalmedya.oguzhan: `-Dplugins.dir`, not `-Dplugin.dir`
  @sosyalmedya.oguzhan: also, you should not pass `-Dplugins.include`
  @sleepythread: ```export JAVA_OPTS="-Dplugins.dir=/home/akashmishra/hpgraph/apache-pinot-incubating-0.6.0-bin/plugins/"```
  @sleepythread: I am not using this, but still the same error.
  @sleepythread: ```Caused by: java.io.IOException: No FileSystem for scheme: hdfs```
  @sosyalmedya.oguzhan: can you try without `-Dplugins.include`?
  @sleepythread: I am not using -Dplugins.include anywhere. AFAIU, this is not set anywhere.
  @sosyalmedya.oguzhan: remove this config: ```hadoop.conf.path=/local/hadoop/etc/hadoop/``` You already set `pinot.server.storage.factory.hdfs.hadoop.conf.path`. This is not the cause of your problem, though.
  @sleepythread: Which one ?
  @sleepythread: The problem still persists.
  @dlavoie: Do you want to try the `pinot-admin.sh StartServer` instead?
  @sleepythread: Same results.
  @sleepythread: Let me turn on debug logging and come back with better information for you guys to help me :slightly_smiling_face:
  @sleepythread: ```2021/04/13 17:01:28.443 ERROR [PinotFSFactory] [Start a Pinot [SERVER]] Could not instantiate file system for class org.apache.pinot.plugin.filesystem.HadoopPinotFS with scheme hdfs java.lang.RuntimeException: Could not initialize HadoopPinotFS at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:71) ~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:316) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] Caused by: java.io.IOException: No FileSystem for scheme: hdfs at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:67) 
~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 10 more 2021/04/13 17:01:28.447 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 1.134 since launch java.lang.RuntimeException: java.lang.RuntimeException: Could not initialize HadoopPinotFS at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.SegmentFetcherAndLoader.<init>(SegmentFetcherAndLoader.java:71) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:316) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] Caused by: java.lang.RuntimeException: Could not initialize HadoopPinotFS at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:71) ~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 
9 more Caused by: java.io.IOException: No FileSystem for scheme: hdfs at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170) ~[pinot-parquet-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:67) ~[pinot-hdfs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:54) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21] ... 9 more```
  @sleepythread: Still the same problem.
  @sleepythread: I have also added the HADOOP_HOME and HADOOP_CONF_DIR env variables: ```export HADOOP_HOME=/local/hadoop/
export HADOOP_CONF_DIR=/local/hadoop/etc/hadoop/```
@gabuglc: @gabuglc has joined the channel
@toasifmohammed: @toasifmohammed has joined the channel
@aaron: I just uploaded a lot of data to a new table, and when I try to `select * from foo limit 10` I get: "message": "MergeResponseError: responses for table: foo from servers: [10.20.67.239_O] got dropped due to data schema inconsistency.",
  @mayanks: The error you posted seems to indicate that not all segments have the same schema in your table. When you do `select *`, you are selecting all columns. However, your aggregation query may not be touching the column that is not present (or different) across segments, so you are not running into the issue.
  @mayanks: Did you change your schema?
  @aaron: I didn't change the schema -- I deleted the table, I created a new table with a schema, and uploaded new segments
  @aaron: How can I debug this to see which segments differ and how?
@aaron: I do seem to be able to perform aggregations over it though, like to take the average of a column
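One way to compare segments is the controller's segment metadata endpoint; a sketch, assuming the controller at localhost:9000 and a hypothetical segment name:
```
curl "http://localhost:9000/segments/foo_OFFLINE/foo_OFFLINE_0/metadata"
```
Diffing the column lists across segments should show which segment carries a different schema.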
@kevinv: Having issues adding a new streaming table in Pinot. I have added 3 new tables prior with status Good, but adding this new table called interactions_REALTIME shows a status Bad, and it doesn't seem to consume any new data. Any idea why this is the case? Also, looking at the logs, there are warning logs for this table being below the replica threshold and failing to find servers hosting the segment.
  @1705ayush: Hi @kevinv, can you also post the log of the AddTable execution and the table config files?
  @mayanks: Are there any warn/error during table creation on the controller/server logs?
  @kevinv: ```{ "tableName": "interactions", "tableType": "REALTIME", "segmentsConfig": { "timeColumnName": "interactionunixtime", "timeType": "MILLISECONDS", "segmentPushType": "APPEND", "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy", "schemaName": "interactions", "replication": "0" }, "tenants": {}, "tableIndexConfig": { "loadMode": "MMAP", "streamConfigs": { "streamType": "solace", "stream.solace.consumer.type": "highLevel", "stream.solace.topic.name": "cw/jms/interactions/v1", "stream.solace.decoder.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceJSONMessageDecoder", "stream.solace.consumer.factory.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceStreamConsumerFactory", "stream.solace.jms.Host": "localhost", "stream.solace.jms.Username": "****", "stream.solace.jms.Password": "****", "stream.solace.jms.VPN": "solace-dev", "stream.solace.destinationType": "queue", "stream.solace.jms.ClientID": "dev-profile" } }, "metadata": { "customConfigs": {} } }```
  @kevinv: For adding the table, I'll see {"status":"Table interactions_REALTIME succesfully added"}
  @mayanks: Any logs in controller/server during table creation?
  @kevinv: I've only seen warn logs; here is what's being shown
  @kevinv: 2021/04/13 11:40:26.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Segment interactions_REALTIME_1618330076604_0__0__1618330076699 of table interactions_REALTIME has no online replicas
2021/04/13 11:40:26.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 1 segments with no online replicas
2021/04/13 11:40:26.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 0 replicas, below replication threshold :1
2021/04/13 11:07:57.743 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting segment: interactions_REALTIME_1618330076604_0__0__1618330076699 for table: interactions_REALTIME (all ONLINE/CONSUMING instances: [] and OFFLINE instances: [] are disabled, counting segment as unavailable)
  @mayanks: This is probably from query execution?
  @mayanks: I am asking more around the time of table creation.
  @mayanks: Also, can you paste the Ideal State and External view?
  @kevinv: What do you mean by Ideal State and External View?
  @kevinv: Logs look fine during table creation, no errors or warnings besides what i mentioned earlier
  @mayanks:
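For reference, both views can be fetched from the controller REST API (a sketch assuming the controller at localhost:9000):
```
curl "http://localhost:9000/tables/interactions_REALTIME/idealstate"
curl "http://localhost:9000/tables/interactions_REALTIME/externalview"
```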
  @mayanks: Also curious if there's any difference between the tables that are working and the one that isn't?
  @kevinv: EXTERNAL VIEW ```{
  "id": "interactions_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "BUCKET_SIZE": "0",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "interactions_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "1",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "interactions_REALTIME_1618332793297_0__0__1618332793369": {
      "Server_135.113.208.194_7000": "ERROR"
    }
  },
  "listFields": {}
}```
IDEAL STATE ```{
  "id": "interactions_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "interactions_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "1",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "interactions_REALTIME_1618332793297_0__0__1618332793369": {
      "Server_135.113.208.194_7000": "ONLINE"
    }
  },
  "listFields": {}
}```
  @kevinv: seems like there is an error in mapFields
  @mayanks: `"Server_135.113.208.194_7000": "ERROR"`
  @mayanks: The server hosting this segment is in ERROR state. There should be a log in the server indicating why it is in error state.
  @mayanks: Are you able to access the server log? If so, grep for `interactions_REALTIME_1618332793297_0__0__1618332793369` in that log
  @kevinv: yes, it's the pinotServer.log, correct?
  @mayanks: Are you running locally? If so, yes
  @mayanks: Can you grep `interactions_REALTIME_1618332793297_0__0__1618332793369` there?
  @kevinv: 2021/04/13 11:53:13.507 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Adding segment: interactions_REALTIME_1618332793297_0__0__1618332793369 to table: interactions_REALTIME
2021/04/13 11:53:13.537 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:.unsorted.fwd
2021/04/13 11:53:13.770 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:.unsorted.fwd
  @mayanks: Is that it? Also, no errors elsewhere?
  @kevinv: yes
  @mayanks: What's your log4j setting?
  @kevinv: defaults within the pinot binary
  @mayanks: The fact that you see segment in ERROR state implies that the server is unable to load that segment for some reason. What are your JVM settings?
  @mayanks: I am guessing server is running out of resources required to host this segment. But that should most definitely generate an error in the log.
  @kevinv: what JVM settings does Pinot start with if I initialized the instance using ./quick-start-batch.sh?
  @mayanks: That would probably be minimal.
  @mayanks: That is just a demo purpose script (and also for batch ingestion). Are you launching Pinot using that and creating new realtime tables on that demo cluster?
  @kevinv: yes
  @mayanks: What's the end goal for your exercise? If it is to setup a cluster to ingest sizable production size data, we might not be able to use quick-start.
  @mayanks: But before that, I'd like to understand why you are not getting any errors logged.
  @mayanks: The quick-start script is setting log4j as follows: ```if [ -z "$JAVA_OPTS" ] ; then
  ALL_JAVA_OPTS="-Xms4G -Dlog4j2.configurationFile=conf/quickstart-log4j2.xml"
else
  ALL_JAVA_OPTS=$JAVA_OPTS
fi```
  @mayanks: Can you play with it to increase the logging level to info or debug?
  @mayanks: And then delete and recreate the table?
  @kevinv: ok I will try that, also this cluster is only for testing/poc, its not that much data.
  @mayanks: Yeah, enabling the logging correctly will tell us what happened in the server.
  @kevinv: looking at the log4j for quickstart, seems like most of them are already set to info
  @mayanks: Are you seeing info messages in your logs though?
  @kevinv: yes
  @mayanks: Ah ok
  @mayanks: For some reason I thought you werent
  @kevinv: no, I was; I was just only grepping for warn logs
  @mayanks: Mind deleting and re-creating the table again and monitoring the logs
  @mayanks: In that case
  @mayanks: can you grep all occurrences of the segment name (not just warn)?
  @kevinv: that's what I did earlier, but there were no error logs
  @mayanks: Ok, then lets delete and recreate and see if there are errors this time
  @kevinv: no errors
  @mayanks: And still the same issue?
  @kevinv: yes same issue with the new table
  @mayanks: Can we do a quick zoom meeting?
  @kevinv: can you join my WebEx instead?
  @mayanks: Joining
  @mayanks: Just to update the thread: ```1. We deleted and recreated the table, this time with the entire cluster nuked, and did not see the issue.
2. Our current suspicion is that given that quick-start uses only 32M as Xmx, and several tables were created, the server ran out of resources.
3. This is using a custom connector (Solace), so that might have caused the logging issue.```
  @mayanks: Thanks @kevinv for trying Pinot, let us know if you need more help

#pinot-k8s-operator


@teo: @teo has joined the channel

#announcements


@teo: @teo has joined the channel

#getting-started


@teo: @teo has joined the channel

#feat-partial-upsert


@yupeng: sent out an invite

#fix-numerical-predicate


@amrish.k.lal: FYI: this is the change that I am looking at for keeping track of precomputed predicates on the broker side (the `QueryOptimizer.optimize` function for `PinotQuery`). Basically, if we can precompute the value of an `Expression` or `Function`, then `precomputed` will contain that value. This will help in adding type support as discussed, and also further predicate pruning and optimization in the future. Let me know if there are suggestions or alternate ideas for this. ```alal@alal-mn1 amrish-pinot-1 % git diff pinot-common/src/thrift/query.thrift
diff --git a/pinot-common/src/thrift/query.thrift b/pinot-common/src/thrift/query.thrift
index 98329f5ac..5445ce645 100644
--- a/pinot-common/src/thrift/query.thrift
+++ b/pinot-common/src/thrift/query.thrift
@@ -47,6 +47,7 @@ struct Expression {
   2: optional Function functionCall;
   3: optional Literal literal;
   4: optional Identifier identifier;
+  5: optional Literal precomputed;
 }

 struct Identifier {
@@ -67,4 +68,5 @@ union Literal {

 struct Function {
   1: required string operator;
   2: optional list<Expression> operands;
+  3: optional Literal precomputed;
 }```
@jackie.jxt: No, we should not have this `precomputed` field
@jackie.jxt: If the result can be pre-computed, we should either remove the predicate or short-circuit the query to directly return
@amrish.k.lal: ok, I was planning to do a two-pass expression-tree traversal to mark and remove, but let me see if it can be done in a single pass.
@amrish.k.lal: For a short-circuit return, we would need to know whether the predicate evaluates to false, so we would need to keep that information somewhere, right?
@amrish.k.lal: So for example if the query is `SELECT * from mytable WHERE intColumn = 5.5`, the predicate here will always evaluate to false and hence the query will become `SELECT * from mytable` .
@amrish.k.lal: If we don't want to store precomputed values, then one option may be to rewrite the query to `SELECT * from mytable WHERE FALSE` and then short-circuit the query if the predicate is `FALSE`? There could be more complicated cases such as `SELECT * FROM mytable WHERE FALSE AND (intColumn > 5 OR FALSE)`. I think this approach will avoid adding `precomputed` to `Expression`, `Function`, etc., while ensuring that we have the information that we need to short-circuit the query. Sounds ok?
@amrish.k.lal: The query `SELECT * from mytable WHERE FALSE` isn't a valid query, but it's just a temporary form that we are using for optimization, so it should be ok.
@jackie.jxt: You can remove the predicate only when it evaluates to `true` or it evaluates to `false` as a child under `AND`
  @amrish.k.lal: I don't think removing a predicate that evaluates to true will work in all cases. For example, if we remove `intColumn != 4.4` from `SELECT * FROM mytable WHERE intColumn != 4.4 OR intColumn = 5.5`, then the query will become `SELECT * FROM mytable WHERE intColumn = 5.5`, which is semantically incorrect. Also, note that `SELECT * FROM mytable WHERE intColumn = 5.5` will throw an exception on the server side since we are comparing intColumn with 5.5. I think we need a more generic mechanism.
  @jackie.jxt: The logic is quite simple, and we have similar logic on server side segment pruner
  @jackie.jxt: It's a recursive algorithm. Under `AND`, `true` can be removed, and `false` results in the whole `AND` being `false`; under `OR`, `false` can be removed, and `true` results in the whole `OR` being `true`; at the root level, `true` can be removed, and `false` results in an empty result
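As an illustration only (not Pinot's actual `FilterOptimizer` API), a minimal Java sketch of that recursion using a hypothetical tri-state result:
```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Tri-state outcome of simplifying a filter node.
enum Tri { TRUE, FALSE, UNKNOWN }

abstract class Filter {
  // Simplifies this subtree in place and reports whether it is now a constant.
  abstract Tri simplify();
}

// A leaf predicate; constantValue is UNKNOWN for a real column predicate,
// TRUE/FALSE when the broker could precompute it (e.g. intColumn = 5.5 -> FALSE).
class Leaf extends Filter {
  final Tri constantValue;
  Leaf(Tri constantValue) { this.constantValue = constantValue; }
  @Override Tri simplify() { return constantValue; }
}

class And extends Filter {
  final List<Filter> children = new ArrayList<>();
  @Override Tri simplify() {
    Iterator<Filter> it = children.iterator();
    while (it.hasNext()) {
      Tri child = it.next().simplify();
      if (child == Tri.TRUE) {
        it.remove();       // TRUE under AND is a no-op: drop it
      } else if (child == Tri.FALSE) {
        return Tri.FALSE;  // FALSE under AND makes the whole AND FALSE
      }
    }
    return children.isEmpty() ? Tri.TRUE : Tri.UNKNOWN;
  }
}

class Or extends Filter {
  final List<Filter> children = new ArrayList<>();
  @Override Tri simplify() {
    Iterator<Filter> it = children.iterator();
    while (it.hasNext()) {
      Tri child = it.next().simplify();
      if (child == Tri.FALSE) {
        it.remove();       // FALSE under OR is a no-op: drop it
      } else if (child == Tri.TRUE) {
        return Tri.TRUE;   // TRUE under OR makes the whole OR TRUE
      }
    }
    return children.isEmpty() ? Tri.FALSE : Tri.UNKNOWN;
  }
}

// At the root: TRUE means the WHERE clause can be dropped entirely; FALSE means
// the broker can short-circuit and return an empty result without hitting servers.
```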
@jackie.jxt: Short-circuit means we don't even need to send the query to the servers
@jackie.jxt: I'm thinking of adding this info (always `true` or `false`) into the return value of the `FilterOptimizer` to pass it back to the caller
@jackie.jxt: We can take small steps. To unblock the issue of parsing error, we can first only convert the integral value to a format without decimal part (`1.0` -> `1`)
  @amrish.k.lal: That's too narrow, and I'm not sure it solves the underlying problem. I agree we should take small steps, but while keeping the big picture in mind so that we are moving towards a complete solution that works :slightly_smiling_face:
@amrish.k.lal: Let's go with rewriting predicates to TRUE and FALSE, so that `SELECT * from mytable WHERE intColumn = 5.5` is rewritten to `SELECT * from mytable WHERE FALSE`. That will allow us to do everything, including short-circuiting, without adding `precomputed`.
@amrish.k.lal: I think this is a good solution and gives us a clean way forward. Later on, if someone comes up with a better solution, the existing code will be generic and clean enough to allow for easy refactoring.
@jackie.jxt: My question here is why do we need to send such query to the server?
@amrish.k.lal: We don't
@amrish.k.lal: :slightly_smiling_face:
@amrish.k.lal: if the predicate is known to be false, then we short-circuit as you mentioned earlier.
@jackie.jxt: You mean adding a special filter expression `FALSE`?
@amrish.k.lal: Yes sir, for example in the case of the query `SELECT * FROM mytable WHERE FALSE`; and if the query gets rewritten to `SELECT * FROM mytable WHERE TRUE`, then we drop the WHERE clause.
@jackie.jxt: Yeah, that works
@jackie.jxt: Currently we don't support this syntax, but I think it is valid SQL
@amrish.k.lal: Yes, I think in some databases (MySQL, if I am not mistaken) it's valid.
@amrish.k.lal: cool :slightly_smiling_face: I think we are on the same page @steotia ^^
@jackie.jxt: To summarize, we want to do the following: 1. Introduce `TRUE` and `FALSE` as valid predicates. 2. Implement the filter optimizer to rewrite values. 3. Short-circuit the query if the predicate is `FALSE`.
@jackie.jxt: I like the idea of adding the predicates `TRUE` and `FALSE` to pass around the information
@amrish.k.lal: Yes, but I would qualify point 1 slightly to read `Introduce TRUE and FALSE as valid predicates in the broker optimizer.` As a separate ticket item, we could add SQL querying support for queries that contain TRUE/FALSE in the WHERE clause.
@jackie.jxt: Sounds good
@amrish.k.lal: ok cool :slightly_smiling_face: