#general
@wrbriggs: I apologize in advance for my ignorant question, but I'm struggling conceptually a bit with how to handle dateTime column definitions in my table schema and segmentsConfig. I have a millisecond-level epoch field on my incoming realtime data (creatively named `eventTimestamp`). I would like to keep this for querying/filtering my records at the individual event level. However, I would also like to define an hourly derived timestamp to be used for pre-aggregating with a star-tree index. My segmentsConfig looks like this:
```
"segmentsConfig": {
  "timeColumnName": "eventTimestamp",
  "timeType": "MILLISECONDS",
  "retentionTimeUnit": "HOURS",
  "retentionTimeValue": "48",
  "segmentPushType": "APPEND",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "schemaName": "mySchema",
  "replication": "1",
  "replicasPerPartition": "1"
},
```
My star-tree index looks like this:
```
"starTreeIndexConfigs": [{
  "dimensionsSplitOrder": [
    "dimension1",
    "dimension2"
  ],
  "skipStarNodeCreationForDimensions": [],
  "functionColumnPairs": [
    "SUM__metric1",
    "SUM__metric2",
    "SUM__metric3",
    "DISTINCT_COUNT_HLL__dimension3",
    "DISTINCT_COUNT_HLL__dimension4"
  ],
  "maxLeafRecords": 10000
}],
```
And my dateTimeFieldSpecs:
```
"dateTimeFieldSpecs": [
  {
    "name": "eventTimestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:HOUR",
    "dateTimeType": "PRIMARY"
  }
],
```
Can anyone confirm that this is the correct approach? Should I be using an ingestion transformation of `toEpochHoursRounded` instead, specifying that as a DERIVED dateTimeField in the dateTimeFieldSpecs configuration, and manually adding it to the dimensionsSplitOrder of my star-tree index?
@fx19880617: @jackie.jxt I think in this case we need to add a new column for the hour-rounded time value, then build the star-tree on it, right?
@wrbriggs: @fx19880617 Thank you, that makes sense to me, but I was confused as to why the dateTimeFieldSpec allows me to enter a granularity different from the incoming format. Also, the current airport examples all use the deprecated `timeFieldSpec`, which meant I had to go digging in the
@fx19880617: true, we are updating the codebase with this PR:
@fx19880617: will update the wiki as well
@fx19880617:
@wrbriggs: Heh, awesome - I also made the change locally for the `latest` image for submitting admin commands as jobs :slightly_smiling_face:
@fx19880617: the link you put was outdated wiki
@fx19880617: let me know if docs.pinot helps
@fx19880617: we will update in this site
@wrbriggs: Thanks
@wrbriggs: So it looks like `dateTimeType` (e.g., `PRIMARY`, `SECONDARY`, or `DERIVED`) is no longer necessary?
@fx19880617: it’s not
@fx19880617: you can define multiple dateTimeFields
@fx19880617: and specify the transform in the table
@fx19880617: you can set `ingestionConfig` in the table, e.g.
```
{
  "tableName": "githubEvents",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "githubEvents",
    "replication": "1",
    "timeColumnName": "event_time",
    "timeType": "MILLISECONDS"
  },
  "tenants": {},
  "tableIndexConfig": {
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "type",
          "repo_id"
        ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "SUM__pull_request_additions",
          "SUM__pull_request_deletions",
          "SUM__pull_request_changed_files",
          "COUNT__star",
          "DISTINCT_COUNT_HLL__actor_id"
        ],
        "maxLeafRecords": 1000
      }
    ],
    "enableDynamicStarTreeCreation": true,
    "loadMode": "MMAP",
    "invertedIndexColumns": [],
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "repo_id": {
          "functionName": "Murmur",
          "numPartitions": 1024
        }
      }
    },
    "noDictionaryColumns": []
  },
  "routing": {
    "segmentPrunerTypes": [
      "partition"
    ]
  },
  "metadata": {
    "customConfigs": {}
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [],
      "segmentNameSpec": {},
      "pushSpec": {}
    },
    "transformConfigs": [
      {
        "columnName": "event_time",
        "transformFunction": "fromDateTime(created_at, \"yyyy-MM-dd'T'HH:mm:ssZ\")"
      }
    ]
  }
}
```
@fx19880617: here I convert the `yyyy-MM-dd'T'HH:mm:ssZ`-format string column `created_at` in the raw data into a millisecond epoch value in `event_time`
@fx19880617: you can specify more time fields and add them into this transformConfigs, fyi:
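A sketch of what such an extra derived time column might look like; the `eventHour` column name is illustrative, and this assumes Pinot's built-in `toEpochHours` scalar function (which truncates a millisecond epoch to whole hours):

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "eventHour",
        "transformFunction": "toEpochHours(eventTimestamp)"
      }
    ]
  }
}
```

The matching schema side would declare `eventHour` in dateTimeFieldSpecs with format `1:HOURS:EPOCH` and granularity `1:HOURS`, and `eventHour` could then be listed in the star-tree `dimensionsSplitOrder`.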
@wrbriggs: Perfect, thank you. One more stupid question (hopefully last one for the day)… what should I look for in the trace in order to verify that my query is using my star tree index? Is there a Pinot equivalent of SQL `EXPLAIN` ?
@fx19880617: typically from the results, you can see numDocsScanned
@fx19880617: which should be way less than the total docs
@fx19880617: e.g.
@fx19880617: @jackie.jxt might provide more insights here
@wrbriggs: Ok. I have inverted indices as well, so I was trying to figure out how to verify it was using the star-tree index rather than those — it is definitely showing far fewer docs scanned than total:
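As a quick sanity check on the selectivity numbers discussed above, a small sketch (the response dict here is mocked; real Pinot broker responses carry the same `numDocsScanned`/`totalDocs` metadata fields):

```python
# Hypothetical helper: given a Pinot broker response (parsed JSON dict),
# report how selective the query was. A star-tree or inverted-index hit
# should scan far fewer docs than the table holds.
def scan_ratio(broker_response: dict) -> float:
    """Return the fraction of total docs that were actually scanned."""
    scanned = broker_response["numDocsScanned"]
    total = broker_response["totalDocs"]
    return scanned / total if total else 0.0

# Example with a mocked broker response:
response = {"numDocsScanned": 1200, "totalDocs": 5_000_000}
print(f"scanned {scan_ratio(response):.4%} of docs")  # prints: scanned 0.0240% of docs
```

Note this only shows the query was selective, not *which* index did the pruning — for that, the tracing approach mentioned below the thread (looking for the filter operator in the trace) is needed.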
@wrbriggs: I just barely started ingestion, so I need to let it build up some more data :slightly_smiling_face:
@fx19880617: ic
@fx19880617: for consuming segment, i think there is no star-tree built
@fx19880617: it will go to inv index
@wrbriggs: Ah
@fx19880617: once the segment is sealed, star-tree will be built
@wrbriggs: That makes sense
@jackie.jxt: Another way is to enable the tracing for the query and see if it uses the `StarTreeFilterOperator`
@jackie.jxt: For the date time fields, is this column already rounded to each hour?
```
"dateTimeFieldSpecs": [
  {
    "name": "eventTimestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:HOUR"
  }
],
```
@jackie.jxt: If so, you can directly use it as a star-tree dimension; if not, you can create a new rounded time column and use that in the star-tree
#troubleshooting
@contact: Hey everyone, I'm trying to set up Pinot from the tarball distribution (so without Docker) with an Ansible playbook (hopefully I will be able to open-source it at some point). However, I hit a wall when trying to load plugins. I'm using Java 8 (`openjdk version "1.8.0_275"`) with the following JVM flags:
```
JVM_OPTS=-Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -Dplugins.include=pinot-pubsub,pinot-s3 -Xloggc:/var/log/pinot-gc-controller.log -Dplugins.dir=/usr/local/pinot/plugins
```
@contact: The directories are the same as in the docker image:
@contact: I have a test setup in Docker working fine (with the same controller config); however, on bare metal with the tar distribution I'm getting:
@contact:
```
Dec 30 16:34:21 ubuntu2004.localdomain bash[10587]: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.S3PinotFS
    at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:58) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd>
    at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b84>
    at org.apache.pinot.controller.ControllerStarter.initPinotFSFactory(ControllerStarter.java:481) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0->
    at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:329) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.>
    at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:287) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd>
    at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.>
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb6>
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[pinot-al>
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.6.0-jar-wit>
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [pinot-all-0.6.0-ja>
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [pinot-all-0.6.0-jar-with-dependen>
    at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [pinot-all-0.6.0-jar-with-dependencies.jar>
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:154) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646bace>
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:166) [pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafc>
Caused by: java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.S3PinotFS
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_275]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_275]
    at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646bacea>
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:268) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafc>
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:239) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafc>
    at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:220) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafc>
    at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:53) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd>
    ... 13 more
```
@contact: For more info, here is the part of the init logs that logs the env config:
```
Dec 30 16:34:20 ubuntu2004.localdomain bash[10587]: ZkClient monitor key or type is not provided. Skip monitoring.
Starting ZkClient event thread.
Terminate ZkClient event thread.
Terminate ZkClient event thread.
Closed zkclient
Initializing PinotFSFactory
Client environment:java.class.path=/usr/local/pinot/lib/pinot-all-0.6.0-jar-with-dependencies.jar
Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Client environment:java.io.tmpdir=/tmp
Client environment:java.compiler=<NA>
Client environment:os.name=Linux
Client environment:os.arch=amd64
Client environment:os.version=5.4.0-54-generic
Client environment:user.name=root
Client environment:user.home=/root
Client environment:user.dir=/usr/local/apache-pinot-incubating-0.6.0-bin
Initiating client connection, connectString=10.1.0.11:2181 sessionTimeout=30000 watcher=org.apache.helix.manager.zk.client.ZkConnectionManager@71e9ebae
Opening socket connection to server 10.1.0.11/10.1.0.11:2181. Will not attempt to authenticate using SASL (unknown error)
Socket connection established to 10.1.0.11/10.1.0.11:2181, initiating session
Session establishment complete on server 10.1.0.11/10.1.0.11:2181, sessionid = 0x100000024270010, negotiated timeout = 30000
zookeeper state changed (SyncConnected)
MBean HelixZkClient:Key=10_1_0_11_2181_30000,Type=ZkConnectionManager has been registered.
MBean HelixZkClient:Key=10_1_0_11_2181_30000,PATH=Root,Type=ZkConnectionManager has been registered.
ZkConnection 10_1_0_11_2181_30000 was created for sharing.
Sharing ZkConnection 10_1_0_11_2181_30000 to a new SharedZkClient.
ZkClient monitor key or type is not provided. Skip monitoring.
Starting ZkClient event thread.
```
@contact: Does anyone have an idea?
@dlavoie: Mind running a `ps aux` so we get confirmation of the exact arguments that were provided to the JVM process?
