#general


@rootshellz: @rootshellz has joined the channel
@yves.kurz: @yves.kurz has joined the channel
@songxinbai: @songxinbai has joined the channel
@ricky: @ricky has joined the channel
@anantha.sharma4: @anantha.sharma4 has joined the channel
@brad: @brad has joined the channel
@luys8611: @luys8611 has joined the channel
@tim.spann: I found a typo in this page
  @tim.spann: under the Launch Pinot Cluster section
  @tim.spann: This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, and Zookeeper. You can find the file on GitHub.
  @tim.spann: should read: Pinot Broker, Pulsar, and Zookeeper
  @tim.spann: thanks
  @mayanks: @mark.needham ^^
  @richard892: I love an eye for detail
  @mark.needham: ta - have updated
@luys8611: It would be great if there were some detailed docs on how to set up the whole Pinot & ThirdEye environment for this on my machine.
  @mayanks: @pyne.suvodeep ^^
  @pyne.suvodeep: Hi @luys8611 Here's a TE OSS public fork of @cyril that might help. There is a dockerized quickstart that uses MySQL as a datasource:
  @pyne.suvodeep: This doc should provide a good guideline
  @luys8611: I have data in a CSV file.
  @luys8611: And I've installed the Pinot Docker containers.
  @luys8611: Now I'm trying to create a Pinot table from the CSV data.
  @pyne.suvodeep: There are docs for this. Example:
  @luys8611: Ok, let me follow it. Thanks
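For readers landing here later, a minimal sketch of that CSV flow following the standalone batch ingestion docs; the table name `myTable`, file paths, and the controller address are placeholders, not details from this thread:
```
# Create the table from a schema + table config you've already written
# (file names here are illustrative).
bin/pinot-admin.sh AddTable \
  -schemaFile myTable_schema.json \
  -tableConfigFile myTable_table.json \
  -exec

# Describe a standalone ingestion job that reads CSVs and pushes segments.
cat > job-spec.yml <<'EOF'
executionFrameworkSpec:
  name: standalone
  segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
  segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
jobType: SegmentCreationAndTarPush
inputDirURI: '/data/csv'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/data/segments'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
EOF

# Generate segments from the CSVs and push them to the controller.
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile job-spec.yml
```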
@luys8611: I started Apache Zookeeper, Pinot Controller, Pinot Broker, and Pinot Server. Now how can I create and add a new table in the cluster manager?
  @mayanks:
  @luys8611: Does it work completely in a local env?
  @mayanks: Sorry wrong link
  @mayanks: If you just want to play around, here's a local step-by-step guide:
  @luys8611: I'm getting this error when I try to add a new table: `java.net.UnknownHostException: manual-pinot-controller: Temporary failure in name resolution`
  @luys8611: I'm trying to run this
  @mark.needham: did you start those up with docker?
  @mark.needham: if you have them all running outside docker then it's not gonna have a controller called `manual-pinot-controller` available
  @luys8611: ```
pinot-broker-run     | May 11, 2022 8:52:35 PM org.glassfish.grizzly.http.server.NetworkListener start
pinot-broker-run     | INFO: Started listener bound to [0.0.0.0:8099]
pinot-broker-run     | May 11, 2022 8:52:35 PM org.glassfish.grizzly.http.server.HttpServer start
pinot-broker-run     | INFO: [HttpServer] Started.
pinot-server-run     | May 11, 2022 8:52:38 PM org.glassfish.grizzly.http.server.NetworkListener start
pinot-server-run     | INFO: Started listener bound to [0.0.0.0:8097]
pinot-server-run     | May 11, 2022 8:52:38 PM org.glassfish.grizzly.http.server.HttpServer start
pinot-server-run     | INFO: [HttpServer] Started.
pinot-controller-run | May 11, 2022 8:52:40 PM org.glassfish.grizzly.http.server.NetworkListener start
pinot-controller-run | INFO: Started listener bound to [0.0.0.0:9000]
pinot-controller-run | May 11, 2022 8:52:40 PM org.glassfish.grizzly.http.server.HttpServer start
pinot-controller-run | INFO: [HttpServer] Started.
pinot-broker-run     | 2022/05/11 20:52:40.923 INFO [StartServiceManagerCommand] [Start a Pinot [BROKER]] Started Pinot [BROKER] instance [Broker_172.19.0.5_8099] at 14.235s since launch
pinot-server-run     | 2022/05/11 20:52:43.533 INFO [StartServiceManagerCommand] [Start a Pinot [SERVER]] Started Pinot [SERVER] instance [Server_172.19.0.6_8098] at 14.637s since launch
pinot-controller-run | 2022/05/11 20:52:45.017 INFO [StartServiceManagerCommand] [main] Started Pinot [CONTROLLER] instance [Controller_172.19.0.4_9000] at 20.284s since launch
zookeeper-run        | 2022-05-11 20:52:54,061 [myid:1] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x100027422aa0005, timeout of 30000ms exceeded
zookeeper-run        | 2022-05-11 20:52:54,062 [myid:1] - INFO [SessionTracker:ZooKeeperServer@398] - Expiring session 0x100027422aa000a, timeout of 30000ms exceeded
```
  @luys8611: I have all running in docker
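The `UnknownHostException` above is consistent with the containers not sharing a user-defined Docker network: a container name like `manual-pinot-controller` only resolves for other containers attached to the same network. A sketch along the lines of the manual Docker setup in the docs (image tags and ports are illustrative):
```
docker network create pinot-demo

docker run -d --name manual-zookeeper --network pinot-demo \
  zookeeper:3.5.6

# The container name is what other containers resolve, so it must match
# what the cluster config expects (manual-pinot-controller here).
docker run -d --name manual-pinot-controller --network pinot-demo \
  -p 9000:9000 apachepinot/pinot:latest StartController \
  -zkAddress manual-zookeeper:2181

docker run -d --name manual-pinot-broker --network pinot-demo \
  -p 8099:8099 apachepinot/pinot:latest StartBroker \
  -zkAddress manual-zookeeper:2181

docker run -d --name manual-pinot-server --network pinot-demo \
  apachepinot/pinot:latest StartServer \
  -zkAddress manual-zookeeper:2181
```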
@eddyreynoso: @eddyreynoso has joined the channel
@anli: Hi team, we're using Presto with Pinot and would like to support pushdown of functions like `COALESCE` or multi-column `CASE` statements on the Pinot side. This seems reasonable for predicates, since the current pushdown logic covers aggregations and predicates. However, we're looking for some performance improvements by making this a `SELECT` pushdown instead of returning all data to Presto for processing, since we can "aggregate" row-wise for various operators and take advantage of certain indexes (e.g. bloom filters) for `COALESCE`, `CONCAT`, etc. Are there concerns or pointers around this? @xiangfu0
  @xiangfu0: So far, pushdown for different functions may need to be implemented separately.
  @xiangfu0: If the semantics are the same across Presto and Pinot, you can have one general way to push down, e.g. sum/count/min/max;
  @xiangfu0: otherwise it may require an extra override, e.g. count(distinct xx).
  @xiangfu0: For row-level expressions, there is already pushdown for arithmetic; you can follow that to support more transform functions.
  @xiangfu0: You can check PinotPushdownUtils and PinotAggregationProjectConverter for more details
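As a quick way to see where an expression actually executes, the query plan shows whether a projection stays in Presto or gets folded into the Pinot scan; a sketch assuming the standard Presto CLI (server address, catalog name, and table are placeholders):
```
# Compare the plans with and without the expression of interest; if the
# COALESCE survives as a Presto project node above the table scan, it was
# not pushed down to Pinot.
presto --server localhost:8080 --catalog pinot --schema default \
  --execute "EXPLAIN SELECT COALESCE(colA, colB) FROM myTable"
```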
@yzhou86: @yzhou86 has joined the channel

#random


@rootshellz: @rootshellz has joined the channel
@yves.kurz: @yves.kurz has joined the channel
@songxinbai: @songxinbai has joined the channel
@ricky: @ricky has joined the channel
@anantha.sharma4: @anantha.sharma4 has joined the channel
@brad: @brad has joined the channel
@luys8611: @luys8611 has joined the channel
@eddyreynoso: @eddyreynoso has joined the channel
@yzhou86: @yzhou86 has joined the channel

#feat-text-search


@rootshellz: @rootshellz has joined the channel

#feat-presto-connector


@rootshellz: @rootshellz has joined the channel

#troubleshooting


@rootshellz: @rootshellz has joined the channel
@yves.kurz: @yves.kurz has joined the channel
@hjj8645561: Hi guys, I am trying to use the `LastWithTime` function to get the latest data of each group, but unfortunately I got an NPE. The original intention is to get the top N of a group after grouping by an ad hoc column. Wondering if I have a grammar issue in my PQL: ```select hostname, lastWithTime('alertId', 'issued', 'STRING') from uas_nomalized_alert where JSON_EXTRACT_SCALAR(attributes, '$.graphQL-businessService', 'STRING', '') = 'Meeting Centers' group by hostname``` `alertId` is a 'STRING' column, `issued` is a 'TIMESTAMP' column
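One thing worth checking before digging into the NPE: per the documented signature `LASTWITHTIME(dataColumn, timeColumn, 'dataType')`, the first two arguments are column identifiers rather than quoted string literals, so `'alertId'` and `'issued'` may be parsed as literals here. A sketch of the rewritten query sent to the broker (the broker address is a placeholder, and this assumes the running version already ships `LASTWITHTIME`):
```
# Only the dataType stays a string literal; the column args are bare identifiers.
curl -s -X POST http://localhost:8099/query/sql \
  -H 'Content-Type: application/json' \
  -d @- <<'EOF'
{"sql": "select hostname, lastWithTime(alertId, issued, 'STRING') from uas_nomalized_alert where JSON_EXTRACT_SCALAR(attributes, '$.graphQL-businessService', 'STRING', '') = 'Meeting Centers' group by hostname"}
EOF
```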
@fizza.abid: Hi, after setting up authentication, my ingestion stopped.
@fizza.abid: Used this link for enabling authentication
  @xiangfu0: @xiaobing do you know if the auth is carried over for scheduling minion tasks? @fizza.abid can you paste the controller and minion logs here?
  @fizza.abid:
  @fizza.abid: We want to enable with authentication
  @fizza.abid: ```
SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1652256960190_0 completed at: 1652257150076, results: true. FrameworkTime: 2 ms; HandlerTime: 22 ms.
Caught exception while executing task: Task_SegmentGenerationAndPushTask_1652256960190_0
java.lang.RuntimeException: Failed to execute SegmentGenerationAndPushTask
    at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:120) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
    at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.generateTaskSpec(SegmentGenerationAndPushTaskExecutor.java:269) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
    at org.apache.pinot.plugin.minion.tasks.segmentgenerationandpush.SegmentGenerationAndPushTaskExecutor.executeTask(SegmentGenerationAndPushTaskExecutor.java:117) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
Task: Task_SegmentGenerationAndPushTask_1652256960190_0 completed in: 59ms
```
  @xiaobing: the auth token should have been carried over. I can dig a bit more into this with the logs.
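For context, the basic-auth guide has each component present a static token when talking to a secured controller; a sketch of the server-side properties from that guide (the token value is just base64 of a placeholder user:password, and the minion would presumably need an equivalent token configured, which may be the gap here):
```
# Appended to conf/pinot-server.conf; "YWRtaW46dmVyeXNlY3JldA" is
# base64("admin:verysecret"), a placeholder credential.
cat >> conf/pinot-server.conf <<'EOF'
pinot.server.segment.fetcher.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
pinot.server.segment.uploader.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
pinot.server.instance.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
EOF
```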
@songxinbai: @songxinbai has joined the channel
@ysuo: Hi team, I'm trying to use the Pinot upsert feature. Part of my table config is like below:
```
{
  "tableName": "upsert_test_local",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "upsert_test_local",
    "timeColumnName": "created_on",
    "timeType": "MILLISECONDS",
    "allowNullTimeValue": true,
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "segmentPushType": "APPEND",
    "completionConfig": { "completionMode": "DOWNLOAD" }
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "aggregateMetrics": true,
    "nullHandlingEnabled": true,
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest",
      "realtime.segment.flush.threshold.time": "30m",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.segment.size": "100M",
      "realtime.segment.flush.autotune.initialRows": "1000000"
    }
  },
  "ingestionConfig": {
    "filterConfig": {
      "filterFunction": "Groovy({tablename != \"test_table_name\"}, tablename)"
    },
    "transformConfigs": [
      { "columnName": "id", "transformFunction": "Groovy({UUID.randomUUID().toString()}, tablename)" },
      { "columnName": "timestamp", "transformFunction": "jsonPathString(metrics, '$.timestamp')" },
      { "columnName": "created_on", "transformFunction": "Groovy({System.currentTimeMillis()}, tablename)" },
      { "columnName": "updated_on", "transformFunction": "Groovy({System.currentTimeMillis()}, tablename)" }
    ]
  },
  "metadata": { "customConfigs": {} },
  "routing": { "instanceSelectorType": "strictReplicaGroup" },
  "upsertConfig": {
    "mode": "PARTIAL",
    "defaultPartialUpsertStrategy": "OVERWRITE",
    "partialUpsertStrategies": { "created_on": "IGNORE" }
  }
}
```
And part of the schema is like this:
```
"dateTimeFieldSpecs": [
  { "name": "timestamp", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" },
  { "name": "created_on", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" },
  { "name": "updated_on", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
],
"primaryKeyColumns": ["timestamp"]
```
At first, upsert works as expected. But after a while, like 30 minutes later, when I query this table there are no records, although totalDocs in the query response stats is not 0. Then I write some data to the same Kafka topic, query the table again, and there are some records, but the value of the created_on field is 0 instead of the current timestamp. Any idea which property is not set right here? Is it the timeColumnName property?
  @kharekartik: Hi, I see you are using the timestamp column as the primaryKey; is that expected? Ideally, for upsert to work, your primaryKey should be unique per record and your input Kafka stream should be partitioned on the primary key.
  @kharekartik: @jackie.jxt
  @ysuo: Hi, I've replicated this error. The records are there (using the skipUpsert option), but they can't be queried.
  @kharekartik: Hi, when you apply `skipUpsert` option in query, do you get both old data + updated data or just old data?
  @ysuo: And when I add another 2 records to the same Kafka topic, they are ingested into Pinot, judging from the info in the following picture. The only problem is that created_on is 0 here; it should be a long-type timestamp.
  @ysuo: When I apply the skipUpsert option, both old data and new data are shown.
  @jackie.jxt: How many servers do you have? Since it returns 2 records with the same timestamp, I suspect the source stream is not partitioned on the primary key
  @ysuo: Hi, only one Kafka topic partition and 6 servers.
  @ysuo: My confusion is why 0 is stored for the created_on field here, since I used System.currentTimeMillis() or now(). And why can't the existing records be queried?
  @ysuo: The following picture may explain this issue more clearly. After a while, no records are found; with the skipUpsert option, the records are actually there.
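For anyone reproducing this, the two views being compared in this thread look like the following against the broker (the address is a placeholder; the `OPTION(...)` suffix is the query-option syntax of this era):
```
# Upsert-merged view (default): only the latest record per primary key.
curl -s -X POST http://localhost:8099/query/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT * FROM upsert_test_local LIMIT 10"}'

# Raw view: every ingested record, bypassing upsert resolution.
curl -s -X POST http://localhost:8099/query/sql \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT * FROM upsert_test_local LIMIT 10 OPTION(skipUpsert=true)"}'
```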
@mariums82: I was trying to connect Pinot with Trino. The connection looks good and I'm getting the pinot catalog, but when I execute a query I get `{"code":403,"error":"Permission is denied for access type 'READ' to the endpoint ''"}`. Any idea how to resolve this?
  @kharekartik: Hi, do you have auth enabled on your cluster?
  @mariums82: yes, and I already added the auth config in the Trino cluster: ```
pinot.controller.authentication.type=PASSWORD
pinot.controller.authentication.user=XXXXX
pinot.controller.authentication.password=XXXXXXX$Z
pinot.broker.authentication.type=PASSWORD
pinot.broker.authentication.user=XXXXXX
pinot.broker.authentication.password=XXXXXX$Z
```
  @kharekartik: And does this user have READ permission on the table? You might need to update the user's permissions if it's missing.
  @mariums82: No, this is the master user.
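For reference, per-principal grants in the basic-auth guide look roughly like this on the broker side; the user, password, and table names below are placeholders, not values from this thread:
```
# A principal without a matching permissions/tables grant gets the 403
# "Permission is denied for access type 'READ'" seen above.
cat >> conf/pinot-broker.conf <<'EOF'
pinot.broker.access.control.principals=admin,trino
pinot.broker.access.control.principals.admin.password=verysecret
pinot.broker.access.control.principals.trino.password=secret
pinot.broker.access.control.principals.trino.tables=myTable
pinot.broker.access.control.principals.trino.permissions=READ
EOF
```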
@ricky: @ricky has joined the channel
@anantha.sharma4: @anantha.sharma4 has joined the channel
@brad: @brad has joined the channel
@facundo.bianco: Hi All, we're running ingestion and the Push Job Spec () is configured like ```
pushJobSpec:
  pushParallelism: 20
  pushAttempts: 2
  segmentUriPrefix: ""
  segmentUriSuffix: ""
``` And got this error message: > 2022/05/11 14:28:50.531 ERROR [BaseTableDataManager] [HelixTaskExecutor-message_handle_thread] Attempts exceeded when downloading segment: foo_OFFLINE_2022-05-03_2022-05-03_11 for table: foo_OFFLINE from: to: /tmp/pinot-tmp/server/index/foo_OFFLINE/tmp/tmp-foo_OFFLINE_2022-05-03_2022-05-03_11-b2f3a97c-9c14-4b4c-9874-fb028597a237/foo_OFFLINE_2022-05-03_2022-05-03_11.tar.gz ava:72) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-f It adds 'null' at the end of the file's URI. Any idea how to resolve this? Thanks in advance.
  @mayanks:
  @mayanks: Seems like it is expecting segmentUriSuffix, @xiangfu0?
  @xiaobing: based on the exception msg, the wrong URI actually comes from SegmentZKMetadata.getDownloadUrl(), so the wrong URI was set there when pushing the segment to Pinot. It looks like if either prefix or suffix is empty, those three parts are simply stitched together (still, I'd assume the suffix should be an empty string instead of null; perhaps a bug somewhere): ```return URI.create(String.format("%s%s%s", prefix, fileURI.getRawPath(), suffix));``` Perhaps give it a quick try and simply leave both prefix and suffix empty, and see if it works as expected. You can confirm by checking the downloadUrl set in the segment metadata in ZK.
  @facundo.bianco: We solved it removing _segmentUriPrefix_ and _segmentUriSuffix_ from pushJobSpec -- thank you!
  @mayanks: @xiaobing Perhaps we need to document this or fix this?
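A quick way to verify the fix is to read the downloadUrl straight out of the segment's ZK metadata via the controller REST API; the host and segment name below are taken from the error above, with the endpoint per the controller's segment API:
```
# The downloadUrl field should be a resolvable URI with no stray "null" suffix.
curl -s "http://localhost:9000/segments/foo_OFFLINE/foo_OFFLINE_2022-05-03_2022-05-03_11/metadata" \
  | grep -i downloadUrl
```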
@luys8611: @luys8611 has joined the channel
@eddyreynoso: @eddyreynoso has joined the channel
@yzhou86: @yzhou86 has joined the channel

#pinot-s3


@rootshellz: @rootshellz has joined the channel

#pinot-dev


@rootshellz: @rootshellz has joined the channel
@songxinbai: @songxinbai has joined the channel
@songxinbai: @walterddr Hi Rong. Sorry to bother you. I'm trying to build Pinot on the 'pr-query-integration' branch, following the instructions from this conversation (). But when I query the broker, I run into some problems. A query like "SELECT * FROM baseballStats_OFFLINE" returns results, but "SELECT playerId FROM baseballStats_OFFLINE" takes almost 10 seconds and returns an empty result. The same happens when I use 'inner join': no exception or error is reported, it just returns an empty result. I started Pinot like this: 'bin/pinot-admin.sh StartServiceManager -bootstrapConfigPaths conf/pinot-controller.conf conf/pinot-broker.conf conf/pinot-server.conf' (I had already started Zookeeper), and all components are on the same machine (1 controller, 1 broker, 1 server). Now I don't know what happened and would like to know how to track down the problem. Could you please give me some advice? Thanks.

#community


@rootshellz: @rootshellz has joined the channel

#announcements


@rootshellz: @rootshellz has joined the channel

#presto-pinot-connector


@anli: @anli has joined the channel

#getting-started


@rootshellz: @rootshellz has joined the channel
@yves.kurz: @yves.kurz has joined the channel
@songxinbai: @songxinbai has joined the channel
@ricky: @ricky has joined the channel
@anantha.sharma4: @anantha.sharma4 has joined the channel
@brad: @brad has joined the channel
@luys8611: @luys8611 has joined the channel
@eddyreynoso: @eddyreynoso has joined the channel
@yzhou86: @yzhou86 has joined the channel

#releases


@rootshellz: @rootshellz has joined the channel

#pinot-docsrus


@atri.sharma: Folks, please help review
  @mayanks: Approved with minor comments
  @atri.sharma: Thanks. Could you please merge the same?
  @mayanks: Done
  @atri.sharma: You rock, thanks!
@amrish.k.lal: Hello, this PR () adds more details to the JSON queries document.

#introductions


@rootshellz: @rootshellz has joined the channel
@yves.kurz: @yves.kurz has joined the channel
@songxinbai: @songxinbai has joined the channel
@ricky: @ricky has joined the channel
@anantha.sharma4: @anantha.sharma4 has joined the channel
@brad: @brad has joined the channel
@luys8611: @luys8611 has joined the channel
@luys8611: Hi, I'm Luy and I want to detect anomalies in my data pipeline.
@luys8611: It would be great if there were some detailed docs on how to set up the whole Pinot & ThirdEye environment for this on my machine.
  @mayanks: Let's ask this in #general and I'll tag the TE folks
@luys8611: This is what I'm following from scratch now.
@mayanks: Hi Luy welcome to the community
@eddyreynoso: @eddyreynoso has joined the channel
@yzhou86: @yzhou86 has joined the channel

#linen_dev


@kam: yeah try `add to Slack`
@kam: it looks like it got fixed