#general


@hashhar: @hashhar has joined the channel
@mbracke: Hi! Is there a way to write a where clause to match entries that do not match a given regular expression? Using `not` just results in an error message.
  @fx19880617: I think NOT regex_match is not supported; the current workaround is to negate the regex itself (I know that’s sometimes hard)
  @fx19880617: We should add NOT support for REGEX_MATCH
  @fx19880617: can you create a github issue
  @mbracke: OK, thanks.
  @mbracke: Isn't this issue similar: ? It's on REGEXP_LIKE, but that's what I'm using.
  @fx19880617: true, I think we can use same issue
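[Editor's note] Until NOT support lands, the "negate the regex" workaround mentioned above can often be done with a negative lookahead, which Java regex (what Pinot uses) supports. A hedged sketch with Python's `re`, whose lookahead syntax is the same; note this assumes a full-string match, so check how your engine anchors `REGEXP_LIKE` before relying on it:

```python
import re

pattern = r"foo.*"                    # the pattern you want to exclude
negated = r"(?!" + pattern + r").*"   # matches strings that do NOT match it

assert re.fullmatch(negated, "barbaz")      # doesn't match the original -> kept
assert not re.fullmatch(negated, "foobar")  # matches the original -> excluded
```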
@keweishang: Hi team, I downloaded and followed the `Manual cluster setup` ()’s `Using launcher scripts` section. I ran ```export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-controller.log" bin/pinot-admin.sh StartController \ -zkAddress localhost:2191 \ -controllerPort 9000``` to start the controller. However, the controller logs many WARN messages like the following (in the thread), and `` returns a blank web UI. May I have some help please? Thanks. The docker version works for me, but I want to install Pinot on our EC2 nodes for further PoC.
  @keweishang: Controller WARN logs when starting controller: ```Jun 16, 2021 4:52:07 PM org.glassfish.grizzly.http.server.NetworkListener start INFO: Started listener bound to [0.0.0.0:9000] Jun 16, 2021 4:52:07 PM org.glassfish.grizzly.http.server.HttpServer start INFO: [HttpServer] Started. 2021/06/16 16:52:12.717 INFO [Reflections] [main] Reflections took 5310 ms to scan 1 urls, producing 65540 keys and 128519 values 2021/06/16 16:52:12.772 WARN [Reflections] [main] could not get type for name org.apache.commons.digester.AbstractObjectCreationFactory from any class loader org.reflections.ReflectionsException: could not get type for name org.apache.commons.digester.AbstractObjectCreationFactory at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:390) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.reflections.Reflections.expandSuperTypes(Reflections.java:381) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.reflections.Reflections.<init>(Reflections.java:126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.classes(BeanConfig.java:276) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.scanAndRead(BeanConfig.java:240) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.setScan(BeanConfig.java:221) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.api.ControllerAdminApiApplication.setupSwagger(ControllerAdminApiApplication.java:101) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:78) 
~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:421) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:283) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) 
[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] Caused by: java.lang.ClassNotFoundException: org.apache.commons.digester.AbstractObjectCreationFactory at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_242] at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_242] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_242] at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:1.8.0_242] at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:388) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] ... 18 more 2021/06/16 16:52:12.799 WARN [Reflections] [main] could not get type for name org.apache.log4j.EnhancedPatternLayout from any class loader org.reflections.ReflectionsException: could not get type for name org.apache.log4j.EnhancedPatternLayout at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:390) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.reflections.Reflections.expandSuperTypes(Reflections.java:381) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.reflections.Reflections.<init>(Reflections.java:126) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.classes(BeanConfig.java:276) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.scanAndRead(BeanConfig.java:240) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at io.swagger.jaxrs.config.BeanConfig.setScan(BeanConfig.java:221) 
~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.api.ControllerAdminApiApplication.setupSwagger(ControllerAdminApiApplication.java:101) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:78) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:421) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:283) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at 
org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:164) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:184) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] Caused by: java.lang.ClassNotFoundException: org.apache.log4j.EnhancedPatternLayout at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_242] at java.lang.ClassLoader.loadClass(ClassLoader.java:419) ~[?:1.8.0_242] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_242] at java.lang.ClassLoader.loadClass(ClassLoader.java:352) ~[?:1.8.0_242] at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:388) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6] ... 18 more```
  @mayanks: Does curl to controller work?
  @mayanks: cc @fx19880617
  @keweishang: Yes, curl to controller’s 9000 port works
  @mayanks: Yeah so I think the cluster is up, not sure about the UI issue.
  @keweishang: Yeah, I think it's a UI issue too. I tried different browsers and none of them worked. Are these `ClassNotFoundException` WARNs normal?
  @mayanks: Don’t recall seeing them but if the cluster is up and behaving well, then not sure. Also don’t think that could cause UI issue
  @keweishang: I can access the page, but not the page (blank)
  @mayanks: what about help page (swagger)?
  @mayanks: I have tagged Xiang in case he has seen this issue. If not, perhaps we should file an issue
  @keweishang: all the above pages work, only that one doesn't (blank). Sure, let's wait for Xiang's feedback on it. I can file an issue later if it's really a bug.
  @fx19880617: I remember the root cause is that the UI requires the broker/server to also be up in order to render. It was fixed recently:
  @mayanks: Thanks @fx19880617
  @fx19880617: should be fixed in next release
  @keweishang: Thanks! Indeed, starting broker + server has solved the issue :+1:
@mark.needham: Hi, I'm trying to learn how to use dimension tables, but I'm doing something wrong, and I'm not sure what. I have a `regions` dim table and a normal `cases` table. And then I run this query: ```select areaName, lookUp('regions', 'Region', 'LTLAName', areaName) from cases limit 10``` But the error message says it doesn't find the lookup function: ```[ { "errorCode": 200, "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Unsupported function: lookup with 4 parameters\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:189)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:56)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:52)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:83)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:94)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:234)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)" } ]``` Any ideas?
  @jmeyer: Not an expert, but I remember making it work a few days ago, and your query looks okay to me. What version of Pinot are you using?
  @kulbir.nijjer: Yes @jmeyer is right. @mark.needham Support for the lookup UDF join was only added in version 0.7.1. From the error message it's not able to find the required code, which means you're running an older Pinot version.
  @mark.needham: aha, cool! Yeah, I had it using the docker 'latest' tag, but the first time I ran it, it picked up version 0.6.0.
  @mark.needham: pinned it to 0.7.1 now :slightly_smiling_face:
  @kulbir.nijjer: Cool!
  @mark.needham: presumably in this query it's doing the lookup for every single row and therefore repeating the same lookup many times? Is there a way to get it to do the aggregation by area name first and then do the lookup afterwards, so there are fewer lookups to do?
  @mark.needham: (reason I ask is that the query time is 10x more with the lookup than without)
  @jackie.jxt: @mark.needham Yes, you are right, the lookup is performed on a per-row basis because it is currently modeled as a transform. Can you please file an issue for the optimization of deferring the lookup? Also add the feature contributor: @canerbalci
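[Editor's note] An illustrative sketch (plain Python, not Pinot internals) of why deferring the lookup past the aggregation helps: with N rows but K distinct `areaName` values, a per-row lookup costs N dimension-table probes, while grouping first costs only K. The table contents here are made up for illustration:

```python
from collections import Counter

# dimension table: LTLAName -> Region (hypothetical values)
regions = {"Leeds": "Yorkshire", "Bristol": "South West"}

# fact table rows (just the areaName column)
cases = ["Leeds", "Leeds", "Bristol", "Leeds"]

# per-row lookup: one probe per row (what a transform-based lookUp does)
per_row = Counter(regions[a] for a in cases)

# deferred lookup: aggregate first, then one probe per distinct key
by_area = Counter(cases)
deferred = Counter()
for area, n in by_area.items():
    deferred[regions[area]] += n

assert per_row == deferred  # same result, far fewer lookups
```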
@hamoop: @hamoop has joined the channel
@prasanna.gsl: @prasanna.gsl has joined the channel
@keweishang: Hi team, the query to return the earliest row’s timestamp `select DATETIMECONVERT(MIN(created), '1:MILLISECONDS:EPOCH', '1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as min_created from delivery_order limit 1` failed with the following error (in the Slack thread). The `created` column is of type: ```{ "name": "created", "dataType": "LONG", "format" : "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }``` Interestingly, the query `select DATETIMECONVERT(created, '1:MILLISECONDS:EPOCH', '1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as min_created from delivery_order limit 1` without `MIN()` works fine. May I have some advice? Thanks.
  @keweishang: ```ProcessingException(errorCode:450, message:InternalError: java.io.IOException: Failed : HTTP error code : 500 at org.apache.pinot.controller.api.resources.PinotQueryResource.sendPostRaw(PinotQueryResource.java:302) at org.apache.pinot.controller.api.resources.PinotQueryResource.sendRequestRaw(PinotQueryResource.java:340) at org.apache.pinot.controller.api.resources.PinotQueryResource.getQueryResponse(PinotQueryResource.java:222) at org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:137) at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391))```
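[Editor's note] One possible workaround while this is investigated: select the bare `MIN(created)` (epoch millis) and apply the `SIMPLE_DATE_FORMAT` conversion on the client side instead of wrapping the aggregate in `DATETIMECONVERT`. A Python sketch of that client-side conversion (UTC assumed):

```python
from datetime import datetime, timezone

def millis_to_sdf(ms: int) -> str:
    """Epoch millis -> 'yyyy-MM-dd HH:mm:ss' (UTC assumed)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# e.g. format the raw result of `select min(created) from delivery_order`
assert millis_to_sdf(1623859200000) == "2021-06-16 16:00:00"
```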
@qianbo.wang: @qianbo.wang has joined the channel
@steotia: Hi All, we published the blog post today that I had referred to in yesterday's talk
@nishanth: @nishanth has joined the channel
@jai.patel856: Pinot Upsert Question: Upsert is supported only for realtime tables. That’s fine. The time column is used to determine the order of the updates, to choose the latest one. What time is used to determine when to evict a row (visible or not)? The documents tend to point to segment age to determine when to evict messages. In practice it seems to evict based on when the row was actually imported. What’s the expected behavior for a realtime (upsert) table?
  @jackie.jxt: If a row is not updated by another row with newer timestamp, then it will expire along with the segment containing it. The segment is expired based on the latest timestamp within the segment and the retention config
  @jai.patel856: We currently have a convention where our rows are versioned with a number. We’re using this as our time column. Part of the reasoning for this is to ensure that if we reprocess our Flink job, we won’t overwrite rows in Pinot with old data. But the ordering and the retention are both controlled by the time column, correct? Is there a good mechanism to control ordering more directly, rather than relying on the same time column used for retention?
  @jackie.jxt: IIUC, the requirement is the same as this issue: ?
  @jackie.jxt: Currently it is not supported yet
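[Editor's note] To make the semantics above concrete, here is an illustrative Python sketch (not Pinot code) of the resolution rule Jackie describes: for each primary key, the row with the greatest comparison value (a timestamp, or here a version number) wins, regardless of arrival order. Names and rows are hypothetical:

```python
def resolve_upserts(rows, key, cmp_col):
    """Keep, per primary key, the row with the largest comparison value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[cmp_col] >= latest[k][cmp_col]:
            latest[k] = row
    return list(latest.values())

rows = [
    {"order_id": 1, "version": 1, "status": "created"},
    {"order_id": 1, "version": 3, "status": "delivered"},
    {"order_id": 1, "version": 2, "status": "shipped"},  # late, out-of-order update
]
assert resolve_upserts(rows, "order_id", "version")[0]["status"] == "delivered"
```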
@ken: For an offline (batch-generated) table, if I don’t specify a `segmentIngestionFrequency`, then are `APPEND` and `REFRESH` values for `segmentIngestionType` essentially equivalent?
  @mayanks: Is this a hybrid table? Not specifying the frequency might mess up time boundary depending on time unit.
  @ken: Just OFFLINE
  @ken: I guess the meta-question is what happens if I create a new version of an existing segment file for an offline table, and do a metadata push. I’m assuming that’s a refresh, and Pinot will correctly handle that.
  @mayanks: Another place it is used is for interval check for validation.
  @mayanks: Even for an APPEND table, you can refresh any segment at any time
  @mayanks: That is how backfill works
  @ken: So if I’ve got an offline table segment that I update on a daily basis, what’s the recommended settings? use `REFRESH` with a `segmentIngestionFrequency` of 1 day?
  @mayanks: REFRESH is typically used for full refresh of data. These tables typically don't have a time column. If either one is not true for you, you might be ok just with APPEND
  @ken: And what guarantees does Pinot provide (if any) for what happens to queries that are executing when an updated segment is being reloaded?
  @mayanks: Single segment update is atomic. As in a query will either see old or new segment, not a partially updated segment.
  @mayanks: If you are refreshing a bunch of segments, then you can have a situation where some segments are refreshed and others are not
  @ken: Thanks, good to know.
  @ken: Though I’m still curious about the meaning of `segmentIngestionFrequency` for an OFFLINE table. Why does Pinot care if I update every day or every week?
  @mayanks: It is only used in two places (based on what I see with a quick grep of code): ```1. Time boundary (only applies to hybrid table). 2. There are checks that ensure data is pushed as expected (for operational monitoring).```
  @ken: OK, guess I need to dig into the operational monitoring stuff more - thanks
  @mayanks: Yeah, think of this situation - Your user of Pinot thinks data is being pushed to it daily, but their pipeline has been failing (and they didn't notice it). The first thing they would do is ask the question - "Why is Pinot not showing my latest data?" We build some checks to ensure we can automatically detect this situation.
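[Editor's note] For a daily-refreshed offline table, the discussion above maps to a table config roughly like the following. This is a minimal sketch; the field names follow the batch ingestion config in recent Pinot versions and should be verified against your release:

```json
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "REFRESH",
      "segmentIngestionFrequency": "DAILY"
    }
  }
}
```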

#random


@hashhar: @hashhar has joined the channel
@hamoop: @hamoop has joined the channel
@prasanna.gsl: @prasanna.gsl has joined the channel
@qianbo.wang: @qianbo.wang has joined the channel
@nishanth: @nishanth has joined the channel

#troubleshooting


@e-ramirez: Hi, I am evaluating Pinot for possible production use in my company. I am encountering a problem with the `backup/restore` feature. I'd appreciate it if anyone can help. Here is my setup. Kubernetes: EKS 1.20.4, Pinot version: 0.7.1. I enabled S3 as deep storage, then ingested Parquet data from S3. Data loaded fine and I can query the expected data from Pinot. Next I simulated replacing the cluster by uninstalling all pods and their related volumes (therefore losing all state), but kept the segment files in the S3 segment location (therefore the backup is intact in the deep store). Then I reinstalled the cluster and reconfigured the tables. I was expecting that the servers would automatically fetch the segments from the deep store as mentioned in a previous post, but it does not seem to be happening. Am I missing a step? Thanks in advance.
  @g.kishore: You cannot undeploy zookeeper
  @g.kishore: Zookeeper stores the metadata/list of segments
  @e-ramirez: Thanks for the reply. May I know what the steps should be in case I have to replace the cluster? Should I keep a backup of zookeeper and restore it to the new cluster?
  @g.kishore: Yes
  @g.kishore: Or upload all the segments again to new cluster using upload api call
  @g.kishore: It can be simple script over the segments in S3
  @e-ramirez: Got it. Looking at the `UploadSegment` command, the parameter `segmentDir` requires a local path. This means I have to download the segments first in order to upload them. Is there a way to point the new cluster directly at the previous cluster's S3 segment path as the source location instead?
  @g.kishore: Use uri based or metadata based push
  @e-ramirez: awesome. Thanks. I will try this.
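[Editor's note] A sketch of the "simple script over the segments in S3" idea: build URI-based push requests so the new controller downloads each segment straight from the old cluster's S3 path. The endpoint and header names below follow Pinot's segment upload API as commonly documented; treat them (and the placeholder names) as assumptions to verify against your Pinot version:

```python
CONTROLLER = "http://localhost:9000"   # placeholder: new cluster's controller
TABLE = "myTable_OFFLINE"              # placeholder: table to restore

def uri_push_request(segment_uri: str) -> dict:
    """Build a URI-based push: the controller fetches the segment itself from S3."""
    return {
        "url": f"{CONTROLLER}/v2/segments?tableName={TABLE}",
        "headers": {"UPLOAD_TYPE": "URI", "DOWNLOAD_URI": segment_uri},
    }

req = uri_push_request("s3://old-cluster-bucket/segments/myTable/seg_0.tar.gz")
# To actually send it: import requests; requests.post(req["url"], headers=req["headers"])
```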
  @mayanks: Do you have realtime component as well?
  @e-ramirez: I can think of several use cases where Pinot might be useful to us. • As the main backend of our analytics dashboard. Currently we are using Druid, Greenplum, TiDB etc., but each one has drawbacks. • As one of the data sources for our machine learning jobs. Currently we are using Athena or direct files from S3, but Athena has an upper bound on throughput while raw S3 files are too limited. • As a backend sink for Kafka to complement our real-time prediction in production serving.
  @mayanks: Yeah, these sound like great use cases for Pinot. We are here to help you use Pinot successfully for these.
@hashhar: @hashhar has joined the channel
@jainendra1607tarun: Hello everyone, I am running Presto to query Pinot and the presto-pinot connector throws an exception when there is no data returned by Pinot. Example query is: ```select * from pinot.default.mytable where datekey='2021-04-19 00:00:00' limit 10``` This query returns an empty result in Pinot as expected. The exception in Presto is: ```java.lang.IllegalStateException: Expected at least one row to be present at com.google.common.base.Preconditions.checkState(Preconditions.java:507) at com.facebook.presto.pinot.PinotBrokerPageSourceSql.populateFromQueryResults(PinotBrokerPageSourceSql.java:118) at com.facebook.presto.pinot.PinotBrokerPageSourceBase.lambda$issueQueryAndPopulate$0(PinotBrokerPageSourceBase.java:327) at com.facebook.presto.pinot.PinotUtils.doWithRetries(PinotUtils.java:39) at com.facebook.presto.pinot.PinotBrokerPageSourceBase.issueQueryAndPopulate(PinotBrokerPageSourceBase.java:312) at com.facebook.presto.pinot.PinotBrokerPageSourceBase.getNextPage(PinotBrokerPageSourceBase.java:222) at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:252) at com.facebook.presto.operator.Driver.processInternal(Driver.java:418) at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301) at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722) at com.facebook.presto.operator.Driver.processFor(Driver.java:294) at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077) at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162) at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599) at com.facebook.presto.$gen.Presto_0_256_SNAPSHOT_5059796____20210616_162510_1.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
java.lang.Thread.run(Thread.java:748)``` Is this a bug or am I missing some configuration ?
  @fx19880617: This is a bug that @dharakkharod is working on.
  @fx19880617: we should have it fixed soon
@patidar.rahul8392: Hi team, I am ingesting realtime data from Kafka and updating a realtime Superset dashboard. In the dashboard I have one slice that displays events from the last 5 minutes based on my timestamp column. For testing purposes I pushed one event into Kafka that was already present (a duplicate). As soon as I pushed it, it showed up in Pinot within milliseconds, but it isn't reflected in the dashboard's last-5-minutes count. So my question is: will it take some time to show up on the dashboard, or will duplicate records not be counted in the last 5 minutes? @mayanks
  @mayanks: can you check what query the dashboard is firing to Pinot, and compare it with query you used to verify that event is in Pinot?
  @patidar.rahul8392: @mayanks
  @mayanks: can you manually run superset query directly on pinot?
  @patidar.rahul8392: Okay
  @mayanks: my guess is the superset query is filtering out the second record. You'll then need to compare the rows against the predicate to see why that's happening
  @patidar.rahul8392: my bad @mayanks. I am pushing events whose current_timestamp is older than 5 minutes, and in Superset the interval is 5 minutes, so that's probably the issue.
  @mayanks: :+1:
@hamoop: @hamoop has joined the channel
@dharakkharod: @dharakkharod has joined the channel
@prasanna.gsl: @prasanna.gsl has joined the channel
@mateus.oliveira: Hello team, I need help with something. I'm trying to load some data from an S3 bucket into Pinot but it gives me this error ```Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.) Listed 8 files from URI: , is recursive: true Got exception to kick off standalone data ingestion job - java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:166) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:186) [pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] Caused by: java.lang.IllegalArgumentException at sun.nio.fs.UnixFileSystem.getPathMatcher(UnixFileSystem.java:288) ~[?:1.8.0_292] at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:175) 
~[pinot-batch-ingestion-standalone-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-2de40fde8051c2c0281416c2da11c179c2190435] ... 4 more``` this is my job ```executionFrameworkSpec: name: 'standalone' segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner' segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner' segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner' jobType: SegmentCreationAndTarPush inputDirURI: '' includeFileNamePattern: '*.json' outputDirURI: '' overwriteOutput: true pinotFSSpecs: - scheme: s3 className: org.apache.pinot.plugin.filesystem.S3PinotFS configs: region: 'us-east-1' endpoint: '' accessKey: 'access' secretKey: 'key' recordReaderSpec: dataFormat: 'json' className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader' tableSpec: tableName: 'bank' pinotClusterSpecs: - controllerURI: ''```
  @aaron: Try `includeFileNamePattern: 'glob:**/*.json'`
  @mayanks: Yeah ^^. Seems it is failing here in the code: ``` if (_spec.getIncludeFileNamePattern() != null) { includeFilePathMatcher = FileSystems.getDefault().getPathMatcher(_spec.getIncludeFileNamePattern()); }```
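[Editor's note] For context: Java's `getPathMatcher` requires a `syntax:pattern` string (a `glob:` or `regex:` prefix) and throws `IllegalArgumentException` otherwise, which matches the stack trace above. And with `glob:`, a single `*` does not cross `/` boundaries, so `glob:*.json` cannot match a nested input path, hence the `glob:**/*.json` suggestion. A rough Python sketch of those glob semantics (a hand-rolled approximation of the Java behavior, for illustration only):

```python
import re

def glob_to_regex(pat: str):
    """Minimal glob translation: '**' matches across '/', '*' does not."""
    out, i = "", 0
    while i < len(pat):
        if pat.startswith("**", i):
            out += ".*"          # '**' crosses directory boundaries
            i += 2
        elif pat[i] == "*":
            out += "[^/]*"       # '*' stays within one path segment
            i += 1
        else:
            out += re.escape(pat[i])
            i += 1
    return re.compile(out + r"\Z")

path = "mybucket/data/bank_2021_5_19_11_33_43.json"
assert not glob_to_regex("*.json").match(path)   # single '*' can't cross '/'
assert glob_to_regex("**/*.json").match(path)    # '**/' covers the directories
```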
  @mateus.oliveira: I don't receive any error anymore, but it's not creating segments
  @mateus.oliveira: ```Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.) Listed 8 files from URI: , is recursive: true Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS Listed 0 files from URI: , is recursive: true Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@106cc338] for table bank```
  @fx19880617: Can you try Aaron’s suggestion?
  @fx19880617: Try `includeFileNamePattern: 'glob:**/*.json'`
  @fx19880617: I feel the pattern doesn’t match any file
  @mateus.oliveira: sure, I tried it and I have no more errors, but it's not creating segments
  @mateus.oliveira: could be, I will take a look at the files
  @fx19880617: ic
  @fx19880617: what’s your file names/paths?
  @mateus.oliveira: ```bank_2021_5_19_11_33_43.json```
  @fx19880617: hmm
  @mateus.oliveira: it even reads the 8 files, as the log message shows, which is weird
  @fx19880617: have you set this ``` schemaURI: '' tableConfigURI: ''```
  @fx19880617: under `tableSpec:`
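  [Editor's note: the `tableSpec` section typically points at the controller's schema and table-config REST endpoints. A sketch for the `bank` table, assuming an illustrative controller address of `localhost:9000`:]

```yaml
tableSpec:
  tableName: 'bank'
  # Both URIs are served by the Pinot controller's REST API.
  schemaURI: 'http://localhost:9000/tables/bank/schema'
  tableConfigURI: 'http://localhost:9000/tables/bank'
```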
  @mateus.oliveira: no but I will do it now
  @mateus.oliveira: nothing, it doesn't create the segments and the table is empty. I will review the schema, maybe it's something related to that
  @mateus.oliveira: ```SegmentGenerationJobSpec: !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
authToken: null
cleanUpOutputDir: false
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner, segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner, segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
failOnEmptySegment: false
includeFileNamePattern: glob:*.json
inputDirURI: 
jobType: SegmentCreationAndTarPush
outputDirURI: 
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: ''}
pinotFSSpecs:
- className: org.apache.pinot.plugin.filesystem.S3PinotFS
  configs: {region: us-east-1, endpoint: '', accessKey: YOURACCESSKEY, secretKey: YOURSECRETKEY}
  scheme: s3
pushJobSpec: null
recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader, configClassName: null, configs: null, dataFormat: json}
segmentCreationJobParallelism: 0
segmentNameGeneratorSpec: null
tableSpec: {schemaURI: '', tableConfigURI: '', tableName: bank}
tlsSpec: null
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 1.)
Listed 8 files from URI: , is recursive: true
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Listed 0 files from URI: , is recursive: true
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@63f259c3] for table bank root@pinot-controller-0:/opt/pinot#```
  @mateus.oliveira: the output of job execution
  @fx19880617: hmm, ok
  @fx19880617: ```includeFileNamePattern: glob:*.json```
  @fx19880617: `'glob:**/*.json'`
  @fx19880617: not `'glob:*.json'`
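  [Editor's note: the distinction matters because the ingestion job matches patterns against the full file path via `java.nio.file.PathMatcher` (as in the code @mayanks quoted above), where `*` does not cross directory separators but `**` does. A minimal sketch, using an illustrative nested path:]

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        PathMatcher flat = FileSystems.getDefault().getPathMatcher("glob:*.json");
        PathMatcher deep = FileSystems.getDefault().getPathMatcher("glob:**/*.json");

        // The job lists full paths under inputDirURI, e.g. with a bucket prefix:
        Path path = Paths.get("data/bank/bank_2021_5_19_11_33_43.json");

        System.out.println(flat.matches(path)); // false: '*' stops at '/'
        System.out.println(deep.matches(path)); // true:  '**' spans directories
    }
}
```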
  @mateus.oliveira: it works @fx19880617! thank you and @aaron for the help
  @mayanks: @mateus.oliveira curious, was this a documentation issue (as in was it not clear enough)?
  @mayanks: If so, any suggestions on how to improve it?
  @mateus.oliveira: In this case it was my mistake, but it would help if you detailed the configs a little more; for example, this pattern part wasn't in the document, at least not in the S3 one. Maybe even repeating this info a bit would be great. But besides that, it was not a documentation problem, it was my mistake
  @mayanks: I see, thanks
  @kulbir.nijjer: @mateus.oliveira btw `endpoint` is an AWS S3-specific client config, not the Pinot controller address, so the current setting is invalid (the AWS SDK is probably overriding it automatically based on region). You're probably fine not specifying it at all ```endpoint: ''``` In case you're interested in the valid values:
  @fx19880617: it might be a different s3 compatible fs endpoint, like minio?
  @kulbir.nijjer: Yes, good point, it can be, depending on the object store backend you are integrating with. Generally, for AWS S3 access it's only needed for advanced use cases.
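  [Editor's note: as a sketch of the distinction, an S3-compatible store such as MinIO would set `endpoint` explicitly, while plain AWS S3 can omit it; the host, port, and credentials below are illustrative:]

```yaml
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
      # Only needed for S3-compatible stores like MinIO; omit for AWS S3 itself.
      endpoint: 'http://minio:9000'
      accessKey: 'access'
      secretKey: 'key'
```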
@qianbo.wang: @qianbo.wang has joined the channel
@nishanth: @nishanth has joined the channel
@nishanth: Hello
@chxing: Hi @jackie.jxt, can Pinot support a connection pool, like Druid does?
@chxing: We want to use a connection pool from a Java service
@jackie.jxt: It doesn’t support connection pool, but Pinot supports jdbc connector. @fx19880617 can you share more info about the jdbc connector?
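[Editor's note: a minimal sketch of the JDBC connector mentioned above, assuming the `pinot-jdbc-client` artifact is on the classpath and a controller is reachable at an illustrative `localhost:9000`; the driver discovers brokers via the controller, so the usual pattern is a plain connection rather than a pool. This requires a running cluster and is not runnable standalone.]

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PinotJdbcExample {
    public static void main(String[] args) throws Exception {
        // The JDBC URL points at the Pinot controller; the driver
        // resolves brokers for the queried table behind the scenes.
        try (Connection conn = DriverManager.getConnection("jdbc:pinot://localhost:9000");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM bank")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```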

#docs


@e-ramirez: @e-ramirez has joined the channel

#pinot-dev


@nishanth: @nishanth has joined the channel

#community


@nishanth: @nishanth has joined the channel

#announcements


@nishanth: @nishanth has joined the channel

#presto-pinot-streaming


@hashhar: @hashhar has joined the channel

#aggregate-metrics-change


@nishanth: @nishanth has joined the channel

#presto-pinot-connector


@hashhar: @hashhar has joined the channel

#getting-started


@mark.needham: @mark.needham has joined the channel
@hamoop: @hamoop has joined the channel

#debug_upsert


@e-ramirez: @e-ramirez has joined the channel

#pinot-docsrus


@e-ramirez: @e-ramirez has joined the channel