#general
@jungmwiner: @jungmwiner has joined the channel
@yg.y: @yg.y has joined the channel
@karinwolok1: In case you missed yesterday's Upsert & JSON Indexing in Pinot meetup (by speakers @yupeng and @jackie.jxt). Here's the recording!
@gbahrani: @gbahrani has joined the channel
@write2agb: @write2agb has joined the channel
@ray: @ray has joined the channel
@jdelmerico: @jdelmerico has joined the channel
@renegomezlondono: @renegomezlondono has joined the channel
@demetrius.williams.pm: @demetrius.williams.pm has joined the channel
@atadesoba: @atadesoba has joined the channel
@atadesoba: :wave: I’m here! What’d I miss?
@vananth22:
@paulwelch: @paulwelch has joined the channel
#random
@jungmwiner: @jungmwiner has joined the channel
@yg.y: @yg.y has joined the channel
@gbahrani: @gbahrani has joined the channel
@write2agb: @write2agb has joined the channel
@ray: @ray has joined the channel
@jdelmerico: @jdelmerico has joined the channel
@renegomezlondono: @renegomezlondono has joined the channel
@demetrius.williams.pm: @demetrius.williams.pm has joined the channel
@atadesoba: @atadesoba has joined the channel
@paulwelch: @paulwelch has joined the channel
#troubleshooting
@jungmwiner: @jungmwiner has joined the channel
@jungmwiner: Hello~ When ThirdEye is started via Helm, it fails with the message below. The same problem occurs on both the master branch and the 0.6.0 release branch.
```
Running Thirdeye frontend config: ./config/pinot-quickstart
log4j:WARN No appenders could be found for logger (org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication).
log4j:WARN Please initialize the log4j system properly.
[2021-02-18 12:25:39] INFO [main] o.h.v.i.u.Version - HV000001: Hibernate Validator null
io.dropwizard.configuration.ConfigurationParsingException: ./config/pinot-quickstart/dashboard.yml has an error:
  * Failed to parse configuration at: logging; Cannot construct instance of `io.dropwizard.logging.DefaultLoggingFactory`, problem: Unable to acquire the logger context
    at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"])
        at io.dropwizard.configuration.ConfigurationParsingException$Builder.build(ConfigurationParsingException.java:279)
        at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:156)
        at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:89)
        at io.dropwizard.cli.ConfiguredCommand.parseConfiguration(ConfiguredCommand.java:126)
        at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:74)
        at io.dropwizard.cli.Cli.run(Cli.java:78)
        at io.dropwizard.Application.run(Application.java:93)
        at org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardApplication.main(ThirdEyeDashboardApplication.java:200)
Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `io.dropwizard.logging.DefaultLoggingFactory`, problem: Unable to acquire the logger context
    at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: org.apache.pinot.thirdeye.dashboard.ThirdEyeDashboardConfiguration["logging"])
        at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47)
        at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1732)
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:491)
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:514)
        at com.fasterxml.jackson.module.afterburner.deser.OptimizedValueInstantiator._handleInstantiationProblem(OptimizedValueInstantiator.java:59)
        at io.dropwizard.logging.DefaultLoggingFactory$Creator4JacksonDeserializer53fd30f2.createUsingDefault(io/dropwizard/logging/DefaultLoggingFactory$Creator4JacksonDeserializer.java)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:277)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:189)
        at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:120)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:178)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:105)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:138)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:252)
        at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:155)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4173)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2467)
        at io.dropwizard.configuration.BaseConfigurationFactory.build(BaseConfigurationFactory.java:127)
        ... 6 more
Caused by: java.lang.IllegalStateException: Unable to acquire the logger context
        at io.dropwizard.logging.LoggingUtil.getLoggerContext(LoggingUtil.java:46)
        at io.dropwizard.logging.DefaultLoggingFactory.<init>(DefaultLoggingFactory.java:77)
        ... 19 more
```
From my analysis it seems to be a logging-related issue, but I do not know how to fix it. Can I get some guidance?
@g.kishore: @pyne.suvodeep ^^
@pyne.suvodeep: Hi @jungmwiner Just to understand the steps here. So if you run off the master, you are running into this issue?
@yg.y: @yg.y has joined the channel
@gbahrani: @gbahrani has joined the channel
@pabraham.usa: Is there a way to spread the replicas per partition across different AZs? I would like each replica to be on a different host in a different AZ for HA.
@bowlesns: You’re using the helm chart I assume? What cloud provider?
@ssubrama: Maybe you should also chime in on
@pabraham.usa: I am using AWS
@pabraham.usa: Looks like the PR is what I need, seems not much progress though
@ssubrama: What is needed is a generic Pinot mechanism to support the criteria that each replica should be in a separate <insert your cloud-dependent term here>. This requires pinot to interact with the cloud APIs to fetch/parse/store some information in a cloud-independent fashion.
@bowlesns: If you’re using k8s, I think this can be solved on the k8s/cloud provider side by doing this:
• Set nodegroups for server to be multi-AZ
• Use labels/podAntiAffinity to keep server pods from being colocated, OR set it up as a DaemonSet so there will always be 1 server pod per node
• Set your replication depending on # of nodes (or if there is a way to dynamically set this)
@bowlesns: It’s hacky but might work to achieve HA
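A minimal sketch of the podAntiAffinity / zone-spread idea from the bullets above. It assumes a reasonably recent Kubernetes (1.18+ for `topologySpreadConstraints` and the `topology.kubernetes.io/zone` label) and that your chart or manifests let you set these pod-spec fields on the server StatefulSet; the `app`/`component` labels are assumptions, so match whatever labels your chart actually applies:
```
# Hypothetical pod-spec fragment for the Pinot server StatefulSet.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: pinot
            component: server            # assumed label
        topologyKey: kubernetes.io/hostname   # never two server pods on one node
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone  # spread servers evenly across AZs
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: pinot
        component: server
```
Note this only keeps server pods spread out; as discussed below, it does not by itself guarantee that Pinot places the replicas of a given segment in different zones.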
@pabraham.usa: The third point might not work as I am planning to have 9 instances, 3 in each AZ. There is a chance that all replicas end up in the same AZ.
@pabraham.usa: @ssubrama I can label servers based on AZ. So all Pinot has to ensure is not to create replicas on servers with the same label.
@ssubrama: Can you add these ideas to the issue? thanks. A solution we come up with should be cloud generic.
@bowlesns: I’ll add some things in there in a bit and think through it some more. Something is blowing up at work right now and I’m the only person who can do SRE :sweat_smile:
@bowlesns: Hey team, I created a table with this in it to attempt to use the `minion` component to ingest data. When doing a POST at tasks/schedule, it looks like the minions are doing something (the logs talk about using AVRO), but they’ll either just hang or error out. Any insights? I also made these changes: controller config: `controller.task.scheduler.enabled=true`, minion config:
```
pinot.set.instance.id.to.hostname=true
```
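For reference, a hedged sketch of the scheduling call described above; the host, port, and table name are placeholders, the controller’s default port 9000 is assumed, and the query parameters may be optional or vary by Pinot version:
```
# Ask the controller to schedule minion tasks (placeholder host/table, not this cluster's).
curl -X POST "http://pinot-controller:9000/tasks/schedule?taskType=SegmentGenerationAndPushTask&tableName=myTable_OFFLINE"
```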
@dlavoie: You can monitor CPU activity on the minion worker. Also, `pinotMinion.log` has more verbose logs. How big are the files you are ingesting?
@bowlesns: The CPU goes up on the minions when I launch the jobs, but there is no output after a few hours. The files range in size, but the max is right under a gig. Each minion pod has 1 CPU and 5 GB, and the Java memory settings are `-Xms1G -Xmx4G`.
@bowlesns: I also tried to kill the task and kick off a new one but it doesn’t like that. Have a few logs just combing through them to see what’s valuable.
@dlavoie: If you are running a pod, get in the minion pod and look at the logs of `pinotMinion.log` in the home dir
@dlavoie: Why have you configured an outputDirURI in the batch config?
@bowlesns: To store the segments in deep storage?
@dlavoie: The controller will do that on its own
@bowlesns: ahh ok, was porting over what I had from the job
@bowlesns: I’m looking in the logs right now one second please
@dlavoie: your config is also missing `"includeFileNamePattern": "glob:**/*.gz",`
@bowlesns: I’m grabbing all files in that dir does that matter?
@dlavoie: Try `"includeFileNamePattern": "glob:*.gz"` then
@dlavoie: How many files are there?
@bowlesns: ~150
@dlavoie: Ok, I don’t see anything unreasonable with what you described.
@bowlesns: let me add that, purge the log, and then kick it off and tail
@dlavoie: Remove the outputDirURI too
@dlavoie: Have you configured deepstore on the controller and server?
@bowlesns: I have, and they have been writing there fine. All are using the same auth.
@dlavoie: Ok, just try without outputDirURI. When the minion uploads a segment to Pinot, it will end up in deep store thanks to the controller
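Putting the suggestions in this thread together, the batch ingestion part of the table config might end up looking roughly like the following. This is a hypothetical sketch, not the actual config from the thread; the surrounding key names (`batchIngestionConfig`, `batchConfigMaps`, `inputDirURI`, `inputFormat`) follow the Pinot batch ingestion docs and may differ by version, and the S3 path and input format are placeholders:
```
"ingestionConfig": {
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY",
    "batchConfigMaps": [
      {
        "inputDirURI": "s3://my-bucket/raw-data/",
        "includeFileNamePattern": "glob:**/*.gz",
        "inputFormat": "csv"
      }
    ]
  }
}
```
Note there is no `outputDirURI`: per the advice above, the controller moves the uploaded segment into deep store on its own.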
@dlavoie: After that, all I have left for you is to do a health check on all systems such as the controller, servers, and ZooKeeper, and ensure their heap and off-heap are fine.
@dlavoie: Also, are you sure that no segments are uploaded?
@bowlesns: There were some from a prior job, but none had been added/modified. I just deleted a couple of segments to see if it’ll try
@bowlesns: When I do the POST this is my response: `{"SegmentGenerationAndPushTask":null}` And the controller logs this: `2021/02/18 18:07:26.436 WARN [ZKMetadataProvider] [grizzly-http-server-1] Path: /SEGMENTS/REDACTED_OFFLINE does not exist`
@bowlesns: I’m in one of the minions and no logs in pod or from kubectl logs yet other than startup.
@bowlesns: not sure if I need to try another minion
@bowlesns: If I do a GET on tasks/SegmentGenerationAndPushTask/state the response is IN_PROGRESS
@bowlesns: none of the minion pods appear to have any spikes in cpu/mem utilization
@dlavoie: Can you go into the ZooKeeper explorer and look for the status of the subtasks here:
@dlavoie: Are you running the latest version of pinot? Make sure your pods are not running with `latest` and `IfNotPresent` as a pull-policy
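A hedged illustration of the pull-policy point above; the key layout follows common Helm chart conventions and may not match the Pinot chart’s values.yaml exactly, and the tag name is an assumption:
```
# Hypothetical values.yaml excerpt: pin an explicit image tag so IfNotPresent
# can never keep serving a stale "latest" image cached on the node.
image:
  repository: apachepinot/pinot
  tag: release-0.6.0        # assumed tag naming; use the release you actually deploy
  pullPolicy: IfNotPresent
```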
@bowlesns: correct I changed that last night
@bowlesns: I have two tasks in there, one has many subtasks, status is TASK_ERROR. Also see this in PreviousResourceAssignment:
```
"TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771_99": {
  "Minion_pinot-minion-6.pinot-minion-headless.default.svc.cluster.local_9514": "DROPPED"
}
```
For the other task, this is the output of context:
```
{
  "id": "WorkflowContext",
  "simpleFields": {
    "NAME": "TaskQueue_SegmentGenerationAndPushTask",
    "START_TIME": "1613626265328",
    "STATE": "IN_PROGRESS"
  },
  "mapFields": {
    "JOB_STATES": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "COMPLETED"
    },
    "StartTime": {
      "TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_1613626284771": "1613626302656"
    }
  },
  "listFields": {}
}
```
@bowlesns: Thanks again for your help :slightly_smiling_face:
@dlavoie: No errors on controller and server?
@bowlesns:
@bowlesns: server has no logs since last restart, controller just spit this out
@dlavoie: what about `pinotController.log` ?
@bowlesns: This is the only error, and corresponds to when I’m doing things in the controller UI’s zookeeper page: `2021/02/18 18:15:09.643 ERROR [ZkBaseDataAccessor] [grizzly-http-server-1] paths is null or empty`
@bowlesns: The rest of the logs correspond to tasks
@bowlesns: If there are syntax errors etc that’s because I just edited for readability
@tamas.nadudvari: I’m probably way off here, but I noticed that the minion segment configurations aren’t prefixed with `pinot.minion`. Minion:
@bowlesns: That’s a great catch, I just assumed that’s what it was. Let me change that and give it a go!
@write2agb: @write2agb has joined the channel
@ray: @ray has joined the channel
@jdelmerico: @jdelmerico has joined the channel
@renegomezlondono: @renegomezlondono has joined the channel
@demetrius.williams.pm: @demetrius.williams.pm has joined the channel
@atadesoba: @atadesoba has joined the channel
@pabraham.usa: Hello, I set the controller config as per the documentation. However, the controller is not starting up and throws an error.
```
controller.realtime.segment.validation.frequencyInSeconds=900
controller.broker.resource.validation.frequencyInSeconds=900
```
```
2021/02/18 14:46:44.389 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 39.246 since launch
java.lang.NumberFormatException: For input string: "[300, 900]"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_282]
        at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_282]
```
@dlavoie: Can you provide a complete stack trace which will give more context?
@pabraham.usa: This is the error, sorry, I accidentally sent it before adding the full log
@pabraham.usa: ```
2021/02/18 14:46:44.389 ERROR [StartServiceManagerCommand] [main] Failed to start a Pinot [CONTROLLER] at 39.246 since launch
java.lang.NumberFormatException: For input string: "[300, 900]"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_282]
        at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_282]
```
@pabraham.usa: it does say frequencyInSeconds, and yet it rejects 300/900
@dlavoie: I mean… don’t you have more logs?
@dlavoie: This stack trace doesn’t tell where the error is happening
@fx19880617: what’s the controller config ? did you put any config with value `[300, 900]`?
@pabraham.usa: nope, all I did was set two properties
@pabraham.usa: ```
controller.realtime.segment.validation.frequencyInSeconds=900
controller.broker.resource.validation.frequencyInSeconds=900
```
@pabraham.usa: seems like Pinot is doing this internally
@ken: I’m wondering if you have another (duplicate) setting in the config file for either of those two values, with a setting =300, and it builds a multi-value property setting.
@dlavoie: Indeed, the config framework of pinot appends duplicate properties
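In other words, a hypothetical controller config like the excerpt below reproduces the error above: the duplicate key is appended into a multi-value property, so the value read back is the list `[300, 900]` and `Integer.parseInt` fails.
```
# Both lines define the same key; Pinot's config framework appends duplicates
# instead of letting the last one win, producing the multi-value "[300, 900]".
controller.realtime.segment.validation.frequencyInSeconds=300
# ... other properties ...
controller.realtime.segment.validation.frequencyInSeconds=900
```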
@fx19880617: This is a good catch, we should avoid this case.
@pabraham.usa: Ahh, that’s correct, I have it defined in two places..!!! I somehow had Kafka in mind, which picks the last config.
@pabraham.usa: Thanks guys
@dlavoie: Kudos to @ken
@fx19880617: Can you also create a github issue so we can track and fix this
@fx19880617: Thanks @ken
@ken: You’re welcome - though I still owe about 50x in help that I’ve received from everyone while learning about Pinot :slightly_smiling_face:
@pabraham.usa: @fx19880617 -
@fx19880617: Thanks!
@paulwelch: @paulwelch has joined the channel
#pinot-dev
@yg.y: @yg.y has joined the channel
@mayanks: Hi Team, a heads up in case your dashboards don't see Pinot controller metrics: there was a recent change in naming conventions for Pinot Controller metrics (`pinot_controller` -> `pinot_controller_`).
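If these metrics are scraped into Prometheus, a quick, generic way to see what is now exposed under the new prefix (a PromQL sketch; it assumes nothing about individual metric names):
```
# List every series whose metric name starts with the new prefix.
count by (__name__) ({__name__=~"pinot_controller_.+"})
```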
#community
@ray: @ray has joined the channel
#announcements
@yg.y: @yg.y has joined the channel
@ray: @ray has joined the channel
#thirdeye-pinot
@jungmwiner: @jungmwiner has joined the channel
#getting-started
@yg.y: @yg.y has joined the channel
