#general


@nachiket.kate90: @nachiket.kate90 has joined the channel
@leslie.nyahwa: @leslie.nyahwa has joined the channel
@phillip.fleischer: @phillip.fleischer has joined the channel
@karinwolok1: :musical_note: If you love Pinot and you know it ~clap your.~.. vote for it on Data Council :notes: 2021 Community Survey: what open-source data projects are you most excited about? Take 20 seconds to answer now: And if you haven't done so yet, star the project :star: on GitHub
@ricardobordon: @ricardobordon has joined the channel

#random


@nachiket.kate90: @nachiket.kate90 has joined the channel
@leslie.nyahwa: @leslie.nyahwa has joined the channel
@phillip.fleischer: @phillip.fleischer has joined the channel
@ricardobordon: @ricardobordon has joined the channel

#troubleshooting


@bowlesns: That’s interesting, thanks for sharing! I’m fine with using a single server instance right now, just trying to figure out how to speed it up.
@dlavoie: Using this new mechanism, you can have 1 worker generate 1 segment in parallel per file ingested
@dlavoie: So it will scale horizontally based on how many files you have in a blob store
@bowlesns: Do you happen to have any example spec files you can share that are redacted?
@dlavoie: It auto-manages ingestion scheduling and task distribution.
@bowlesns: I see some config options in
@dlavoie: Yup that’s the task. I think @npawar or @fx19880617 are working on preparing some docs.
@dlavoie: Say you have 2000 files, the task will generate a subtask for each file and schedule them to available minion workers.
@bowlesns: I’m also using the GCP (gs) plugin to pull in, so was going to see if that’ll install on a minion and if I could convert the jobspec to command line options and try that way
@dlavoie: Let me see if I can get an example
@bowlesns: Awesome thank you so much. I’ll try to do some config on the minion and see if I can get it working
@dlavoie: In your table config, you can configure something like this: ``` "ingestionConfig": { "batchIngestionConfig": { "segmentIngestionType": "APPEND", "segmentIngestionFrequency": "HOURLY", "batchConfigMaps":[ { "inputDirURI": "gs://<input root data dir>", "inputFormat": "json", "includeFileNamePattern": "glob:**/*.gz", "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS" }] } }```
@dlavoie: then call the `/task/schedule` which will result in the task being triggered.
@bowlesns: that’s the endpoint on the controller correct?
@dlavoie: Yes
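For kicking this off programmatically, a minimal Python sketch of the controller call might look like the following. The path `/task/schedule` is taken as written in the thread; the controller address, the `taskType` and `tableName` query parameters, and the table name `myTable_OFFLINE` are assumptions for illustration, so check your controller's Swagger UI for the exact path and parameters in your Pinot version.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a controller reachable at this address; replace with your own.
CONTROLLER_URL = "http://localhost:9000"


def build_schedule_request(controller_url, task_type, table_name,
                           path="/task/schedule"):
    """Build (but do not send) a POST request asking the controller to schedule a task.

    `path` defaults to the endpoint named in the thread; the query parameter
    names are assumptions and may differ between Pinot versions.
    """
    query = urlencode({"taskType": task_type, "tableName": table_name})
    url = f"{controller_url.rstrip('/')}{path}?{query}"
    return Request(url, method="POST")


def schedule_task(controller_url, task_type, table_name):
    """Send the request and return the controller's response body as text."""
    request = build_schedule_request(controller_url, task_type, table_name)
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `schedule_task(CONTROLLER_URL, "SegmentGenerationAndPushTask", "myTable_OFFLINE")` from a Jenkins job or a k8s cronjob would be one way to trigger ingestion without the built-in cron schedule.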
@bowlesns: oh awesome, that’s two birds with one stone then because I was trying to figure out the best way to kick off programmatically. Awesome thanks Daniel really appreciate it.
@dlavoie: It gets better
@dlavoie: The table also now supports a cron schedule. ``` "task": { "taskTypeConfigsMap": { "SegmentGenerationAndPushTask": { "schedule" : "0 0 * ? * * *" } } }```
@dlavoie: This will trigger batch ingestion for any new files found in the configured input dir
@dlavoie: and filters the files already processed
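The "only new files" behaviour described above can be pictured with a toy Python sketch. This is not Pinot's implementation: the function name, the explicit list of processed files, and the use of `fnmatchcase` as a rough stand-in for the Java `glob:` pattern are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def select_new_files(listed, already_processed, pattern="*.gz"):
    """Return files matching `pattern` that have not been ingested yet.

    Hypothetical helper: each returned file would correspond to one subtask
    handed to a minion worker. Listing order is preserved.
    """
    done = set(already_processed)
    return [f for f in listed if fnmatchcase(f, pattern) and f not in done]


# Example listing of an input directory (made-up paths).
listed = ["gs://bucket/2021/a.json.gz", "gs://bucket/2021/b.json.gz", "gs://bucket/README.txt"]
processed = ["gs://bucket/2021/a.json.gz"]
new_files = select_new_files(listed, processed)
```

Here `new_files` contains only `gs://bucket/2021/b.json.gz`: the README does not match the pattern and the first archive was already processed.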
@bowlesns: another thing I was going to have to solve for :stuck_out_tongue: was going to try and kick that off using Jenkins or a k8s cronjob
@dlavoie: That cron will require a config on the controller : `controller.task.scheduler.enabled=true`
@dlavoie: BTW, it’s important that your minion workers are configured with the required GcsPinotFS configs and credentials, just like the controller and server.
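To see how the two table-config fragments from this thread fit together, here is a small Python sketch that merges them into an existing table config before it is uploaded. It assumes `ingestionConfig` and `task` are top-level keys of the table config; the helper name, the bucket URI, and the `myTable` config are hypothetical.

```python
import copy
import json


def with_batch_ingestion(table_config, input_dir_uri, cron):
    """Return a copy of `table_config` with the batch ingestion and task sections set."""
    cfg = copy.deepcopy(table_config)  # leave the caller's dict untouched
    cfg["ingestionConfig"] = {
        "batchIngestionConfig": {
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "HOURLY",
            "batchConfigMaps": [
                {
                    "inputDirURI": input_dir_uri,
                    "inputFormat": "json",
                    "includeFileNamePattern": "glob:**/*.gz",
                    "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS",
                }
            ],
        }
    }
    cfg["task"] = {
        "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {"schedule": cron}
        }
    }
    return cfg


# Hypothetical minimal table config; a real one has many more fields.
base = {"tableName": "myTable", "tableType": "OFFLINE"}
merged = with_batch_ingestion(base, "gs://my-bucket/data", "0 0 * ? * * *")
payload = json.dumps(merged, indent=2)  # JSON text ready to send to the controller
```

Building the config as a dict and serialising it with `json.dumps` avoids hand-editing mistakes such as an unbalanced quote or brace in the JSON.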
@bowlesns: Perfect thank you. I’m happy to help write some of that documentation. I know the documentation is I believe. Is there a list of any outstanding things like the minion that need to be added/priority?
  @fx19880617: Thanks Nick! I'm writing the task framework docs. will keep you updated
  @bowlesns: Thanks Xiang! Can also help with helm chart or k8s resources.
@nachiket.kate90: @nachiket.kate90 has joined the channel
@leslie.nyahwa: @leslie.nyahwa has joined the channel
@phillip.fleischer: @phillip.fleischer has joined the channel
@ricardobordon: @ricardobordon has joined the channel