milenkovicm commented on PR #1351: URL: https://github.com/apache/datafusion-ballista/pull/1351#issuecomment-3748618528
> > thanks @sebbegg,
> > just to clarify, we can have three configuration options:
> >
> > * proxy not configured, client needs to fetch data from executors
> > * proxy configured, no ip address or port provided, scheduler needs to start proxy on the same port (within process)
> > * proxy configured, ip/port provided, scheduler considers this as an external process running the proxy, it just needs to put that value in the response, scheduler will not start proxy. client needs to use that ip/port combination to connect to the process
>
> If I get this right, the last variant would mean we don't need this block, right?
>
> https://github.com/sebbegg/datafusion-ballista/blob/5022263904c37d660bc77e3f5c065206b6720d20/ballista/scheduler/src/scheduler_process.rs#L202-L212

yes, we don't start an in-process proxy on a different port

> How would you then start this external process? I guess we could add another crate/binary at `ballista/flight-proxy`?

we can provide a new library, or users can create their own based on the proxy you have created

> Starting a cluster could then look like:
>
> * `./ballista-flight-proxy --bind-host localhost --bind-port 50040`
> * `./ballista-scheduler --advertise-flight-sql-endpoint localhost:50040`
> * `./ballista-executor --scheduler-host localhost --scheduler-port 50050`
>
> I guess it's smart because like this all services can be run independently.

yes, we offload proxying data from the scheduler process and leave it in charge of orchestration only

> As far as I can tell all the scheduler-state is in-memory right? So in this setup we could e.g. not perform the check whether the requested data / executor-host is actually alive and belongs to the cluster. On the other hand, it would make the proxy stateless, which is probably a good thing.

maybe we could relax this requirement, perhaps I should have spoken up earlier. why do we need to check if the executor is there? there are no corrective actions we can take.
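To make the three configuration options above concrete, here is a minimal sketch of how they could be modelled. It is not code from the PR: `FlightProxyMode`, `resolve_proxy_mode` and `scheduler_starts_proxy` are hypothetical names, and mapping the `--advertise-flight-sql-endpoint` flag to `None` / empty string / `host:port` is an assumption made only for illustration.

```rust
/// Hypothetical model of the three proxy configurations discussed in this thread.
#[derive(Debug, PartialEq)]
enum FlightProxyMode {
    /// Proxy not configured: the client fetches data from the executors directly.
    Disabled,
    /// Proxy configured without an address: the scheduler serves it in-process.
    Embedded,
    /// Proxy configured with host/port: an external process runs the proxy and
    /// the scheduler only puts this address into its response.
    External { host: String, port: u16 },
}

/// Assumed encoding of `--advertise-flight-sql-endpoint` (illustrative only):
/// `None` = flag absent, `Some("")` = flag given without a value,
/// `Some("host:port")` = address of an external proxy.
fn resolve_proxy_mode(flag: Option<&str>) -> Result<FlightProxyMode, String> {
    match flag.map(str::trim) {
        None => Ok(FlightProxyMode::Disabled),
        Some("") => Ok(FlightProxyMode::Embedded),
        Some(endpoint) => {
            // Split on the last ':' so the port is always the final component.
            let (host, port) = endpoint
                .rsplit_once(':')
                .ok_or_else(|| format!("expected host:port, got `{endpoint}`"))?;
            if host.is_empty() {
                return Err(format!("missing host in `{endpoint}`"));
            }
            let port = port
                .parse::<u16>()
                .map_err(|e| format!("invalid port `{port}`: {e}"))?;
            Ok(FlightProxyMode::External {
                host: host.to_string(),
                port,
            })
        }
    }
}

/// Only the embedded variant requires the scheduler to start a proxy itself.
fn scheduler_starts_proxy(mode: &FlightProxyMode) -> bool {
    matches!(mode, FlightProxyMode::Embedded)
}
```

Under these assumptions, `resolve_proxy_mode(Some("localhost:50040"))` yields the external variant and `scheduler_starts_proxy` returns `false` for it, which matches the point that the scheduler only advertises the address in that case.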
> I wonder though, whether it's worthwhile to add the possibility (and hence the complexity in the cli & protobuf) of running the flight-proxy "embedded" in the scheduler?

I'm not sure I understand, we still have the option to run it "embedded"

```
./ballista-scheduler --advertise-flight-sql-endpoint
```

should listen "embedded". please let me know if I got you wrong

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
