thinkharderdev commented on code in PR #41:
URL: https://github.com/apache/arrow-ballista/pull/41#discussion_r881469734

##########
docs/developer/architecture.md:
##########
@@ -22,11 +22,10 @@
 ## Overview

 Ballista allows queries to be executed in a distributed cluster. A cluster consists of one or
-more scheduler processes and one or more executor processes. See the following sections in this document for more
-details about these components.
+more scheduler processes and one or more executor processes.

Review Comment:
   Strictly speaking we only support a single scheduler at the moment. But maybe we keep it like this since I hope we can fix that soon :)

##########
docs/developer/architecture.md:
##########
@@ -22,11 +22,10 @@
 ## Overview

 Ballista allows queries to be executed in a distributed cluster. A cluster consists of one or
-more scheduler processes and one or more executor processes. See the following sections in this document for more
-details about these components.
+more scheduler processes and one or more executor processes.

 The scheduler accepts logical query plans and translates them into physical query plans using DataFusion and then
-runs a secondary planning/optimization process to translate the physical query plan into a distributed physical
+runs a secondary planning process to translate the physical query plan into a _distributed_ physical

Review Comment:
   Maybe a word here about how the DataFusion plan gets turned into a distributed plan? Something like "We get the distributed physical plan by replacing any operator in the DataFusion plan which performs a repartition with a stage boundary (i.e. a shuffle exchange)"

##########
docs/developer/architecture.md:
##########
@@ -76,14 +66,14 @@ The scheduler can run in standalone mode, or can be run in clustered mode using

 The executor process implements the Apache Arrow Flight gRPC interface and is responsible for:

-- Executing query stages and persisting the results to disk in Apache Arrow IPC Format
-- Making query stage results available as Flights so that they can be retrieved by other executors as well as by
-  clients
+- Connecting to the scheduler and requesting tasks to execute
+- Executing tasks within a query stage and persisting the results to disk in Apache Arrow IPC Format
+- Making query stage output partitions available as "Flights" so that they can be retrieved by other executors as well
+  as by clients

 ## Rust Client

-The Rust client provides a DataFrame API that is a thin wrapper around the DataFusion DataFrame and provides
-the means for a client to build a query plan for execution.
+The Rust client provides a `BallistaContext` that allows queries to be built using DataFrames or SQL (or both).

 The client executes the query plan by submitting an `ExecuteLogicalPlan` request to the scheduler and then calls

Review Comment:
   `ExecuteLogicalPlan` -> `ExecuteQuery`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
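
For illustration only: the second review comment describes getting the distributed plan by replacing every repartitioning operator with a stage boundary (a shuffle exchange). A minimal sketch of that idea follows, using a toy plan type. The names `Plan`, `ShuffleRead`, `split`, and `plan_stages` are hypothetical and are not DataFusion's or Ballista's actual API; the real planner works on physical operators and is considerably more involved.

```rust
// Hypothetical toy plan type, used only to illustrate stage splitting.
// It is NOT the DataFusion or Ballista plan representation.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter(Box<Plan>),
    Aggregate(Box<Plan>),
    // An operator that repartitions its input (the point where a shuffle is needed).
    Repartition(Box<Plan>),
    // Placeholder that reads the shuffle output of the stage with this index.
    ShuffleRead(usize),
}

// Rewrites the plan bottom-up. Each `Repartition` becomes a stage boundary:
// its (already rewritten) input is recorded as a separate stage, and the
// operator itself is replaced by a `ShuffleRead` that refers to that stage.
fn split(plan: Plan, stages: &mut Vec<Plan>) -> Plan {
    match plan {
        leaf @ (Plan::Scan(_) | Plan::ShuffleRead(_)) => leaf,
        Plan::Filter(input) => Plan::Filter(Box::new(split(*input, stages))),
        Plan::Aggregate(input) => Plan::Aggregate(Box::new(split(*input, stages))),
        Plan::Repartition(input) => {
            let child = split(*input, stages);
            stages.push(child);
            Plan::ShuffleRead(stages.len() - 1)
        }
    }
}

// Returns all stages in dependency order; the last entry is the final stage.
fn plan_stages(plan: Plan) -> Vec<Plan> {
    let mut stages = Vec::new();
    let root = split(plan, &mut stages);
    stages.push(root);
    stages
}
```

Under these assumptions, a two-phase aggregate such as `Aggregate(Repartition(Aggregate(Scan)))` splits into two stages: the partial aggregate over the scan, and the final aggregate reading the first stage's shuffle output.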