mach-kernel opened a new pull request, #1333:
URL: https://github.com/apache/datafusion-ballista/pull/1333
### Which issue does this PR close?
TBD
### Rationale for this change
A Ballista scheduler may have tables in its catalog that the client is
unaware of. Currently, `show tables` only lists what is available in the
client's local session context. This change allows the client to plan queries
against schemas that are available in the Ballista cluster but not in the client.
### What changes are included in this PR?
###### ballista
- New protos to represent the hierarchy of catalog -> schema -> table and
  the custom table provider.
- Scheduler RPC to pack the catalog names into protos
- `SessionContextExt::remote*` calls the scheduler and loads tables using
  `MemoryCatalogProvider`
- A custom `RemoteTableProvider` that encapsulates names (catalog, schema,
  table) and a concrete Arrow schema
  - `RemoteTableProvider::scan` errors: we should never try to actually
    scan it.
- `BallistaLogicalExtensionCodec` treats `RemoteTableProvider` as a special
  case
  - Encode just packs into the table provider proto representation
  - Decode makes a concrete table reference and attempts to resolve it in
    the remote context
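
To make the flow above easier to follow, here is a minimal, std-only sketch of the idea. It is not the code in this PR: `TableName`, `RemoteTable`, `Catalog`, `describe`, and `resolve` are hypothetical stand-ins, and a list of (column name, type name) pairs stands in for the Arrow schema. It assumes the scheduler hands back one placeholder per table, that scanning a placeholder always fails, and that decoding amounts to resolving the three-part name against the cluster-side catalog.

```rust
use std::collections::BTreeMap;

/// Fully qualified name following the catalog -> schema -> table hierarchy.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct TableName {
    catalog: String,
    schema: String,
    table: String,
}

impl TableName {
    fn new(catalog: &str, schema: &str, table: &str) -> Self {
        TableName {
            catalog: catalog.to_string(),
            schema: schema.to_string(),
            table: table.to_string(),
        }
    }
}

/// Stand-in for an Arrow schema: (column name, type name) pairs.
type Columns = Vec<(String, String)>;

/// Hypothetical placeholder playing the role of `RemoteTableProvider`:
/// it carries the names and the schema, but no data.
#[derive(Debug, Clone, PartialEq)]
struct RemoteTable {
    name: TableName,
    columns: Columns,
}

impl RemoteTable {
    /// Scanning the placeholder is always an error; only the cluster has the data.
    fn scan(&self) -> Result<Vec<Vec<String>>, String> {
        Err(format!(
            "remote table {}.{}.{} cannot be scanned on the client",
            self.name.catalog, self.name.schema, self.name.table
        ))
    }
}

/// Simplified cluster-side catalog, keyed by the full three-part name.
#[derive(Debug, Default)]
struct Catalog {
    tables: BTreeMap<TableName, Columns>,
}

impl Catalog {
    fn register(&mut self, name: TableName, columns: Columns) {
        self.tables.insert(name, columns);
    }

    /// Roughly what the scheduler RPC would return: one placeholder per table.
    fn describe(&self) -> Vec<RemoteTable> {
        self.tables
            .iter()
            .map(|(name, columns)| RemoteTable {
                name: name.clone(),
                columns: columns.clone(),
            })
            .collect()
    }

    /// Roughly the "decode" step: resolve the reference against this catalog.
    fn resolve(&self, name: &TableName) -> Result<&Columns, String> {
        self.tables
            .get(name)
            .ok_or_else(|| format!("table {:?} not found", name))
    }
}
```

In this sketch the client would register the placeholders returned by `describe` so that planning can see names and column types, while `resolve` on the cluster side swaps each placeholder back for the real table. A name that is missing from the cluster surfaces as an error at resolution time rather than at scan time.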
###### ballista-python
- Add to workspace to consume new client changes
### Are there any user-facing changes?
`ballista-cli` can now run queries against cluster tables!
```bash
Ballista CLI v49.0.0
❯ show tables;
+---------------+--------------------+---------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+---------------+------------+
| spice | runtime | task_history | BASE TABLE |
| spice | public | wiki_a_potion | BASE TABLE |
| spice | information_schema | tables | VIEW |
| spice | information_schema | views | VIEW |
| spice | information_schema | columns | VIEW |
| spice | information_schema | df_settings | VIEW |
| spice | information_schema | schemata | VIEW |
| spice | information_schema | routines | VIEW |
| spice | information_schema | parameters | VIEW |
| datafusion | information_schema | tables | VIEW |
| datafusion | information_schema | views | VIEW |
| datafusion | information_schema | columns | VIEW |
| datafusion | information_schema | df_settings | VIEW |
| datafusion | information_schema | schemata | VIEW |
| datafusion | information_schema | routines | VIEW |
| datafusion | information_schema | parameters | VIEW |
+---------------+--------------------+---------------+------------+
18446744073709551615 row(s) fetched.
Elapsed 0.016 seconds.
❯ select id, title from spice.public.wiki_a_potion limit 2;
+----------+----------+
| id | title |
+----------+----------+
| 4549720 | Al Salvi |
| 36963017 | Al Samha |
+----------+----------+
18446744073709551615 row(s) fetched.
Elapsed 0.777 seconds.
```
And with the Python client, in notebooks too!!
<img width="1084" height="834" alt="image"
src="https://github.com/user-attachments/assets/939e863e-a9e8-4f41-96c4-3386be9efc85"
/>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]