This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 0871741025 Add Kubeflow Trainer to known users (#18935)
0871741025 is described below
commit 087174102593d20f8eeac425704529f5ba4e0b06
Author: Andrey Velichkevich <[email protected]>
AuthorDate: Tue Nov 25 21:32:36 2025 +0000
Add Kubeflow Trainer to known users (#18935)
## Which issue does this PR close?
No issue
## Rationale for this change
Hi Folks, thanks for driving DataFusion forward!
We've recently released support for a distributed data cache in [Kubeflow
Trainer](https://github.com/kubeflow/trainer).
It allows users to stream massive datasets directly to distributed
training nodes and optimize GPU utilization.
Docs and public talks are available in this guide:
https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/
I've added Kubeflow Trainer to the DataFusion known users list.
cc @akshaychitneni @bigsur0 @comphead @andygrove
## What changes are included in this PR?
Update DataFusion docs to include Kubeflow Trainer.
## Are these changes tested?
## Are there any user-facing changes?
---
docs/source/user-guide/introduction.md | 3 +++
1 file changed, 3 insertions(+)
diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
index 778562d55f..66076e6b73 100644
--- a/docs/source/user-guide/introduction.md
+++ b/docs/source/user-guide/introduction.md
@@ -82,6 +82,7 @@ Here are some example systems built using DataFusion:
- Streaming data platforms such as [Synnada]
- Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as [qv]
- Native Spark runtime replacement such as [Auron]
+- Distributed data cache to boost GPU utilization of AI workloads with [Kubeflow Trainer](https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/)
By using DataFusion, projects are freed to focus on their specific
features, and avoid reimplementing general (but still necessary)
@@ -114,6 +115,8 @@ Here are some active projects using DataFusion:
- [Iceberg-rust](https://github.com/apache/iceberg-rust) Rust implementation of Apache Iceberg
- [InfluxDB] Time Series Database
- [Kamu] Planet-scale streaming data pipeline
+- [Kubeflow Trainer](https://github.com/kubeflow/trainer) Kubernetes-native project designed for
+  scalable LLMs fine-tuning and distributed AI model training.
- [LakeSoul](https://github.com/lakesoul-io/LakeSoul) Open source LakeHouse framework with native IO in Rust.
- [Lance](https://github.com/lancedb/lance) Modern columnar data format for ML
- [OpenObserve] Distributed cloud native observability platform