(datafusion) branch main updated: Add Kubeflow Trainer to known users (#18935)

github-bot Tue, 25 Nov 2025 13:33:11 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git



The following commit(s) were added to refs/heads/main by this push:
     new 0871741025 Add Kubeflow Trainer to known users (#18935)
0871741025 is described below

commit 087174102593d20f8eeac425704529f5ba4e0b06
Author: Andrey Velichkevich <[email protected]>
AuthorDate: Tue Nov 25 21:32:36 2025 +0000

    Add Kubeflow Trainer to known users (#18935)
    
    ## Which issue does this PR close?
    
    No issue
    
    ## Rationale for this change
    
    Hi Folks, thanks for driving DataFusion forward!
    
    We've recently released support for a distributed data cache in
    [Kubeflow Trainer](https://github.com/kubeflow/trainer).
    It allows users to stream massive datasets directly to distributed
    training nodes and optimize GPU utilization.
    
    Docs and public talks are available in this guide:
    https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/
    
    I've updated the DataFusion known users with that.
    
    cc @akshaychitneni @bigsur0 @comphead @andygrove
    ## What changes are included in this PR?
    
    Update DataFusion docs to include Kubeflow Trainer.
    
    ## Are these changes tested?
    
    <!--
    We typically require tests for all PRs in order to:
    1. Prevent the code from being accidentally broken by subsequent changes
    2. Serve as another way to document the expected behavior of the code
    
    If tests are not included in your PR, please explain why (for example,
    are they covered by existing tests)?
    -->
    
    ## Are there any user-facing changes?
    
    <!--
    If there are user-facing changes then we may require documentation to be
    updated before approving the PR.
    -->
    
    <!--
    If there are any breaking changes to public APIs, please add the `api
    change` label.
    -->
---
 docs/source/user-guide/introduction.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
index 778562d55f..66076e6b73 100644
--- a/docs/source/user-guide/introduction.md
+++ b/docs/source/user-guide/introduction.md
@@ -82,6 +82,7 @@ Here are some example systems built using DataFusion:
 - Streaming data platforms such as [Synnada]
 - Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as [qv]
 - Native Spark runtime replacement such as [Auron]
+- Distributed data cache to boost GPU utilization of AI workloads with [Kubeflow Trainer](https://www.kubeflow.org/docs/components/trainer/user-guides/data-cache/)
 
 By using DataFusion, projects are freed to focus on their specific
 features, and avoid reimplementing general (but still necessary)
@@ -114,6 +115,8 @@ Here are some active projects using DataFusion:
 - [Iceberg-rust](https://github.com/apache/iceberg-rust) Rust implementation of Apache Iceberg
 - [InfluxDB] Time Series Database
 - [Kamu] Planet-scale streaming data pipeline
+- [Kubeflow Trainer](https://github.com/kubeflow/trainer) Kubernetes-native project designed for
+  scalable LLMs fine-tuning and distributed AI model training.
 - [LakeSoul](https://github.com/lakesoul-io/LakeSoul) Open source LakeHouse framework with native IO in Rust.
 - [Lance](https://github.com/lancedb/lance) Modern columnar data format for ML
 - [OpenObserve] Distributed cloud native observability platform


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
