This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
     new 29b6d23  Commit build products

29b6d23 is described below

commit 29b6d23ddc68b94bfa025d445fa2893a48002526
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Wed Jul 30 11:35:18 2025 +0000

    Commit build products
---
 blog/2025/07/29/metadata-handling/index.html         | 19 +++++++++++++++++++
 blog/feeds/all-en.atom.xml                           | 19 +++++++++++++++++++
 blog/feeds/blog.atom.xml                             | 19 +++++++++++++++++++
 .../tim-saucer-dewey-dunnington-andrew-lamb.atom.xml | 19 +++++++++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/blog/2025/07/29/metadata-handling/index.html b/blog/2025/07/29/metadata-handling/index.html
index 9d59e70..fd4a0a7 100644
--- a/blog/2025/07/29/metadata-handling/index.html
+++ b/blog/2025/07/29/metadata-handling/index.html
@@ -222,6 +222,25 @@ be used thusly:</p>
 </code></pre>
 <p>The <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> also contains a crate that demonstrates how to expose these UDFs to <a href="https://datafusion.apache.org/python/">datafusion-python</a>. This requires version 48.0.0 or later.</p>
+<h2>Other use cases</h2>
+<p>The metadata attached to the fields can be used to store <em>any</em> user data in key/value
+pairs. Some of the other use cases that have been identified include:</p>
+<ul>
+<li>Storing statistics data for a column. If you have a table provider that can produce
+ column level statistics, then you can write functions that take advantage of that
+ data.</li>
+<li>Creating output for downstream systems. One user of DataFusion produces
+ <a href="https://rerun.io/blog/column-chunks">data visualizations</a> that are dependant upon metadata in record batch fields. By
+ enabling metadata on output of user defined functions, we can now produce batches
+ that are directly consumable by these systems.</li>
+<li>Describe the relationships between columns of data. You can store data about how
+ one column of data relates to another and use these during function evaluation. For
+ example, in robotics it is common to use <a href="https://wiki.ros.org/tf2">transforms</a> to describe how to convert
+ from one coordinate system to another. It can be convenient to send the function
+ all of the columns that contain transform information and then allow the function
+ to determine which columns to use based on the metadata. This allows for
+ encapsulation of the transform logic within the user function.</li>
+</ul>
 <h2>Thanks to our sponsor</h2>
 <p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> is building a data visualization system for Physical AI and uses metadata to specify

diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 123d961..d69e41f 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -197,6 +197,25 @@ be used thusly:</p>
 </code></pre>
 <p>The <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> also contains a crate that demonstrates how to expose these UDFs to <a href="https://datafusion.apache.org/python/">datafusion-python</a>. This requires version 48.0.0 or later.</p>
+<h2>Other use cases</h2>
+<p>The metadata attached to the fields can be used to store <em>any</em> user data in key/value
+pairs. Some of the other use cases that have been identified include:</p>
+<ul>
+<li>Storing statistics data for a column. If you have a table provider that can produce
+ column level statistics, then you can write functions that take advantage of that
+ data.</li>
+<li>Creating output for downstream systems. One user of DataFusion produces
+ <a href="https://rerun.io/blog/column-chunks">data visualizations</a> that are dependant upon metadata in record batch fields. By
+ enabling metadata on output of user defined functions, we can now produce batches
+ that are directly consumable by these systems.</li>
+<li>Describe the relationships between columns of data. You can store data about how
+ one column of data relates to another and use these during function evaluation. For
+ example, in robotics it is common to use <a href="https://wiki.ros.org/tf2">transforms</a> to describe how to convert
+ from one coordinate system to another. It can be convenient to send the function
+ all of the columns that contain transform information and then allow the function
+ to determine which columns to use based on the metadata. This allows for
+ encapsulation of the transform logic within the user function.</li>
+</ul>
 <h2>Thanks to our sponsor</h2>
 <p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> is building a data visualization system for Physical AI and uses metadata to specify

diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 42f82c1..f88ed7a 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -197,6 +197,25 @@ be used thusly:</p>
 </code></pre>
 <p>The <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> also contains a crate that demonstrates how to expose these UDFs to <a href="https://datafusion.apache.org/python/">datafusion-python</a>. This requires version 48.0.0 or later.</p>
+<h2>Other use cases</h2>
+<p>The metadata attached to the fields can be used to store <em>any</em> user data in key/value
+pairs. Some of the other use cases that have been identified include:</p>
+<ul>
+<li>Storing statistics data for a column. If you have a table provider that can produce
+ column level statistics, then you can write functions that take advantage of that
+ data.</li>
+<li>Creating output for downstream systems. One user of DataFusion produces
+ <a href="https://rerun.io/blog/column-chunks">data visualizations</a> that are dependant upon metadata in record batch fields. By
+ enabling metadata on output of user defined functions, we can now produce batches
+ that are directly consumable by these systems.</li>
+<li>Describe the relationships between columns of data. You can store data about how
+ one column of data relates to another and use these during function evaluation. For
+ example, in robotics it is common to use <a href="https://wiki.ros.org/tf2">transforms</a> to describe how to convert
+ from one coordinate system to another. It can be convenient to send the function
+ all of the columns that contain transform information and then allow the function
+ to determine which columns to use based on the metadata. This allows for
+ encapsulation of the transform logic within the user function.</li>
+</ul>
 <h2>Thanks to our sponsor</h2>
 <p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> is building a data visualization system for Physical AI and uses metadata to specify

diff --git a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml
index a550385..3ebd092 100644
--- a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml
+++ b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml
@@ -197,6 +197,25 @@ be used thusly:</p>
 </code></pre>
 <p>The <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> also contains a crate that demonstrates how to expose these UDFs to <a href="https://datafusion.apache.org/python/">datafusion-python</a>. This requires version 48.0.0 or later.</p>
+<h2>Other use cases</h2>
+<p>The metadata attached to the fields can be used to store <em>any</em> user data in key/value
+pairs. Some of the other use cases that have been identified include:</p>
+<ul>
+<li>Storing statistics data for a column. If you have a table provider that can produce
+ column level statistics, then you can write functions that take advantage of that
+ data.</li>
+<li>Creating output for downstream systems. One user of DataFusion produces
+ <a href="https://rerun.io/blog/column-chunks">data visualizations</a> that are dependant upon metadata in record batch fields. By
+ enabling metadata on output of user defined functions, we can now produce batches
+ that are directly consumable by these systems.</li>
+<li>Describe the relationships between columns of data. You can store data about how
+ one column of data relates to another and use these during function evaluation. For
+ example, in robotics it is common to use <a href="https://wiki.ros.org/tf2">transforms</a> to describe how to convert
+ from one coordinate system to another. It can be convenient to send the function
+ all of the columns that contain transform information and then allow the function
+ to determine which columns to use based on the metadata. This allows for
+ encapsulation of the transform logic within the user function.</li>
+</ul>
 <h2>Thanks to our sponsor</h2>
 <p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> is building a data visualization system for Physical AI and uses metadata to specify
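The last use case in the added "Other use cases" section, passing a function every transform column and letting it pick the right ones from field metadata, can be sketched in plain Rust. This is a minimal, standard-library-only sketch under stated assumptions: `ColumnField` is a hypothetical stand-in for an Arrow field (in Arrow, field metadata is likewise a map of string keys to string values), and the `child_frame` / `parent_frame` keys are an assumed convention for illustration, not part of DataFusion, Arrow, or tf2.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a column's schema entry: a name plus the
/// string key/value metadata that an Arrow field can carry.
#[derive(Debug, Clone)]
pub struct ColumnField {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

impl ColumnField {
    /// Build a field from a name and a list of (key, value) metadata pairs.
    pub fn new(name: &str, pairs: &[(&str, &str)]) -> Self {
        let metadata = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        ColumnField { name: name.to_string(), metadata }
    }
}

/// Return the indices of the columns whose metadata marks them as a
/// transform from `from_frame` to `to_frame`. The "child_frame" and
/// "parent_frame" keys are hypothetical conventions chosen for this sketch.
pub fn select_transform_columns(
    fields: &[ColumnField],
    from_frame: &str,
    to_frame: &str,
) -> Vec<usize> {
    fields
        .iter()
        .enumerate()
        .filter(|(_, f)| {
            f.metadata.get("child_frame").map(String::as_str) == Some(from_frame)
                && f.metadata.get("parent_frame").map(String::as_str) == Some(to_frame)
        })
        .map(|(i, _)| i)
        .collect()
}
```

For example, given a plain `position` column plus two transform columns tagged `camera -> base` and `lidar -> base`, asking for the `camera` to `base` transform selects only the camera column, and columns without the metadata keys are simply ignored. Keeping the selection inside the function is what lets the caller pass every transform column without knowing which one applies.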