(arrow-site) branch asf-site updated: Updating built site

github-bot Tue, 26 May 2026 14:44:00 -0700

This is an automated email from the ASF dual-hosted git repository.

github-actions[bot] pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 59559a82768 Updating built site
59559a82768 is described below

commit 59559a827687f8085df997070d6a2a8a5306fc58
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue May 26 21:43:34 2026 +0000

    Updating built site
---
 blog/2025/01/10/arrow-result-transfer-japanese/index.html | 6 +++---
 blog/2025/01/10/arrow-result-transfer/index.html          | 6 +++---
 feed.xml                                                  | 2 +-
 release/index.html                                        | 6 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/blog/2025/01/10/arrow-result-transfer-japanese/index.html b/blog/2025/01/10/arrow-result-transfer-japanese/index.html
index 12ee0ca7e05..d85b24ff2eb 100644
--- a/blog/2025/01/10/arrow-result-transfer-japanese/index.html
+++ b/blog/2025/01/10/arrow-result-transfer-japanese/index.html
@@ -299,7 +299,7 @@
   <img 
src="/img/arrow-result-transfer/part-1-figure-1-row-vs-column-layout.png" 
width="100%" class="img-responsive" 
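alt="Figure 1: How the physical memory layout of a table with five rows and 
three columns differs between row-oriented and column-oriented layouts.">
   <figcaption>Figure 1: How the physical memory layout of a table with five 
rows and three columns differs between row-oriented and column-oriented 
layouts.</figcaption>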
 </figure>
-<p>High-performance analytic databases, data warehouses, query engines, and 
storage systems often adopt a columnar architecture because it speeds up 
commonly used analytic queries. Modern columnar query systems include Amazon 
Redshift, Apache DataFusion, ClickHouse, Databricks Photon Engine, DuckDB, 
Google BigQuery, Microsoft Azure Synapse Analytics, OpenText Analytics Database 
(Vertica), Snowflake, and Voltron Data Theseus.</p>
+<p>High-performance analytic databases, data warehouses, query engines, and 
storage systems often adopt a columnar architecture because it speeds up 
commonly used analytic queries. Modern columnar query systems include Amazon 
Redshift, Apache DataFusion, ClickHouse, Databricks Photon Engine, DuckDB, 
Google BigQuery, Microsoft Azure Synapse Analytics, OpenText Analytics Database 
(Vertica), and Snowflake.</p>
 
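<p>Likewise, many destinations for analytic query results also adopt a 
columnar architecture. Such destinations include BI tools, data application 
platforms, dataframe libraries, and machine learning platforms. Columnar BI 
tools include Amazon QuickSight, Domo, GoodData, Power BI, Qlik Sense, 
Spotfire, and Tableau. Columnar dataframe libraries include cuDF, pandas, and 
Polars.</p>
 
<p>As a result, it is increasingly common for both the source format and the 
destination format of a query result to be columnar. The most efficient way to 
transfer data between a columnar source and a columnar destination is to use a 
columnar transfer format. This avoids the time-consuming work of transposing 
rows and columns. With a row-oriented transfer format, the source must 
transpose columns into rows during serialization, and the destination must 
transpose rows back into columns during deserialization.</p>
 
<p>Arrow is a columnar data format. The column-oriented layout of data in the 
Arrow format is similar to the layout of data in widely used source and 
destination systems. In many cases the layout is not merely similar but 
identical.</p>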
@@ -313,7 +313,7 @@
 <p>The Arrow format supports zero-copy operations. To hold sets of data 
values, Arrow defines a column-oriented tabular data structure called a <a 
href="https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc">record 
batch</a>. Arrow record batches can be held in memory, sent over a network, or 
stored on disk. The binary structure stays the same regardless of which medium 
a record batch is on and which system generated it. To store schemas and other 
metadata, Arrow uses FlatBuffers, a format created by Google. FlatBuffers also 
has the same binary structure on every medium.</p>
 
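<p>As a result of these design choices, Arrow can serve not only as a transfer 
format but also as an in-memory format and an on-disk format. This is in 
contrast to text-based formats such as JSON and CSV and serialized binary 
formats such as Protocol Buffers and Thrift, which encode data using dedicated 
syntax. To load data in these formats into a usable in-memory structure, the 
data must be parsed and decoded. It is also in contrast to binary formats such 
as Parquet and ORC, which encode and compress data to reduce its size on disk. 
To load data in these formats into a usable in-memory structure, the data must 
be decompressed and decoded<sup class="footnote-ref"><a href="#fn3" 
id="fnref3">3</a></sup>.</p>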
 
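<p>This means that at the source system, data that exists in memory or on disk 
in Arrow format can be transferred over the network in Arrow format without 
serialization. And at the destination system, data can be read off the network 
into memory or written to disk as Arrow files without deserialization.</p>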
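-<p>The Arrow format was designed as an in-memory format that supports highly 
efficient analytic operations. Because of this, many columnar data systems 
have adopted Arrow as their internal in-memory format. These include Apache 
DataFusion, cuDF, Dremio, InfluxDB, Polars, Velox, and Voltron Data Theseus. 
When one of these systems is the source or destination of a transfer, 
serialization and deserialization overhead is eliminated entirely. For many 
other columnar data systems, the proprietary in-memory formats they use are 
very similar to Arrow. With those systems, serialization to and 
deserialization from the Arrow format is fast and efficient.</p>
+<p>The Arrow format was designed as an in-memory format that supports highly 
efficient analytic operations. Because of this, many columnar data systems 
have adopted Arrow as their internal in-memory format. These include Apache 
DataFusion, cuDF, Dremio, InfluxDB, Polars, and Velox. When one of these 
systems is the source or destination of a transfer, serialization and 
deserialization overhead is eliminated entirely. For many other columnar data 
systems, the proprietary in-memory formats they use are very similar to Arrow. 
With those systems, serialization to and deserialization from the Arrow format 
is fast and efficient.</p>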
 <h3>4. The Arrow format is streamable</h3>
 
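<p>A streamable data format can be processed sequentially, one chunk at a 
time, without waiting for the full dataset. When data is transferred in a 
streamable format, the receiving system can begin processing as soon as the 
first chunk arrives. This can speed up data transfer in several ways. The next 
chunk can be received while the current one is being processed. The receiving 
system can use memory more efficiently. And multiple streams can be 
transferred in parallel. Together these speed up data transfer, 
deserialization, and processing.</p>
 
<p>CSV, for example, is a streamable data format, because the column names (if 
included) are in a header at the top of the file and the subsequent lines can 
be processed in order. Parquet and ORC, by contrast, are not streamable data 
formats, because the schema and other metadata needed to process the data are 
held in a footer at the end of the file. It is necessary to download the 
entire file (or seek to the end of the file and download the footer 
separately) before processing can begin<sup class="footnote-ref"><a 
href="#fn4" id="fnref4">4</a></sup>.</p>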
@@ -327,7 +327,7 @@
 
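<p>Arrow’s universality lets it address a fundamental problem in speeding up 
real-world data systems: performance improvements are limited by a system’s 
bottlenecks. This problem is known as <a 
href="https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%A0%E3%83%80%E3%83%BC%E3%83%AB%E3%81%AE%E6%B3%95%E5%89%87" 
target="_blank" rel="noopener">Amdahl’s law</a>. In real-world data pipelines, 
query results often flow through multiple stages, with serialization and 
deserialization overhead at each stage. If, for example, your data pipeline 
has five stages and you eliminate that overhead in four of them, your system 
will probably not get any faster, because the serialization and 
deserialization in the one remaining stage becomes the bottleneck for the 
whole pipeline.</p>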
 
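<p>Arrow runs efficiently on any technology stack, which helps solve this 
problem. Suppose, for example, you have this data flow: a Scala-based 
distributed backend with NVIDIA GPU workers → a Jetty-based HTTP server → a 
Rails-based feature engineering app that users interact with through a 
Node.js-based machine learning framework with a Pyodide-based browser front 
end. No problem. Arrow libraries can eliminate serialization and 
deserialization overhead between all of these components.</p>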
 <h3>Conclusion</h3>
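-<p>As more commercial and open source tools add support for Arrow, fast query 
result transfer with little or no serialization and deserialization has become 
increasingly common. Today, many databases, data platforms, and query engines 
can transfer query results in Arrow format. These include commercial products 
such as Databricks, Dremio, Google BigQuery, InfluxDB, Snowflake, and Voltron 
Data Theseus, and open source products such as Apache DataFusion, Apache 
Doris, Apache Spark, ClickHouse, and DuckDB. This has yielded substantial 
speedups.</p>
+<p>As more commercial and open source tools add support for Arrow, fast query 
result transfer with little or no serialization and deserialization has become 
increasingly common. Today, many databases, data platforms, and query engines 
can transfer query results in Arrow format. These include commercial products 
such as Databricks, Dremio, Google BigQuery, InfluxDB, and Snowflake, and open 
source products such as Apache DataFusion, Apache Doris, Apache Spark, 
ClickHouse, and DuckDB. This has yielded substantial speedups.</p>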
 <ul>
 <li>Apache Doris: <a 
href="https://doris.apache.org/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer"
 target="_blank" rel="noopener">faster “by a factor ranging from 20 to several 
hundreds”</a>
 </li>
diff --git a/blog/2025/01/10/arrow-result-transfer/index.html b/blog/2025/01/10/arrow-result-transfer/index.html
index 64bf98509f1..f3403a151ca 100644
--- a/blog/2025/01/10/arrow-result-transfer/index.html
+++ b/blog/2025/01/10/arrow-result-transfer/index.html
@@ -307,7 +307,7 @@
   <img 
src="/img/arrow-result-transfer/part-1-figure-1-row-vs-column-layout.png" 
width="100%" class="img-responsive" alt="Figure 1: An illustration of 
row-oriented and column-oriented physical memory layouts of a table containing 
three columns and five rows.">
   <figcaption>Figure 1: An illustration of row-oriented and column-oriented 
physical memory layouts of a table containing three columns and five 
rows.</figcaption>
 </figure>
-<p>High-performance analytic databases, data warehouses, query engines, and 
storage systems have converged on columnar architecture because it speeds up 
the most common types of analytic queries. Examples of modern columnar query 
systems include Amazon Redshift, Apache DataFusion, ClickHouse, Databricks 
Photon Engine, DuckDB, Google BigQuery, Microsoft Azure Synapse Analytics, 
OpenText Analytics Database (Vertica), Snowflake, and Voltron Data Theseus.</p>
+<p>High-performance analytic databases, data warehouses, query engines, and 
storage systems have converged on columnar architecture because it speeds up 
the most common types of analytic queries. Examples of modern columnar query 
systems include Amazon Redshift, Apache DataFusion, ClickHouse, Databricks 
Photon Engine, DuckDB, Google BigQuery, Microsoft Azure Synapse Analytics, 
OpenText Analytics Database (Vertica), and Snowflake.</p>
 <p>Likewise, many destinations for analytic query results (such as business 
intelligence tools, data application platforms, dataframe libraries, and 
machine learning platforms) use columnar architecture. Examples of columnar 
business intelligence tools include Amazon QuickSight, Domo, GoodData, Power 
BI, Qlik Sense, Spotfire, and Tableau. Examples of columnar dataframe libraries 
include cuDF, pandas, and Polars.</p>
 <p>So it is increasingly common for both the source format and the target 
format of a query result to be columnar formats. The most efficient way to 
transfer data between a columnar source and a columnar target is to use a 
columnar transfer format. This eliminates the need for a time-consuming 
transpose of the data from columns to rows at the source during the 
serialization step and another time-consuming transpose of the data from rows 
to columns at the destination during the deserializ [...]
 <p>Arrow is a columnar data format. The column-oriented layout of data in the 
Arrow format is similar—and in many cases identical—to the layout of data in 
many widely used columnar source systems and destination systems.</p>
@@ -321,7 +321,7 @@
 <p>The Arrow format supports zero-copy operations. To hold sets of data 
values, Arrow defines a column-oriented tabular data structure called a <a 
href="https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc">record
 batch</a>. Arrow record batches can be held in memory, sent over a network, or 
stored on disk. The binary structure remains the same regardless of which 
medium a record batch is on and which system generated it. To hold schemas and 
[...]
 <p>As a result of these design choices, Arrow can serve not only as a transfer 
format but also as an in-memory format and on-disk format. This is in contrast 
to text-based formats such as JSON and CSV and serialized binary formats such 
as Protocol Buffers and Thrift, which encode data values using dedicated 
structural syntax. To load data from these formats into a usable in-memory 
structure, the data must be parsed and decoded. This is also in contrast to 
binary formats such as Parquet a [...]
 <p>This means that at the source system, if data exists in memory or on disk 
in Arrow format, that data can be transmitted over the network in Arrow format 
without any serialization. And at the destination system, Arrow-formatted data 
can be read off the network into memory or into Arrow files on disk without any 
deserialization.</p>
-<p>The Arrow format was designed to be highly efficient as an in-memory format 
for analytic operations. Because of this, many columnar data systems have been 
built using Arrow as their in-memory format. These include Apache DataFusion, 
cuDF, Dremio, InfluxDB, Polars, Velox, and Voltron Data Theseus. When one of 
these systems is the source or destination of a transfer, ser/de overheads can 
be fully eliminated. With most other columnar data systems, the proprietary 
in-memory formats they u [...]
+<p>The Arrow format was designed to be highly efficient as an in-memory format 
for analytic operations. Because of this, many columnar data systems have been 
built using Arrow as their in-memory format. These include Apache DataFusion, 
cuDF, Dremio, InfluxDB, Polars, and Velox. When one of these systems is the 
source or destination of a transfer, ser/de overheads can be fully eliminated. 
With most other columnar data systems, the proprietary in-memory formats they 
use are very similar to [...]
 <h3>4. The Arrow format enables streaming.</h3>
 <p>A streamable data format is one that can be processed sequentially, one 
chunk at a time, without waiting for the full dataset. When data is being 
transmitted in a streamable format, the receiving system can begin processing 
it as soon as the first chunk arrives. This can speed up data transfer in 
several ways: transfer time can overlap with processing time; the receiving 
system can use memory more efficiently; and multiple streams can be transferred 
in parallel, speeding up transmissi [...]
 <p>CSV is an example of a streamable data format, because the column names (if 
included) are in a header at the top of the file, and the lines in the file can 
be processed sequentially. Parquet and ORC are examples of data formats that do 
not enable streaming, because the schema and other metadata, which are required 
to process the data, are held in a footer at the bottom of the file, making it 
necessary to download the entire file (or seek to the end of the file and 
download the footer  [...]
@@ -335,7 +335,7 @@
 <p>Arrow’s universality allows it to address a fundamental problem in speeding 
up real-world data systems: Performance improvements are inherently constrained 
by a system’s bottlenecks. This problem is known as <a 
href="https://www.geeksforgeeks.org/computer-organization-amdahls-law-and-its-proof/"
 target="_blank" rel="noopener">Amdahl’s law</a>. In real-world data pipelines, 
query results often flow through multiple stages, incurring ser/de overheads at 
each stage. If, for example, your [...]
 <p>Arrow’s ability to operate efficiently in virtually any technology stack 
helps to solve this problem. Does your data flow from a Scala-based distributed 
backend with NVIDIA GPU-accelerated workers to a Jetty-based HTTP server then 
to a Rails-powered feature engineering app which users interact with through a 
Node.js-based machine learning framework with a Pyodide-based browser front 
end? No problem; Arrow libraries are available to eliminate ser/de overheads 
between all of those compo [...]
 <h3>Conclusion</h3>
-<p>As more commercial and open source tools have added support for Arrow, fast 
query result transfer with low or no ser/de overheads has become increasingly 
common. Today, commercial data platforms and query engines including 
Databricks, Dremio, Google BigQuery, InfluxDB, Snowflake, and Voltron Data 
Theseus and open source databases and query engines including Apache 
DataFusion, Apache Doris, Apache Spark, ClickHouse, and DuckDB can all transfer 
query results in Arrow format. The speedup [...]
+<p>As more commercial and open source tools have added support for Arrow, fast 
query result transfer with low or no ser/de overheads has become increasingly 
common. Today, commercial data platforms and query engines including 
Databricks, Dremio, Google BigQuery, InfluxDB, and Snowflake and open source 
databases and query engines including Apache DataFusion, Apache Doris, Apache 
Spark, ClickHouse, and DuckDB can all transfer query results in Arrow format. 
The speedups are substantial:</p>
 <ul>
 <li>Apache Doris: <a 
href="https://doris.apache.org/blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer"
 target="_blank" rel="noopener">faster “by a factor ranging from 20 to several 
hundreds”</a>
 </li>
diff --git a/feed.xml b/feed.xml
index c729e5cdd6b..f73e5e8f469 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" 
version="4.4.1">Jekyll</generator><link 
href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml" 
/><link href="https://arrow.apache.org/" rel="alternate" type="text/html" 
/><updated>2026-05-17T20:23:14-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
 type="html">Apache Arrow</title><subtitle>Apache Arrow is the universal 
columnar fo [...]
+<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" 
version="4.4.1">Jekyll</generator><link 
href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml" 
/><link href="https://arrow.apache.org/" rel="alternate" type="text/html" 
/><updated>2026-05-26T17:30:03-04:00</updated><id>https://arrow.apache.org/feed.xml</id><title
 type="html">Apache Arrow</title><subtitle>Apache Arrow is the universal 
columnar fo [...]
 
 -->
 <p>The Apache Arrow team is pleased to announce the v18.6.0 release of Apache 
Arrow Go.
diff --git a/release/index.html b/release/index.html
index f08290e566d..2e74f746c90 100644
--- a/release/index.html
+++ b/release/index.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" 
content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png"
 />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2026-05-17T20:23:14-04:00" />
-<meta property="article:modified_time" content="2026-05-17T20:23:14-04:00" />
+<meta property="article:published_time" content="2026-05-26T17:30:03-04:00" />
+<meta property="article:modified_time" content="2026-05-26T17:30:03-04:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta name="twitter:image" 
content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png"
 />
 <meta name="twitter:title" content="Releases" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2026-05-17T20:23:14-04:00","datePublished":"2026-05-17T20:23:14-04:00","description":"Apache
 Arrow Releases Navigate to the release page for downloads and the changelog. 
24.0.0 (21 April 2026) 23.0.1 (16 February 2026) 23.0.0 (18 January 2026) 
22.0.0 (24 October 2025) 21.0.0 (17 July 2025) 20.0.0 (27 April 2025) 19.0.1 
(16 February 2025) 19.0.0 (16 January 2025) 18.1.0 (24 November 2024) 18.0.0 
(28 October 2024) 17.0. [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2026-05-26T17:30:03-04:00","datePublished":"2026-05-26T17:30:03-04:00","description":"Apache
 Arrow Releases Navigate to the release page for downloads and the changelog. 
24.0.0 (21 April 2026) 23.0.1 (16 February 2026) 23.0.0 (18 January 2026) 
22.0.0 (24 October 2025) 21.0.0 (17 July 2025) 20.0.0 (27 April 2025) 19.0.1 
(16 February 2025) 19.0.0 (16 January 2025) 18.1.0 (24 November 2024) 18.0.0 
(28 October 2024) 17.0. [...]
 <!-- End Jekyll SEO tag -->
