This is an automated email from the ASF dual-hosted git repository.
yecol pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-graphar.git
The following commit(s) were added to refs/heads/main by this push:
new a1a345ac [WIP] add benchmark in README.md (#656)
a1a345ac is described below
commit a1a345ac8267919b881e8318ffab07872708c8ec
Author: Elssky <[email protected]>
AuthorDate: Tue May 27 15:08:59 2025 +0800
[WIP] add benchmark in README.md (#656)
* feat(doc): add benchmark in README.md
* [WIP] add benchmark in README.md
---
README.md | 201 +++++++++++++++++++++++++
docs/images/benchmark_IO_time.png | Bin 0 -> 1635999 bytes
docs/images/benchmark_label_complex_filter.png | Bin 0 -> 2685452 bytes
docs/images/benchmark_label_simple_filter.png | Bin 0 -> 2744653 bytes
docs/images/benchmark_label_storage.png | Bin 0 -> 1089172 bytes
docs/images/benchmark_neighbor_retrival.png | Bin 0 -> 2352355 bytes
docs/images/benchmark_storage.png | Bin 0 -> 2630001 bytes
7 files changed, 201 insertions(+)
diff --git a/README.md b/README.md
index 5ec6800a..3468588d 100644
--- a/README.md
+++ b/README.md
@@ -196,6 +196,207 @@ width="650" alt="edge logical table1" />
<img src="docs/images/edge_physical_table2.png" class="align-center"
width="650" alt="edge logical table2" />
+## Benchmark
+Our experiments are conducted on an Alibaba Cloud r6.6xlarge instance,
equipped with a
+24-core Intel(R) Xeon(R) Platinum 8269CY CPU at 2.50GHz and
+192GB RAM, running 64-bit Ubuntu 20.04 LTS. The data is hosted
+on a 200GB PL0 ESSD with a peak I/O throughput of 180MB/s.
+Additional tests on other platforms and S3-like storage yield similar
+results.
+
+### Datasets
+Here we show statistics of the datasets with hundreds of millions of vertices
from [Graph500](https://graph500.org) and
[LDBC](https://doi.org/10.1145/2723372.2742786). Other datasets used in the
experiments can be found in the [paper](https://arxiv.org/abs/2312.09577).
+
+<table>
+ <thead>
+ <tr>
+ <th>Abbr.</th>
+ <th>Graph</th>
+ <th>|V|</th>
+ <th>|E|</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>G8</td>
+ <td>Graph500-28</td>
+ <td>268M</td>
+ <td>4.29B</td>
+ </tr>
+ <tr>
+ <td>G9</td>
+ <td>Graph500-29</td>
+ <td>537M</td>
+ <td>8.59B</td>
+ </tr>
+ <tr>
+ <td>SF30</td>
+ <td>SNB Interactive SF-30</td>
+ <td>99.4M</td>
+ <td>655M</td>
+ </tr>
+ <tr>
+ <td>SF100</td>
+ <td>SNB Interactive SF-100</td>
+ <td>318M</td>
+ <td>2.15B</td>
+ </tr>
+ <tr>
+ <td>SF300</td>
+ <td>SNB Interactive SF-300</td>
+ <td>908M</td>
+ <td>6.29B</td>
+ </tr>
+ </tbody>
+</table>
+
+<!-- We mainly conduct experiments from three aspects: Storage consumption,
I/O efficiency and Query Time. -->
+
+### Storage efficiency
+<img src="docs/images/benchmark_storage.png" class="align-center"
+width="700" alt="storage consumption" />
+
+Two baseline approaches are
+considered: 1) “plain”, which employs plain encoding for the
+source and destination columns, and 2) “plain + offset”, which
+extends the “plain” method by sorting edges and adding an
+offset column to mark each vertex’s starting edge position.
+The result
+is a notable storage advantage: on average, GraphAr requires
+only 27.3% of the storage needed by the baseline “plain +
+offset”, which is due to delta encoding.
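As a toy illustration of the layout described above (not GraphAr's actual implementation; all names are made up for the example), the snippet below shows how an offset column turns sorted edges into per-vertex slices, and how delta encoding shrinks a sorted ID column:

```python
# Illustrative sketch of the two ideas in this section: an offset column
# marking each vertex's first edge in a sorted destination column, and
# delta encoding of the IDs. Not GraphAr's actual code.

dst    = [1, 3, 7, 0, 2, 2, 5, 6]   # edge destinations, sorted by source vertex
offset = [0, 3, 5, 7, 8]            # vertex v's edges are dst[offset[v]:offset[v+1]]

def neighbors(v):
    """CSR-like lookup: one contiguous slice per vertex."""
    return dst[offset[v]:offset[v + 1]]

def delta_encode(ids):
    """First value, then successive differences; sorted runs yield small ints."""
    return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]

def delta_decode(deltas):
    """Running sum restores the original column."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

assert neighbors(0) == [1, 3, 7]
assert delta_encode([3, 7, 8, 15, 16, 42]) == [3, 4, 1, 7, 1, 26]
assert delta_decode(delta_encode(dst)) == dst
```

Because each vertex's destination run is sorted, the deltas are small integers that compress far better than the raw IDs, which is where the storage saving comes from.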
+
+### I/O speed
+<img src="docs/images/benchmark_IO_time.png" class="align-center"
+width="700" alt="I/O time" />
+
+The results in (a) indicate that GraphAr significantly
+outperforms the baseline (CSV), achieving an average speedup of 4.9×. In
Figure (b), the immutable (“Imm”) and mutable (“Mut”) variants are the two
native in-memory storage formats of GraphScope. Although querying GraphAr
directly is slower than querying the in-memory storages, owing to the
intrinsic I/O overhead, it is significantly faster than loading the data and
then
+executing the query, by 2.4× and 2.5×, respectively. This makes
GraphAr a viable option for executing infrequent queries.
+
+
+<!-- ### Neighbor retrieval
+<img src="docs/images/benchmark_neighbor_retrival.png" class="align-center"
+width="700" alt="Neighbor retrieval" />
+
+We query the vertices with the largest
+degree in selected graphs, maintaining edges in CSR-like or CSC-like formats
depending on the degree type. GraphAr significantly outperforms the baselines,
achieving an average speedup of 4452× over the “plain” method, 3.05× over
“plain + offset”, and 1.23× over “delta + offset”. -->
+### Label filtering
+<img src="docs/images/benchmark_label_simple_filter.png" class="align-center"
+width="700" alt="Simple condition filtering" />
+
+**Performance of simple condition filtering.**
+For each graph, we run one experiment per label, taking
+that label as the target of the filter.
+GraphAr consistently outperforms the baselines. On average, it achieves a
speedup of 14.8× over the “string” method, 8.9× over the “binary (plain)”
method, and 7.4× over the “binary (RLE)” method.
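As a hypothetical sketch of the three representations compared above (the row data and helper names are invented for the example), a label column can be stored as text, as one bit per row, or as run-length-encoded (RLE) runs of that bit column:

```python
# Hypothetical sketch of the "string", "binary (plain)", and "binary (RLE)"
# label representations. Not GraphAr's actual code.

rows = ["person", "person", "forum", "person", "forum", "forum", "forum"]
target = "forum"

string_col = rows                              # "string": labels stored as text
plain_bits = [int(r == target) for r in rows]  # "binary (plain)": one bit per row

def rle_encode(bits):
    """Collapse the bit column into (run_length, bit) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)
        else:
            runs.append((1, b))
    return runs

assert plain_bits == [0, 0, 1, 0, 1, 1, 1]
assert rle_encode(plain_bits) == [(2, 0), (1, 1), (1, 0), (3, 1)]
```

When a label covers long stretches of consecutive rows, the RLE form touches far fewer values than either baseline, which is one source of the speedups reported above.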
+
+<img src="docs/images/benchmark_label_complex_filter.png" class="align-center"
+width="700" alt="Complex condition filtering" />
+
+**Performance of complex condition filtering.**
+For each graph,
+we combine two labels with AND or OR as the filtering condition.
+Merge-based decoding yields the largest gain: “binary (RLE) +
merge” outperforms the “binary (RLE)” method by up to 60.5×.
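A minimal sketch of the merge idea (not GraphAr's implementation; the run representation and function name are assumptions for illustration): two run-length-encoded label columns, each a list of `(run_length, bit)` pairs covering the same rows, are combined run by run, so the per-row bit vectors are never fully materialized.

```python
# Hypothetical merge-based AND over two RLE boolean label columns.
# Assumes both columns cover the same total number of rows.

def rle_and(runs_a, runs_b):
    """Each input is a list of (run_length, bit) runs; returns their AND as runs."""
    out = []
    i = j = rem_a = rem_b = 0
    bit_a = bit_b = 0
    while i < len(runs_a) or rem_a:
        if rem_a == 0:                       # refill from column A
            rem_a, bit_a = runs_a[i]; i += 1
        if rem_b == 0:                       # refill from column B
            rem_b, bit_b = runs_b[j]; j += 1
        step = min(rem_a, rem_b)             # advance by the shorter remainder
        bit = bit_a & bit_b
        if out and out[-1][1] == bit:        # coalesce adjacent equal runs
            out[-1] = (out[-1][0] + step, bit)
        else:
            out.append((step, bit))
        rem_a -= step
        rem_b -= step
    return out

# label A true for rows 0-4; label B true for rows 3-9
a = [(5, 1), (5, 0)]
b = [(3, 0), (7, 1)]
assert rle_and(a, b) == [(3, 0), (2, 1), (5, 0)]   # only rows 3-4 satisfy A AND B
```

The work is proportional to the number of runs rather than the number of rows, which is why merging pays off most on long, homogeneous label runs.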
+<!-- ### Query efficiency
+<table>
+ <caption style="text-align: center;">Query Execution Times (in
seconds)</caption>
+ <thead>
+ <tr>
+ <th rowspan="2">Query</th>
+ <th colspan="4" scope="colgroup">SF30</th>
+ <th colspan="4" scope="colgroup">SF100</th>
+ <th colspan="4" scope="colgroup">SF300</th>
+ </tr>
+ <tr>
+ <th>P</th>
+ <th>N</th>
+ <th>A</th>
+ <th>G</th>
+ <th>P</th>
+ <th>N</th>
+ <th>A</th>
+ <th>G</th>
+ <th>P</th>
+ <th>N</th>
+ <th>A</th>
+ <th>G</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>ETL</td>
+ <td>6024</td>
+ <td>390</td>
+ <td>—</td>
+ <td>—</td>
+ <td>17726</td>
+ <td>2094</td>
+ <td>—</td>
+ <td>—</td>
+ <td>OM</td>
+ <td>9122</td>
+ <td>—</td>
+ <td>—</td>
+ </tr>
+ <tr>
+ <td>IS-3</td>
+ <td>1.00</td>
+ <td>0.30</td>
+ <td>0.16</td>
+ <td><strong>0.01</strong></td>
+ <td>6.59</td>
+ <td>2.09</td>
+ <td>0.48</td>
+ <td><strong>0.01</strong></td>
+ <td>OM</td>
+ <td>4.12</td>
+ <td>1.39</td>
+ <td><strong>0.03</strong></td>
+ </tr>
+ <tr>
+ <td>IC-8</td>
+ <td>1.35</td>
+ <td><strong>0.37</strong></td>
+ <td>72.2</td>
+ <td>3.36</td>
+ <td>8.43</td>
+ <td><strong>1.26</strong></td>
+ <td>246</td>
+ <td>6.56</td>
+ <td>OM</td>
+ <td><strong>2.98</strong></td>
+ <td>894</td>
+ <td>23.3</td>
+ </tr>
+ <tr>
+ <td>BI-2</td>
+ <td>125</td>
+ <td>45.0</td>
+ <td>67.7</td>
+ <td><strong>4.30</strong></td>
+ <td>3884</td>
+ <td>1101</td>
+ <td>232</td>
+ <td><strong>16.3</strong></td>
+ <td>OM</td>
+ <td>6636</td>
+ <td>756</td>
+ <td><strong>50.0</strong></td>
+ </tr>
+ </tbody>
+</table>
+<p><strong>Notes: <a href="https://github.com/apache/pinot"
target="_blank">Pinot (P)</a>, <a href="https://github.com/neo4j/neo4j"
target="_blank">Neo4j (N)</a>, <a
href="https://arrow.apache.org/docs/cpp/streaming_execution.html"
target="_blank">Acero (A)</a>, and GraphAr (G).
+“OM” denotes a failed execution due to out-of-memory errors.
+While both Pinot and Neo4j are widely used, they
+are not natively designed for data lakes and require an Extract-Transform-Load
(ETL) process for integration. The three representative queries include
neighbor retrieval and label filtering, referring to the <a
href="https://github.com/ldbc/ldbc_snb_bi" target="_blank">LDBC SNB Business
Intelligence</a> and <a
href="https://github.com/ldbc/ldbc_snb_interactive_v1_impls"
target="_blank">LDBC SNB Interactive v1</a> workload implementations.
</strong></p>
+
+GraphAr significantly outperforms Acero, achieving an
+average speedup of 29.5×. A closer analysis of the results reveals
+that the performance gains stem from the following factors: 1) data
+layout design and encoding/decoding optimizations we proposed,
+to enable efficient neighbor retrieval (IS-3, IC-8, BI-2) and label
+filtering (BI-2); 2) bitmap generation can be utilized in selection steps
(IS-3, IC-8, BI-2). -->
+
## Libraries
GraphAr offers a collection of libraries for the purpose of reading,
diff --git a/docs/images/benchmark_IO_time.png
b/docs/images/benchmark_IO_time.png
new file mode 100644
index 00000000..20496b33
Binary files /dev/null and b/docs/images/benchmark_IO_time.png differ
diff --git a/docs/images/benchmark_label_complex_filter.png
b/docs/images/benchmark_label_complex_filter.png
new file mode 100644
index 00000000..a5fc80d7
Binary files /dev/null and b/docs/images/benchmark_label_complex_filter.png
differ
diff --git a/docs/images/benchmark_label_simple_filter.png
b/docs/images/benchmark_label_simple_filter.png
new file mode 100644
index 00000000..7c9d4e6b
Binary files /dev/null and b/docs/images/benchmark_label_simple_filter.png
differ
diff --git a/docs/images/benchmark_label_storage.png
b/docs/images/benchmark_label_storage.png
new file mode 100644
index 00000000..3672a646
Binary files /dev/null and b/docs/images/benchmark_label_storage.png differ
diff --git a/docs/images/benchmark_neighbor_retrival.png
b/docs/images/benchmark_neighbor_retrival.png
new file mode 100644
index 00000000..5b0db318
Binary files /dev/null and b/docs/images/benchmark_neighbor_retrival.png differ
diff --git a/docs/images/benchmark_storage.png
b/docs/images/benchmark_storage.png
new file mode 100644
index 00000000..5c0eb6c0
Binary files /dev/null and b/docs/images/benchmark_storage.png differ
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]