This is an automated email from the ASF dual-hosted git repository.
planka pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7e0e2aec7 Updated site to include documentation for Read vs Seek Optimization
7e0e2aec7 is described below
commit 7e0e2aec73e5695ccf9b73243fdb23b7adfb40d2
Author: Pavan Lanka <[email protected]>
AuthorDate: Tue Jun 14 16:07:53 2022 -0700
Updated site to include documentation for Read vs Seek Optimization
---
develop/design/index.html | 1 +
develop/design/io/index.html | 537 +++++++++++++++++++++++++++++++++++++++++++
img/seekvsread.png | Bin 0 -> 32924 bytes
3 files changed, 538 insertions(+)
diff --git a/develop/design/index.html b/develop/design/index.html
index b2f5ff612..e019dfda9 100644
--- a/develop/design/index.html
+++ b/develop/design/index.html
@@ -86,6 +86,7 @@
<h1>Design</h1>
<ul>
<li><a href="lazy_filter">Lazy Filters</a></li>
+ <li><a href="io">IO</a></li>
</ul>
</article>
diff --git a/develop/design/io/index.html b/develop/design/io/index.html
new file mode 100644
index 000000000..01b3eba03
--- /dev/null
+++ b/develop/design/io/index.html
@@ -0,0 +1,537 @@
+<!DOCTYPE HTML>
+<html lang="en-US">
+<head>
+ <meta charset="UTF-8">
+ <title>IO</title>
+ <meta name="viewport" content="width=device-width,initial-scale=1">
+ <meta name="generator" content="Jekyll v3.8.6">
+ <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+ <link rel="stylesheet" href="/css/screen.css">
+ <link rel="icon" type="image/x-icon" href="/favicon.ico">
+ <!--[if lt IE 9]>
+ <script src="/js/html5shiv.min.js"></script>
+ <script src="/js/respond.min.js"></script>
+ <![endif]-->
+</head>
+
+
+<body class="wrap">
+ <header role="banner">
+ <nav class="mobile-nav show-on-mobiles">
+ <ul>
+ <li class="">
+ <a href="/">Home</a>
+ </li>
+ <li class="">
+ <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+ <span class="hide-on-mobiles">Documentation</span></a>
+ </li>
+ <li class="">
+ <a href="/talks/">Talks</a>
+ </li>
+ <li class="">
+ <a href="/news/">News</a>
+ </li>
+ <li class="">
+ <a href="/help/">Help</a>
+ </li>
+ <li class="current">
+ <a href="/develop/">Develop</a>
+ </li>
+</ul>
+
+ </nav>
+ <div class="grid">
+ <div class="unit one-third center-on-mobiles">
+ <h1>
+ <a href="/">
+ <span class="sr-only">Apache ORC</span>
+ <img src="/img/logo.png" width="249" height="101" alt="ORC Logo">
+ </a>
+ </h1>
+ </div>
+ <nav class="main-nav unit two-thirds hide-on-mobiles">
+ <ul>
+ <li class="">
+ <a href="/">Home</a>
+ </li>
+ <li class="">
+ <a href="/docs/"><span class="show-on-mobiles">Docs</span>
+ <span class="hide-on-mobiles">Documentation</span></a>
+ </li>
+ <li class="">
+ <a href="/talks/">Talks</a>
+ </li>
+ <li class="">
+ <a href="/news/">News</a>
+ </li>
+ <li class="">
+ <a href="/help/">Help</a>
+ </li>
+ <li class="current">
+ <a href="/develop/">Develop</a>
+ </li>
+</ul>
+
+ </nav>
+ </div>
+</header>
+
+
+ <section class="standalone">
+ <div class="grid">
+
+ <div class="unit whole">
+ <article>
+ <h1>IO</h1>
+ <ul>
+ <li><a href="#Background">Background</a>
+ <ul>
+ <li><a href="#SeekvsRead">Seek vs Read</a></li>
+ <li><a href="#ORCRead">ORC Read</a></li>
+ </ul>
+ </li>
+ <li><a href="#ReadOptimization">Read Optimization</a>
+ <ul>
+ <li><a href="#Approach">Approach</a></li>
+ <li><a href="#Scope">Scope</a></li>
+ <li><a href="#Benchmarks">Benchmarks</a>
+ <ul>
+ <li><a href="#LocalFS">Local FS</a></li>
+ <li><a href="#AWSS3">AWS S3</a></li>
+ </ul>
+ </li>
+ <li><a href="#Summary">Summary</a></li>
+ </ul>
+ </li>
+</ul>
+
+<h2 id="background-">Background <a id="Background"></a></h2>
+
+<p>We are moving our workloads from HDFS to AWS S3. As part of this effort, we wanted to understand the performance characteristics and costs of using S3.</p>
+
+<h3 id="seek-vs-read-">Seek vs Read <a id="SeekvsRead"></a></h3>
+
+<p>One scenario that stood out in our S3 performance testing was seek vs read.</p>
+
+<p>In this test we read through a file as follows:</p>
+
+<ul>
+ <li>Seek to point A in the file and read X bytes</li>
+ <li>Move to point B = A + X + Y in the file
+ <ul>
+ <li>This is done either with another seek or by reading through the Y bytes in between</li>
+ <li>Y is varied to determine which approach is better for a given gap size</li>
+ </ul>
+ </li>
+ <li>Read X bytes</li>
+</ul>
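+
+<p>The two ways of getting from point A + X to point B can be sketched in Java as shown here. This is a minimal illustration and not ORC or S3A code. The <code class="highlighter-rouge">SeekableSource</code> interface, the in-memory <code class="highlighter-rouge">CountingSource</code> and the <code class="highlighter-rouge">SeekVsRead</code> helpers are hypothetical. Each <code class="highlighter-rouge">seek()</code> is assumed to stand for a new ranged GET on S3.</p>
+

```java
/** Hypothetical abstraction: each seek() models starting a new ranged GET on S3. */
interface SeekableSource {
    void seek(long position);

    /** Returns the number of bytes read, or -1 at end of data. */
    int read(byte[] buffer, int offset, int length);
}

/** In-memory stand-in for a remote object that counts how often we seek. */
final class CountingSource implements SeekableSource {
    private final byte[] data;
    private int position;
    private int seeks;

    CountingSource(byte[] data) {
        this.data = data;
    }

    @Override
    public void seek(long newPosition) {
        position = (int) newPosition;
        seeks++;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) {
        int n = Math.min(length, data.length - position);
        if (n <= 0) {
            return -1;
        }
        System.arraycopy(data, position, buffer, offset, n);
        position += n;
        return n;
    }

    int seekCount() {
        return seeks;
    }
}

final class SeekVsRead {
    /** Reads exactly buffer.length bytes or fails. */
    static void readFully(SeekableSource src, byte[] buffer) {
        int done = 0;
        while (done < buffer.length) {
            int n = src.read(buffer, done, buffer.length - done);
            if (n < 0) {
                throw new IllegalStateException("unexpected end of data");
            }
            done += n;
        }
    }

    /** Strategy 1: seek over the Y-byte gap (a second seek, i.e. another GET on S3). */
    static byte[][] withSeek(SeekableSource src, long a, int x, int y) {
        byte[] first = new byte[x];
        byte[] second = new byte[x];
        src.seek(a);
        readFully(src, first);
        src.seek(a + x + y);
        readFully(src, second);
        return new byte[][] {first, second};
    }

    /** Strategy 2: read through the gap and discard it, so only one seek is needed. */
    static byte[][] withRead(SeekableSource src, long a, int x, int y) {
        byte[] first = new byte[x];
        byte[] gap = new byte[y]; // bytes fetched only to be thrown away
        byte[] second = new byte[x];
        src.seek(a);
        readFully(src, first);
        readFully(src, gap);
        readFully(src, second);
        return new byte[][] {first, second};
    }
}
```

+<p>Both strategies return the same bytes. Reading through the gap trades Y extra transferred bytes for one fewer seek; the measurements that follow show where that trade-off pays off on S3.</p>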
+
+<p><img src="/img/seekvsread.png" alt="Seek vs Read" /></p>
+
+<p>Observations:</p>
+
+<ul>
+ <li>Reads clearly outperform seeks for gaps of up to 4 MB:
+ <ul>
+ <li>At 4 MB read is faster by ~ 11%</li>
+ <li>At 1 MB read is faster by ~ 20%</li>
+ </ul>
+ </li>
+ <li>Reads are also cheaper, as a single GET replaces multiple GETs. From <a href="https://aws.amazon.com/s3/pricing/">AWS S3 Pricing</a>:
+ <ul>
+ <li>Cost for GET requests: $0.0004 per 1,000 requests</li>
+ <li>Cost for data transfer to AWS EKS in the same region: $0.00 per GB</li>
+ </ul>
+ </li>
+</ul>
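+
+<p>The cost argument can be made concrete with a back-of-the-envelope Java sketch. It assumes the S3 Standard GET price of $0.0004 per 1,000 requests (prices vary by region) and treats same-region transfer as free, as listed above. The <code class="highlighter-rouge">GetCost</code> class is hypothetical. The read-through case is modelled as a single GET.</p>
+

```java
/** Hypothetical helper for estimating S3 GET request costs. */
final class GetCost {
    /** Assumed S3 Standard GET price in USD per 1,000 requests (region-dependent). */
    static final double USD_PER_1000_GETS = 0.0004;

    /** Request cost in USD for the given number of GETs; transfer is assumed free. */
    static double requestCostUsd(long getRequests) {
        return getRequests / 1000.0 * USD_PER_1000_GETS;
    }

    /**
     * Savings for `reads` repetitions of the seek-vs-read test pattern, which
     * fetches two X-byte ranges each time: two GETs when seeking over the gap,
     * modelled as one GET when reading through it.
     */
    static double savingsUsd(long reads) {
        long withSeek = 2 * reads; // one GET per range
        long withRead = reads;     // one GET covering both ranges and the gap
        return requestCostUsd(withSeek) - requestCostUsd(withRead);
    }
}
```

+<p>For example, one million repetitions of the pattern save about $0.40 in request charges under these assumptions.</p>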
+
+<h3 id="orc-read-">ORC Read <a id="ORCRead"></a></h3>
+
+<p>Given this penalty for multiple seeks over small gaps, we measured the performance of an ORC read on a single file.</p>
+
+<p>File details:</p>
+
+<ul>
+ <li>Size ~ 21 MB</li>
+ <li>Column Count: ~ 400</li>
+ <li>Row Count: ~ 65K</li>
+</ul>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: left">Read Type</th>
+ <th style="text-align: right">Duration</th>
+ <th style="text-align: left">Unit</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: left">All Columns</td>
+ <td style="text-align: right">1.075</td>
+ <td style="text-align: left">s</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">Alternate Columns</td>
+ <td style="text-align: right">6.489</td>
+ <td style="text-align: left">s</td>
+ </tr>
+ </tbody>
+</table>
+
+<p>Observations:</p>
+
+<ul>
+ <li>Reading alternate columns carries a significant penalty: it is about 6x slower than reading all columns (6.489 s vs 1.075 s). In the current ORC implementation, reading alternate columns translates into multiple GET calls on AWS S3.</li>
+ <li>While the penalty will be less significant for large reads, it still adds overhead in both time and cost.</li>
+</ul>
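+
+<p>A simplified model helps show why alternate columns multiply the number of GETs. This model is assumed here for illustration only. Every column of a stripe is treated as a single contiguous block, stored back to back in column order. Real ORC stripes hold several streams per column. The hypothetical <code class="highlighter-rouge">ColumnRanges</code> sketch computes the byte ranges that must be fetched, merging only ranges that touch.</p>
+

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: one contiguous block per column, stored back to back. */
final class ColumnRanges {
    /** Returns {offset, length} pairs for the bytes to fetch; touching ranges are merged. */
    static List<long[]> rangesFor(long[] columnSizes, boolean[] selected) {
        List<long[]> ranges = new ArrayList<>();
        long offset = 0;
        for (int c = 0; c < columnSizes.length; c++) {
            if (selected[c]) {
                long[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);
                if (last != null && last[0] + last[1] == offset) {
                    last[1] += columnSizes[c]; // contiguous with previous column: same read
                } else {
                    ranges.add(new long[] {offset, columnSizes[c]}); // gap: a separate read (GET)
                }
            }
            offset += columnSizes[c];
        }
        return ranges;
    }

    /** Selects columns 0, 2, 4, ... */
    static boolean[] alternate(int columnCount) {
        boolean[] selected = new boolean[columnCount];
        for (int c = 0; c < columnCount; c += 2) {
            selected[c] = true;
        }
        return selected;
    }
}
```

+<p>With 128 equal-sized columns, selecting all columns yields one range, while selecting alternate columns yields 64 separate ranges. That means 64 GETs if each range is fetched on its own.</p>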
+
+<h2 id="read-optimization-">Read Optimization <a id="ReadOptimization"></a></h2>
+
+<h3 id="approach-">Approach <a id="Approach"></a></h3>
+
+<p>The following optimizations are planned:</p>
+
+<ul>
+ <li><strong>orc.min.disk.seek.size</strong> is a value in bytes: when planning reads, if the gap between two consecutive reads is smaller than this value, they are combined into a single read that also fetches the bytes in the gap.</li>
+ <li><strong>orc.min.disk.seek.size.tolerance</strong> is a fractional value: if the extra (gap) bytes read are more than this fraction of the required bytes, the extra bytes are dropped from memory.</li>
+ <li>We can further consider an optimization that reads the complete stripe when the stripe size is smaller than
+<code class="highlighter-rouge">orc.min.disk.seek.size</code>.</li>
+</ul>
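+
+<p>The two settings can be sketched as a small read planner in Java. This illustrates the idea under stated assumptions and is not ORC's implementation. <code class="highlighter-rouge">Range</code>, <code class="highlighter-rouge">Chunk</code> and <code class="highlighter-rouge">ReadPlanner.plan</code> are hypothetical names. Input ranges are assumed sorted and non-overlapping, and touching ranges are always merged.</p>
+

```java
import java.util.ArrayList;
import java.util.List;

/** A byte range [offset, offset + length) that the reader needs (hypothetical type). */
final class Range {
    final long offset;
    final long length;

    Range(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }

    long end() {
        return offset + length;
    }
}

/** One physical read covering one or more required ranges plus any gap bytes between them. */
final class Chunk {
    final long offset;
    final long length;
    final long requiredBytes;
    final boolean dropExtra; // true: gap bytes should be released from memory

    Chunk(long offset, long length, long requiredBytes, boolean dropExtra) {
        this.offset = offset;
        this.length = length;
        this.requiredBytes = requiredBytes;
        this.dropExtra = dropExtra;
    }

    long extraBytes() {
        return length - requiredBytes;
    }
}

final class ReadPlanner {
    /**
     * Groups sorted, non-overlapping ranges into chunks.
     *
     * @param minSeekSize gaps smaller than this (bytes) are read through instead of seeking
     * @param tolerance   if extra bytes exceed tolerance * required bytes, mark them to be dropped
     */
    static List<Chunk> plan(List<Range> ranges, long minSeekSize, double tolerance) {
        List<Chunk> chunks = new ArrayList<>();
        int i = 0;
        while (i < ranges.size()) {
            Range first = ranges.get(i++);
            long start = first.offset;
            long end = first.end();
            long required = first.length;
            // Extend the chunk while the next gap is small enough to read through.
            while (i < ranges.size()) {
                Range next = ranges.get(i);
                long gap = next.offset - end;
                if (gap > 0 && gap >= minSeekSize) {
                    break; // large gap: start a new read (seek) instead
                }
                end = next.end();
                required += next.length;
                i++;
            }
            long extra = (end - start) - required;
            chunks.add(new Chunk(start, end - start, required, extra > tolerance * required));
        }
        return chunks;
    }
}
```

+<p>With a 4 MB minimum seek size and a tolerance of 10.0, a chunk may keep up to ten times its required bytes in memory as gap data. With a tolerance of 0.0, any gap bytes are dropped.</p>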
+
+<h3 id="scope-">Scope <a id="Scope"></a></h3>
+
+<p>Different types of IO take place in ORC today:</p>
+
+<ul>
+ <li>Reading of File Footer: Unchanged</li>
+ <li>Reading of Stripe Footer: Unchanged</li>
+ <li>Reading of Stripe Index information: Optimized</li>
+ <li>Reading of Stripe Data: Optimized</li>
+</ul>
+
+<p>Each of the above happens at a different stage of the read. The current implementation optimizes reads that go through the <a href="https://github.com/apache/orc/tree/main/java/core/src/java/org/apache/orc/DataReader.java">DataReader</a> interface.</p>
+
+<p>This does not:</p>
+
+<ul>
+ <li>Optimize the read of the file/stripe footer</li>
+ <li>Optimize reads across multiple stripes</li>
+</ul>
+
+<h3 id="benchmarks-">Benchmarks <a id="Benchmarks"></a></h3>
+
+<h4 id="local-fs-">Local FS <a id="LocalFS"></a></h4>
+
+<p>This benchmark runs on the local filesystem with an NVMe SSD, so its performance characteristics are very different from those of AWS S3.</p>
+
+<p>The purpose of this benchmark is to determine whether adding <code class="highlighter-rouge">minSeekSize</code> and <code class="highlighter-rouge">extraByteTolerance</code> introduced any significant penalty in the ORC code.</p>
+
+<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-jar</span> java/bench/core/target/orc-benchmarks-core-<span class="k">*</span><span class="nt">-uber</span>.jar chunk_read
+</code></pre></div></div>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: left">(alt)</th>
+ <th style="text-align: right">(cols)</th>
+ <th style="text-align: right">(byteTol)</th>
+ <th style="text-align: right">(minSeek)</th>
+ <th style="text-align: left">Mode</th>
+ <th style="text-align: right">Cnt</th>
+ <th style="text-align: right">Score</th>
+ <th style="text-align: left">Sign</th>
+ <th style="text-align: right">Error</th>
+ <th style="text-align: left">Units</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: left">true</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">0</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.352</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.006</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">true</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.357</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.002</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">true</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">10.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.349</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.002</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">false</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">0</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.667</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.007</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">false</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.673</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.004</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">false</td>
+ <td style="text-align: right">128</td>
+ <td style="text-align: right">10.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">20</td>
+ <td style="text-align: right">0.671</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.005</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ </tbody>
+</table>
+
+<p>Observations/Details:</p>
+
+<ul>
+ <li><strong>Input File details</strong>:
+ <ul>
+ <li>Rows: 65536</li>
+ <li>Columns: 128</li>
+ <li>FileSize: ~ 72 MB</li>
+ </ul>
+ </li>
+ <li>Full Read (alternate = false)
+ <ul>
+ <li>As expected, no significant difference between the options</li>
+ </ul>
+ </li>
+ <li>Alternate Read (alternate = true)
+ <ul>
+ <li>No significant difference between the options, given the small file size and the performance of the local disk</li>
+ <li>This also indicates that the recommended minSeekSize should be determined for each platform, e.g. HDFS, S3, etc.</li>
+ </ul>
+ </li>
+</ul>
+
+<h4 id="aws-s3-">AWS S3 <a id="AWSS3"></a></h4>
+
+<p>In this benchmark we ran an EKS container in the same region as the AWS S3 bucket to test the performance of the patch.</p>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: left">(alternate)</th>
+ <th style="text-align: right">(byteTol)</th>
+ <th style="text-align: right">(minSeekSize)</th>
+ <th style="text-align: left">Mode</th>
+ <th style="text-align: right">Cnt</th>
+ <th style="text-align: right">Score</th>
+ <th style="text-align: left">Sign</th>
+ <th style="text-align: right">Error</th>
+ <th style="text-align: left">Units</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: left">FALSE</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">0</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">1.837</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.089</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">FALSE</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">1.919</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.11</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">FALSE</td>
+ <td style="text-align: right">10.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">1.895</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.191</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">TRUE</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">0</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">5.8</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">1.132</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">TRUE</td>
+ <td style="text-align: right">0.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">1.479</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.197</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ <tr>
+ <td style="text-align: left">TRUE</td>
+ <td style="text-align: right">10.0</td>
+ <td style="text-align: right">4194304</td>
+ <td style="text-align: left">avgt</td>
+ <td style="text-align: right">5</td>
+ <td style="text-align: right">1.435</td>
+ <td style="text-align: left">±</td>
+ <td style="text-align: right">0.176</td>
+ <td style="text-align: left">s/op</td>
+ </tr>
+ </tbody>
+</table>
+
+<p>Observations/Details:</p>
+
+<ul>
+ <li><strong>Input File details</strong>:
+ <ul>
+ <li>Rows: 65536</li>
+ <li>Columns: 128</li>
+ <li>FileSize: ~ 72 MB</li>
+ </ul>
+ </li>
+ <li>Full Read (alternate = false)
+ <ul>
+ <li>As expected, no significant difference between the options</li>
+ </ul>
+ </li>
+ <li>Alternate Read (alternate = true)
+ <ul>
+ <li>Performance improves significantly, from 5.8 s without the optimization to ~1.5 s with it, a time reduction of ~75%</li>
+ <li>This also reduces cost, as the 64 GETs per stripe (one for each selected column) are replaced with a single GET per stripe</li>
+ <li>Retaining the extra bytes (extraByteTolerance=10.0) gives a marginal improvement of ~3% over extraByteTolerance=0.0, which does the additional work of dropping the extra bytes from memory.</li>
+ </ul>
+ </li>
+</ul>
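+
+<p>The percentages above can be recomputed from the Score column of the AWS S3 table (alternate = true rows). A small Java check follows. The <code class="highlighter-rouge">S3BenchmarkMath</code> class is hypothetical.</p>
+

```java
/** Recomputes the relative changes quoted above from the AWS S3 table (scores in s/op). */
final class S3BenchmarkMath {
    // Alternate-column read scores taken from the table (alternate = true).
    static final double NO_OPTIMIZATION = 5.8;    // minSeekSize = 0
    static final double OPTIMIZED_TOL_0 = 1.479;  // minSeekSize = 4194304, byteTol = 0.0
    static final double OPTIMIZED_TOL_10 = 1.435; // minSeekSize = 4194304, byteTol = 10.0

    /** Fractional reduction in time when going from {@code before} to {@code after}. */
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        System.out.printf("byteTol=0.0 vs no optimization: %.1f%% less time%n",
                100 * reduction(NO_OPTIMIZATION, OPTIMIZED_TOL_0));
        System.out.printf("byteTol=10.0 vs no optimization: %.1f%% less time%n",
                100 * reduction(NO_OPTIMIZATION, OPTIMIZED_TOL_10));
        System.out.printf("retaining vs dropping extra bytes: %.1f%% less time%n",
                100 * reduction(OPTIMIZED_TOL_0, OPTIMIZED_TOL_10));
    }
}
```

+<p>This prints roughly 74.5%, 75.3% and 3.0%, matching the ~75% and ~3% figures.</p>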
+
+<h3 id="summary-">Summary <a id="Summary"></a></h3>
+
+<p>Based on the benchmarks, the following settings are recommended for ORC on AWS S3:</p>
+
+<ul>
+ <li>Set <code class="highlighter-rouge">orc.min.disk.seek.size</code> to <code class="highlighter-rouge">4194304</code> (4 MB)</li>
+ <li>Set <code class="highlighter-rouge">orc.min.disk.seek.size.tolerance</code> to a value that fits your memory usage constraints. When it is set to <code class="highlighter-rouge">0.0</code>, ORC always does the extra work of dropping the extra bytes.</li>
+</ul>
+
+
+ </article>
+ </div>
+
+ <div class="clear"></div>
+
+ </div>
+</section>
+
+
+ <footer role="contentinfo">
+ <p>The contents of this website are © 2022
+ <a href="https://www.apache.org/">Apache Software Foundation</a>
+ under the terms of the <a
+ href="https://www.apache.org/licenses/LICENSE-2.0.html">
+ Apache License v2</a>. Apache ORC and its logo are trademarks
+ of the Apache Software Foundation.</p>
+</footer>
+
+ <script>
+ var anchorForId = function (id) {
+ var anchor = document.createElement("a");
+ anchor.className = "header-link";
+ anchor.href = "#" + id;
+ anchor.innerHTML = "<span class=\"sr-only\">Permalink</span><i class=\"fa fa-link\"></i>";
+ anchor.title = "Permalink";
+ return anchor;
+ };
+
+ var linkifyAnchors = function (level, containingElement) {
+ var headers = containingElement.getElementsByTagName("h" + level);
+ for (var h = 0; h < headers.length; h++) {
+ var header = headers[h];
+
+ if (typeof header.id !== "undefined" && header.id !== "") {
+ header.appendChild(anchorForId(header.id));
+ }
+ }
+ };
+
+ document.onreadystatechange = function () {
+ if (this.readyState === "complete") {
+ var contentBlock = document.getElementsByClassName("docs")[0] || document.getElementsByClassName("news")[0];
+ if (!contentBlock) {
+ return;
+ }
+ for (var level = 1; level <= 6; level++) {
+ linkifyAnchors(level, contentBlock);
+ }
+ }
+ };
+</script>
+
+
+</body>
+</html>
diff --git a/img/seekvsread.png b/img/seekvsread.png
new file mode 100644
index 000000000..6491a5492
Binary files /dev/null and b/img/seekvsread.png differ