[44/51] [partial] arrow-site git commit: Update API documentation for version 0.9.0

cpcloud Thu, 22 Mar 2018 16:59:56 -0700

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/06/16/turbodbc-arrow/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/06/16/turbodbc-arrow/index.html 
b/build/blog/2017/06/16/turbodbc-arrow/index.html
new file mode 100644
index 0000000..8bf1f4e
--- /dev/null
+++ b/build/blog/2017/06/16/turbodbc-arrow/index.html
@@ -0,0 +1,232 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+            <li><a href="/docs/js">Javascript API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Connecting Relational Databases to the Apache Arrow World with turbodbc
+      <a href="/blog/2017/06/16/turbodbc-arrow/" class="permalink" 
title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            16 Jun 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://github.com/MathMagique";><i class="fa fa-user"></i> 
Michael KÃ¶nig (MathMagique)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p><em><a href="https://github.com/mathmagique";>Michael KÃ¶nig</a> is the lead 
developer of the <a href="https://github.com/blue-yonder/turbodbc";>turbodbc 
project</a></em></p>
+
+<p>The <a href="https://arrow.apache.org/";>Apache Arrow</a> project set out to 
become the universal data layer for
+column-oriented data processing systems without incurring serialization costs
+or compromising on performance on a more general level. While relational
+databases still lag behind in Apache Arrow adoption, the Python database module
+<a href="https://github.com/blue-yonder/turbodbc";>turbodbc</a> brings Apache 
Arrow support to these databases using a much
+older, more specialized data exchange layer: <a 
href="https://en.wikipedia.org/wiki/Open_Database_Connectivity";>ODBC</a>.</p>
+
+<p>ODBC is a database interface that offers developers the option to transfer 
data
+either in row-wise or column-wise fashion. Previous Python ODBC modules 
typically
+use the row-wise approach, and often trade repeated database roundtrips for 
simplified
+buffer handling. This makes them less suited for data-intensive applications,
+particularly when interfacing with modern columnar analytical databases.</p>
+
+<p>In contrast, turbodbc was designed to leverage columnar data processing 
from day
+one. Naturally, this implies using the columnar portion of the ODBC API. 
Equally
+important, however, is to find new ways of providing columnar data to Python 
users
+that exceed the capabilities of the row-wise API mandated by Pythonâs <a 
href="https://www.python.org/dev/peps/pep-0249/";>PEP 249</a>.
+Turbodbc has adopted Apache Arrow for this very task with the recently released
+version 2.0.0:</p>
+
+<div class="language-python highlighter-rouge"><pre 
class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span 
class="kn">from</span> <span class="nn">turbodbc</span> <span 
class="kn">import</span> <span class="n">connect</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">connection</span> <span 
class="o">=</span> <span class="n">connect</span><span class="p">(</span><span 
class="n">dsn</span><span class="o">=</span><span class="s">"My columnar 
database"</span><span class="p">)</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span> <span 
class="o">=</span> <span class="n">connection</span><span 
class="o">.</span><span class="n">cursor</span><span class="p">()</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span><span 
class="o">.</span><span class="n">execute</span><span class="p">(</span><span 
class="s">"SELECT some_integers, some_strings FROM my_table"</span><span 
class="p">)</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">cursor</span><span 
class="o">.</span><span class="n">fetchallarrow</span><span class="p">()</span>
+<span class="n">pyarrow</span><span class="o">.</span><span 
class="n">Table</span>
+<span class="n">some_integers</span><span class="p">:</span> <span 
class="n">int64</span>
+<span class="n">some_strings</span><span class="p">:</span> <span 
class="n">string</span>
+</code></pre>
+</div>
+
+<p>With this new addition, the data flow for a result set of a typical SELECT 
query
+is like this:</p>
+<ul>
+  <li>The database prepares the result set and exposes it to the ODBC driver 
using
+either row-wise or column-wise storage.</li>
+  <li>Turbodbc has the ODBC driver write chunks of the result set into 
columnar buffers.</li>
+  <li>These buffers are exposed to turbodbcâs Apache Arrow frontend. This 
frontend
+will create an Arrow table and fill in the buffered values.</li>
+  <li>The previous steps are repeated until the entire result set is 
retrieved.</li>
+</ul>
+
+<p><img src="/img/turbodbc_arrow.png" alt="Data flow from relational databases 
to Python with turbodbc and the Apache Arrow frontend" class="img-responsive" 
width="75%" /></p>
+
+<p>In practice, it is possible to achieve the following ideal situation: A 
64-bit integer
+column is stored as one contiguous block of memory in a columnar database. A 
huge chunk
+of 64-bit integers is transferred over the network and the ODBC driver 
directly writes
+it to a turbodbc buffer of 64-bit integers. The Arrow frontend accumulates 
these values
+by copying the entire 64-bit buffer into a free portion of an Arrow tableâs 
64-bit
+integer column.</p>
+
+<p>Moving data from the database to an Arrow table and, thus, providing it to 
the Python
+user can be as simple as copying memory blocks around, megabytes equivalent to 
hundred
+thousands of rows at a time. The absence of serialization and conversion logic 
renders
+the process extremely efficient.</p>
+
+<p>Once the data is stored in an Arrow table, Python users can continue to do 
some
+actual work. They can convert it into a <a 
href="https://arrow.apache.org/docs/python/pandas.html";>Pandas DataFrame</a> 
for data analysis
+(using a quick <code class="highlighter-rouge">table.to_pandas()</code>), pass 
it on to other data processing
+systems such as <a href="http://spark.apache.org/";>Apache Spark</a> or <a 
href="http://impala.apache.org/";>Apache Impala (incubating)</a>, or store
+it in the <a href="http://parquet.apache.org/";>Apache Parquet</a> file format. 
This way, non-Python systems are
+efficiently connected with relational databases.</p>
+
+<p>In the future, turbodbcâs Arrow support will be extended to use more
+sophisticated features such as <a 
href="https://arrow.apache.org/docs/memory_layout.html#dictionary-encoding";>dictionary-encoded</a>
 string fields. We also
+plan to pick smaller than 64-bit <a 
href="https://arrow.apache.org/docs/metadata.html#integers";>data types</a> 
where possible. Last but not
+least, Arrow support will be extended to cover the reverse direction of data
+flow, so that Python users can quickly insert Arrow tables into relational
+databases.</p>
+
+<p>If you would like to learn more about turbodbc, check out the <a 
href="https://github.com/blue-yonder/turbodbc";>GitHub project</a> and the
+<a href="http://turbodbc.readthedocs.io/";>project documentation</a>. If you 
want to learn more about how turbodbc implements the
+nitty-gritty details, check out parts <a 
href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/";>one</a>
 and <a 
href="https://tech.blue-yonder.com/making-of-turbodbc-part-2-c-to-python/";>two</a>
 of the
+<a 
href="https://tech.blue-yonder.com/making-of-turbodbc-part-1-wrestling-with-the-side-effects-of-a-c-api/";>âMaking
 of turbodbcâ</a> series at <a href="https://tech.blue-yonder.com/";>Blue 
Yonderâs technology blog</a>.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>


http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/07/24/0.5.0-release/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/07/24/0.5.0-release/index.html 
b/build/blog/2017/07/24/0.5.0-release/index.html
new file mode 100644
index 0000000..8e99201
--- /dev/null
+++ b/build/blog/2017/07/24/0.5.0-release/index.html
@@ -0,0 +1,235 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Apache Arrow 0.5.0 Release
+      <a href="/blog/2017/07/24/0.5.0-release/" class="permalink" 
title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            24 Jul 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://wesmckinney.com";><i class="fa fa-user"></i> Wes 
McKinney (wesm)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 0.5.0 release. It includes
+<a 
href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0"><strong>130
 resolved JIRAs</strong></a> with some new features, expanded integration
+testing between implementations, and bug fixes. The Arrow memory format remains
+stable since the 0.3.x and 0.4.x releases.</p>
+
+<p>See the <a href="http://arrow.apache.org/install";>Install Page</a> to learn 
how to get the libraries for your
+platform. The <a href="http://arrow.apache.org/release/0.5.0.html";>complete 
changelog</a> is also available.</p>
+
+<h2 id="expanded-integration-testing">Expanded Integration Testing</h2>
+
+<p>In this release, we added compatibility tests for dictionary-encoded data
+between Java and C++. This enables the distinct values (the 
<em>dictionary</em>) in a
+vector to be transmitted as part of an Arrow schema while the record batches
+contain integers which correspond to the dictionary.</p>
+
+<p>So we might have:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>data (string): 
['foo', 'bar', 'foo', 'bar']
+</code></pre>
+</div>
+
+<p>In dictionary-encoded form, this could be represented as:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>indices (int8): 
[0, 1, 0, 1]
+dictionary (string): ['foo', 'bar']
+</code></pre>
+</div>
+
+<p>In upcoming releases, we plan to complete integration testing for the 
remaining
+data types (including some more complicated types like unions and decimals) on
+the road to a 1.0.0 release in the future.</p>
+
+<h2 id="c-activity">C++ Activity</h2>
+
+<p>We completed a number of significant pieces of work in the C++ part of 
Apache
+Arrow.</p>
+
+<h3 id="using-jemalloc-as-default-memory-allocator">Using jemalloc as default 
memory allocator</h3>
+
+<p>We decided to use <a 
href="https://github.com/jemalloc/jemalloc";>jemalloc</a> as the default memory 
allocator unless it is
+explicitly disabled. This memory allocator has significant performance
+advantages in Arrow workloads over the default <code 
class="highlighter-rouge">malloc</code> implementation. We will
+publish a blog post going into more detail about this and why you might 
care.</p>
+
+<h3 id="sharing-more-c-code-with-apache-parquet">Sharing more C++ code with 
Apache Parquet</h3>
+
+<p>We imported the compression library interfaces and dictionary encoding
+algorithms from the <a href="http://github.com/apache/parquet-cpp";>Apache 
Parquet C++ library</a>. The Parquet library now
+depends on this code in Arrow, and we will be able to use it more easily for
+data compression in Arrow use cases.</p>
+
+<p>As part of incorporating Parquetâs dictionary encoding utilities, we have
+developed an <code class="highlighter-rouge">arrow::DictionaryBuilder</code> 
class to enable building
+dictionary-encoded arrays iteratively. This can help save memory and yield
+better performance when interacting with databases, Parquet files, or other
+sources which may have columns having many duplicates.</p>
+
+<h3 id="support-for-lz4-and-zstd-compressors">Support for LZ4 and ZSTD 
compressors</h3>
+
+<p>We added LZ4 and ZSTD compression library support. In ARROW-300 and other
+planned work, we intend to add some compression features for data sent via 
RPC.</p>
+
+<h2 id="python-activity">Python Activity</h2>
+
+<p>We fixed many bugs which were affecting Parquet and Feather users and fixed
+several other rough edges with normal Arrow use. We also added some additional
+Arrow type conversions: structs, lists embedded in pandas objects, and Arrow
+time types (which deserialize to the <code 
class="highlighter-rouge">datetime.time</code> type).</p>
+
+<p>In upcoming releases we plan to continue to improve <a 
href="http://github.com/dask/dask";>Dask</a> support and
+performance for distributed processing of Apache Parquet files with 
pyarrow.</p>
+
+<h2 id="the-road-ahead">The Road Ahead</h2>
+
+<p>We have much work ahead of us to build out Arrow integrations in other data
+systems to improve their processing performance and interoperability with other
+systems.</p>
+
+<p>We are discussing the roadmap to a future 1.0.0 release on the <a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>developer
+mailing list</a>. Please join the discussion there.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/07/25/0.5.0-release/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/07/25/0.5.0-release/index.html 
b/build/blog/2017/07/25/0.5.0-release/index.html
new file mode 100644
index 0000000..4b7dd39
--- /dev/null
+++ b/build/blog/2017/07/25/0.5.0-release/index.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+            <li><a href="/docs/js">Javascript API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Apache Arrow 0.5.0 Release
+      <a href="/blog/2017/07/25/0.5.0-release/" class="permalink" 
title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            25 Jul 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://wesmckinney.com";><i class="fa fa-user"></i> Wes 
McKinney (wesm)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 0.5.0 release. It includes
+<a 
href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0"><strong>130
 resolved JIRAs</strong></a> with some new features, expanded integration
+testing between implementations, and bug fixes. The Arrow memory format remains
+stable since the 0.3.x and 0.4.x releases.</p>
+
+<p>See the <a href="http://arrow.apache.org/install";>Install Page</a> to learn 
how to get the libraries for your
+platform. The <a href="http://arrow.apache.org/release/0.5.0.html";>complete 
changelog</a> is also available.</p>
+
+<h2 id="expanded-integration-testing">Expanded Integration Testing</h2>
+
+<p>In this release, we added compatibility tests for dictionary-encoded data
+between Java and C++. This enables the distinct values (the 
<em>dictionary</em>) in a
+vector to be transmitted as part of an Arrow schema while the record batches
+contain integers which correspond to the dictionary.</p>
+
+<p>So we might have:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>data (string): 
['foo', 'bar', 'foo', 'bar']
+</code></pre>
+</div>
+
+<p>In dictionary-encoded form, this could be represented as:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>indices (int8): 
[0, 1, 0, 1]
+dictionary (string): ['foo', 'bar']
+</code></pre>
+</div>
+
+<p>In upcoming releases, we plan to complete integration testing for the 
remaining
+data types (including some more complicated types like unions and decimals) on
+the road to a 1.0.0 release in the future.</p>
+
+<h2 id="c-activity">C++ Activity</h2>
+
+<p>We completed a number of significant pieces of work in the C++ part of 
Apache
+Arrow.</p>
+
+<h3 id="using-jemalloc-as-default-memory-allocator">Using jemalloc as default 
memory allocator</h3>
+
+<p>We decided to use <a 
href="https://github.com/jemalloc/jemalloc";>jemalloc</a> as the default memory 
allocator unless it is
+explicitly disabled. This memory allocator has significant performance
+advantages in Arrow workloads over the default <code 
class="highlighter-rouge">malloc</code> implementation. We will
+publish a blog post going into more detail about this and why you might 
care.</p>
+
+<h3 id="sharing-more-c-code-with-apache-parquet">Sharing more C++ code with 
Apache Parquet</h3>
+
+<p>We imported the compression library interfaces and dictionary encoding
+algorithms from the <a href="http://github.com/apache/parquet-cpp";>Apache 
Parquet C++ library</a>. The Parquet library now
+depends on this code in Arrow, and we will be able to use it more easily for
+data compression in Arrow use cases.</p>
+
+<p>As part of incorporating Parquetâs dictionary encoding utilities, we have
+developed an <code class="highlighter-rouge">arrow::DictionaryBuilder</code> 
class to enable building
+dictionary-encoded arrays iteratively. This can help save memory and yield
+better performance when interacting with databases, Parquet files, or other
+sources which may have columns having many duplicates.</p>
+
+<h3 id="support-for-lz4-and-zstd-compressors">Support for LZ4 and ZSTD 
compressors</h3>
+
+<p>We added LZ4 and ZSTD compression library support. In ARROW-300 and other
+planned work, we intend to add some compression features for data sent via 
RPC.</p>
+
+<h2 id="python-activity">Python Activity</h2>
+
+<p>We fixed many bugs which were affecting Parquet and Feather users and fixed
+several other rough edges with normal Arrow use. We also added some additional
+Arrow type conversions: structs, lists embedded in pandas objects, and Arrow
+time types (which deserialize to the <code 
class="highlighter-rouge">datetime.time</code> type).</p>
+
+<p>In upcoming releases we plan to continue to improve <a 
href="http://github.com/dask/dask";>Dask</a> support and
+performance for distributed processing of Apache Parquet files with 
pyarrow.</p>
+
+<h2 id="the-road-ahead">The Road Ahead</h2>
+
+<p>We have much work ahead of us to build out Arrow integrations in other data
+systems to improve their processing performance and interoperability with other
+systems.</p>
+
+<p>We are discussing the roadmap to a future 1.0.0 release on the <a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>developer
+mailing list</a>. Please join the discussion there.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/07/26/spark-arrow/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/07/26/spark-arrow/index.html 
b/build/blog/2017/07/26/spark-arrow/index.html
new file mode 100644
index 0000000..1cbd2ea
--- /dev/null
+++ b/build/blog/2017/07/26/spark-arrow/index.html
@@ -0,0 +1,272 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+            <li><a href="/docs/js">Javascript API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Speeding up PySpark with Apache Arrow
+      <a href="/blog/2017/07/26/spark-arrow/" class="permalink" 
title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            26 Jul 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://people.apache.org/~BryanCutler";><i class="fa 
fa-user"></i>  (BryanCutler)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p><em><a href="https://github.com/BryanCutler";>Bryan Cutler</a> is a software 
engineer at IBMâs Spark Technology Center <a 
href="http://www.spark.tc/";>STC</a></em></p>
+
+<p>Beginning with <a href="https://spark.apache.org/";>Apache Spark</a> version 
2.3, <a href="https://arrow.apache.org/";>Apache Arrow</a> will be a supported
+dependency and begin to offer increased performance with columnar data 
transfer.
+If you are a Spark user that prefers to work in Python and Pandas, this is a 
cause
+to be excited over! The initial work is limited to collecting a Spark DataFrame
+with <code class="highlighter-rouge">toPandas()</code>, which I will discuss 
below, however there are many additional
+improvements that are currently <a 
href="https://issues.apache.org/jira/issues/?filter=12335725&amp;jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20text%20~%20%22arrow%22%20ORDER%20BY%20createdDate%20DESC">underway</a>.</p>
+
+<h1 id="optimizing-spark-conversion-to-pandas">Optimizing Spark Conversion to 
Pandas</h1>
+
+<p>The previous way of converting a Spark DataFrame to Pandas with <code 
class="highlighter-rouge">DataFrame.toPandas()</code>
+in PySpark was painfully inefficient. Basically, it worked by first collecting 
all
+rows to the Spark driver. Next, each row would get serialized into Pythonâs 
pickle
+format and sent to a Python worker process. This child process unpickles each 
row into
+a huge list of tuples. Finally, a Pandas DataFrame is created from the list 
using
+<code class="highlighter-rouge">pandas.DataFrame.from_records()</code>.</p>
+
+<p>This all might seem like standard procedure, but suffers from 2 glaring 
issues: 1)
+even using CPickle, Python serialization is a slow process and 2) creating
+a <code class="highlighter-rouge">pandas.DataFrame</code> using <code 
class="highlighter-rouge">from_records</code> must slowly iterate over the list 
of pure
+Python data and convert each value to Pandas format. See <a 
href="https://gist.github.com/wesm/0cb5531b1c2e346a0007";>here</a> for a detailed
+analysis.</p>
+
+<p>Here is where Arrow really shines to help optimize these steps: 1) Once the 
data is
+in Arrow memory format, there is no need to serialize/pickle anymore as Arrow 
data can
+be sent directly to the Python process, 2) When the Arrow data is received in 
Python,
+then pyarrow can utilize zero-copy methods to create a <code 
class="highlighter-rouge">pandas.DataFrame</code> from entire
+chunks of data at once instead of processing individual scalar values. 
Additionally,
+the conversion to Arrow data can be done on the JVM and pushed back for the 
Spark
+executors to perform in parallel, drastically reducing the load on the 
driver.</p>
+
+<p>As of the merging of <a 
href="https://issues.apache.org/jira/browse/SPARK-13534";>SPARK-13534</a>, the 
use of Arrow when calling <code class="highlighter-rouge">toPandas()</code>
+needs to be enabled by setting the SQLConf 
âspark.sql.execution.arrow.enabledâ to
+âtrueâ.  Letâs look at a simple usage example.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>Welcome to
+      ____              __
+     / __/__  ___ _____/ /__
+    _\ \/ _ \/ _ `/ __/  '_/
+   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
+      /_/
+
+Using Python version 2.7.13 (default, Dec 20 2016 23:09:15)
+SparkSession available as 'spark'.
+
+In [1]: from pyspark.sql.functions import rand
+   ...: df = spark.range(1 &lt;&lt; 22).toDF("id").withColumn("x", rand())
+   ...: df.printSchema()
+   ...: 
+root
+ |-- id: long (nullable = false)
+ |-- x: double (nullable = false)
+
+
+In [2]: %time pdf = df.toPandas()
+CPU times: user 17.4 s, sys: 792 ms, total: 18.1 s
+Wall time: 20.7 s
+
+In [3]: spark.conf.set("spark.sql.execution.arrow.enabled", "true")
+
+In [4]: %time pdf = df.toPandas()
+CPU times: user 40 ms, sys: 32 ms, total: 72 ms                                
 
+Wall time: 737 ms
+
+In [5]: pdf.describe()
+Out[5]: 
+                 id             x
+count  4.194304e+06  4.194304e+06
+mean   2.097152e+06  4.998996e-01
+std    1.210791e+06  2.887247e-01
+min    0.000000e+00  8.291929e-07
+25%    1.048576e+06  2.498116e-01
+50%    2.097152e+06  4.999210e-01
+75%    3.145727e+06  7.498380e-01
+max    4.194303e+06  9.999996e-01
+</code></pre>
+</div>
+
+<p>This example was run locally on my laptop using Spark defaults so the times
+shown should not be taken precisely. Even though, it is clear there is a huge
+performance boost and using Arrow took something that was excruciatingly slow
+and speeds it up to be barely noticeable.</p>
+
+<h1 id="notes-on-usage">Notes on Usage</h1>
+
+<p>Here are some things to keep in mind before making use of this new feature. 
At
+the time of writing this, pyarrow will not be installed automatically with
+pyspark and needs to be manually installed, see installation <a 
href="https://github.com/apache/arrow/blob/master/site/install.md";>instructions</a>.
+It is planned to add pyarrow as a pyspark dependency so that 
+<code class="highlighter-rouge">&gt; pip install pyspark</code> will also 
install pyarrow.</p>
+
+<p>Currently, the controlling SQLConf is disabled by default. This can be 
enabled
+programmatically as in the example above or by adding the line
+âspark.sql.execution.arrow.enabled=trueâ to <code 
class="highlighter-rouge">SPARK_HOME/conf/spark-defaults.conf</code>.</p>
+
+<p>Also, not all Spark data types are currently supported and limited to 
primitive
+types. Expanded type support is in the works and expected to also be in the 
Spark
+2.3 release.</p>
+
+<h1 id="future-improvements">Future Improvements</h1>
+
+<p>As mentioned, this was just a first step in using Arrow to make life easier 
for
+Spark Python users. A few exciting initiatives in the works are to allow for
+vectorized UDF evaluation (<a 
href="https://issues.apache.org/jira/browse/SPARK-21190";>SPARK-21190</a>, <a 
href="https://issues.apache.org/jira/browse/SPARK-21404";>SPARK-21404</a>), and 
the ability
+to apply a function on grouped data using a Pandas DataFrame (<a 
href="https://issues.apache.org/jira/browse/SPARK-20396";>SPARK-20396</a>).
+Just as Arrow helped in converting a Spark to Pandas, it can also work in the
+other direction when creating a Spark DataFrame from an existing Pandas
+DataFrame (<a 
href="https://issues.apache.org/jira/browse/SPARK-20791";>SPARK-20791</a>). Stay 
tuned for more!</p>
+
+<h1 id="collaborators">Collaborators</h1>
+
+<p>Reaching this first milestone was a group effort from both the Apache Arrow 
and
+Spark communities. Thanks to the hard work of <a 
href="https://github.com/wesm";>Wes McKinney</a>, <a 
href="https://github.com/icexelloss";>Li Jin</a>,
+<a href="https://github.com/holdenk";>Holden Karau</a>, Reynold Xin, Wenchen 
Fan, Shane Knapp and many others that
+helped push this effort forwards.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/08/07/plasma-in-memory-object-store/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/08/07/plasma-in-memory-object-store/index.html 
b/build/blog/2017/08/07/plasma-in-memory-object-store/index.html
new file mode 100644
index 0000000..d2f25da
--- /dev/null
+++ b/build/blog/2017/08/07/plasma-in-memory-object-store/index.html
@@ -0,0 +1,273 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Plasma In-Memory Object Store
+      <a href="/blog/2017/08/07/plasma-in-memory-object-store/" 
class="permalink" title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            07 Aug 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://people.apache.org/~Philipp Moritz and Robert 
Nishihara"><i class="fa fa-user"></i>  (Philipp Moritz and Robert Nishihara)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p><em><a href="https://people.eecs.berkeley.edu/~pcmoritz/";>Philipp 
Moritz</a> and <a href="http://www.robertnishihara.com";>Robert Nishihara</a> 
are graduate students at UC
+ Berkeley.</em></p>
+
+<h2 id="plasma-a-high-performance-shared-memory-object-store">Plasma: A 
High-Performance Shared-Memory Object Store</h2>
+
+<h3 id="motivating-plasma">Motivating Plasma</h3>
+
+<p>This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. <strong>Plasma holds immutable objects in 
shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.</strong> In light of the trend toward larger and larger multicore 
machines,
+Plasma enables critical performance optimizations in the big data regime.</p>
+
+<p>Plasma was initially developed as part of <a 
href="https://github.com/ray-project/ray";>Ray</a>, and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.</p>
+
+<p>One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.</p>
+
+<p><strong>Expensive serialization and deserialization as well as data copying 
are a
+common performance bottleneck in distributed computing.</strong> For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python âworkerâ processes and then aggregate the results in a 
single
+âdriverâ process may choose to serialize data using the built-in <code 
class="highlighter-rouge">pickle</code>
+library. Assuming one Python process per core, each worker process would have 
to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.</p>
+
+<p>Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying 
or
+deserializing it (the workers would map the relevant region of memory into 
their
+own address spaces). The workers would then put the results of their 
computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.</p>
+
+<h3 id="the-plasma-api">The Plasma API:</h3>
+
+<p>Below we illustrate a subset of the API. The C++ API is documented more 
fully
+<a 
href="https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md";>here</a>,
 and the Python API is documented <a 
href="https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst";>here</a>.</p>
+
+<p><strong>Object IDs:</strong> Each object is associated with a string of 
bytes.</p>
+
+<p><strong>Creating an object:</strong> Objects are stored in Plasma in two 
stages. First, the
+object store <em>creates</em> the object by allocating a buffer for it. At 
this point,
+the client can write to the buffer and construct the object within the 
allocated
+buffer. When the client is done, the client <em>seals</em> the buffer making 
the object
+immutable and making it available to other Plasma clients.</p>
+
+<div class="language-python highlighter-rouge"><pre 
class="highlight"><code><span class="c"># Create an object.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span 
class="n">pyarrow</span><span class="o">.</span><span 
class="n">plasma</span><span class="o">.</span><span 
class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> 
<span class="o">*</span> <span class="n">b</span><span 
class="s">'a'</span><span class="p">)</span>
+<span class="n">object_size</span> <span class="o">=</span> <span 
class="mi">1000</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span 
class="n">memoryview</span><span class="p">(</span><span 
class="n">client</span><span class="o">.</span><span 
class="n">create</span><span class="p">(</span><span 
class="n">object_id</span><span class="p">,</span> <span 
class="n">object_size</span><span class="p">))</span>
+
+<span class="c"># Write to the buffer.</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> 
<span class="nb">range</span><span class="p">(</span><span 
class="mi">1000</span><span class="p">):</span>
+    <span class="nb">buffer</span><span class="p">[</span><span 
class="n">i</span><span class="p">]</span> <span class="o">=</span> <span 
class="mi">0</span>
+
+<span class="c"># Seal the object making it immutable and available to other 
clients.</span>
+<span class="n">client</span><span class="o">.</span><span 
class="n">seal</span><span class="p">(</span><span 
class="n">object_id</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p><strong>Getting an object:</strong> After an object has been sealed, any 
client who knows the
+object ID can get the object.</p>
+
+<div class="language-python highlighter-rouge"><pre 
class="highlight"><code><span class="c"># Get the object from the store. This 
blocks until the object has been sealed.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span 
class="n">pyarrow</span><span class="o">.</span><span 
class="n">plasma</span><span class="o">.</span><span 
class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> 
<span class="o">*</span> <span class="n">b</span><span 
class="s">'a'</span><span class="p">)</span>
+<span class="p">[</span><span class="n">buff</span><span class="p">]</span> 
<span class="o">=</span> <span class="n">client</span><span 
class="o">.</span><span class="n">get</span><span class="p">([</span><span 
class="n">object_id</span><span class="p">])</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span 
class="n">memoryview</span><span class="p">(</span><span 
class="n">buff</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p>If the object has not been sealed yet, then the call to <code 
class="highlighter-rouge">client.get</code> will block
+until the object has been sealed.</p>
+
+<h3 id="a-sorting-application">A sorting application</h3>
+
+<p>To illustrate the benefits of Plasma, we demonstrate an <strong>11x 
speedup</strong> (on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which 
sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.</p>
+
+<ul>
+  <li>We assume that the data is partitioned across K pandas DataFrames and 
that
+each one already lives in the Plasma store.</li>
+  <li>We subsample the data, sort the subsampled data, and use the result to 
define
+L non-overlapping buckets.</li>
+  <li>For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.</li>
+  <li>For each of the L buckets, we gather all of the K sorted subsets that 
fall in
+that bucket.</li>
+  <li>For each of the L buckets, we merge the corresponding K sorted 
subsets.</li>
+  <li>We turn each bucket into a pandas DataFrame and place it in the Plasma 
store.</li>
+</ul>
+
+<p>Using this scheme, we can sort the DataFrame (the data starts and ends in 
the
+Plasma store), in 44 seconds, giving an 11x speedup over the baseline.</p>
+
+<h3 id="design">Design</h3>
+
+<p>The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the <a 
href="https://redis.io/";>Redis</a> event loop library.
+The plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using <a 
href="https://google.github.io/flatbuffers/";>Google Flatbuffers</a>.</p>
+
+<h3 id="call-for-contributions">Call for contributions</h3>
+
+<p>Plasma is a work in progress, and the API is currently unstable. Today 
Plasma is
+primarily used in <a href="https://github.com/ray-project/ray";>Ray</a> as an 
in-memory cache for Arrow serialized objects.
+We are looking for a broader set of use cases to help refine Plasmaâs API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/08/08/plasma-in-memory-object-store/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/08/08/plasma-in-memory-object-store/index.html 
b/build/blog/2017/08/08/plasma-in-memory-object-store/index.html
new file mode 100644
index 0000000..528ad5f
--- /dev/null
+++ b/build/blog/2017/08/08/plasma-in-memory-object-store/index.html
@@ -0,0 +1,274 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+            <li><a href="/docs/js">Javascript API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Plasma In-Memory Object Store
+      <a href="/blog/2017/08/08/plasma-in-memory-object-store/" 
class="permalink" title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            08 Aug 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://people.apache.org/~Philipp Moritz and Robert 
Nishihara"><i class="fa fa-user"></i>  (Philipp Moritz and Robert Nishihara)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p><em><a href="https://people.eecs.berkeley.edu/~pcmoritz/";>Philipp 
Moritz</a> and <a href="http://www.robertnishihara.com";>Robert Nishihara</a> 
are graduate students at UC
+ Berkeley.</em></p>
+
+<h2 id="plasma-a-high-performance-shared-memory-object-store">Plasma: A 
High-Performance Shared-Memory Object Store</h2>
+
+<h3 id="motivating-plasma">Motivating Plasma</h3>
+
+<p>This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. <strong>Plasma holds immutable objects in 
shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.</strong> In light of the trend toward larger and larger multicore 
machines,
+Plasma enables critical performance optimizations in the big data regime.</p>
+
+<p>Plasma was initially developed as part of <a 
href="https://github.com/ray-project/ray";>Ray</a>, and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.</p>
+
+<p>One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.</p>
+
+<p><strong>Expensive serialization and deserialization as well as data copying 
are a
+common performance bottleneck in distributed computing.</strong> For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python âworkerâ processes and then aggregate the results in a 
single
+âdriverâ process may choose to serialize data using the built-in <code 
class="highlighter-rouge">pickle</code>
+library. Assuming one Python process per core, each worker process would have 
to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.</p>
+
+<p>Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying 
or
+deserializing it (the workers would map the relevant region of memory into 
their
+own address spaces). The workers would then put the results of their 
computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.</p>
+
+<h3 id="the-plasma-api">The Plasma API:</h3>
+
+<p>Below we illustrate a subset of the API. The C++ API is documented more 
fully
+<a 
href="https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md";>here</a>,
 and the Python API is documented <a 
href="https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst";>here</a>.</p>
+
+<p><strong>Object IDs:</strong> Each object is associated with a string of 
bytes.</p>
+
+<p><strong>Creating an object:</strong> Objects are stored in Plasma in two 
stages. First, the
+object store <em>creates</em> the object by allocating a buffer for it. At 
this point,
+the client can write to the buffer and construct the object within the 
allocated
+buffer. When the client is done, the client <em>seals</em> the buffer making 
the object
+immutable and making it available to other Plasma clients.</p>
+
+<div class="language-python highlighter-rouge"><pre 
class="highlight"><code><span class="c"># Create an object.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span 
class="n">pyarrow</span><span class="o">.</span><span 
class="n">plasma</span><span class="o">.</span><span 
class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> 
<span class="o">*</span> <span class="n">b</span><span 
class="s">'a'</span><span class="p">)</span>
+<span class="n">object_size</span> <span class="o">=</span> <span 
class="mi">1000</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span 
class="n">memoryview</span><span class="p">(</span><span 
class="n">client</span><span class="o">.</span><span 
class="n">create</span><span class="p">(</span><span 
class="n">object_id</span><span class="p">,</span> <span 
class="n">object_size</span><span class="p">))</span>
+
+<span class="c"># Write to the buffer.</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> 
<span class="nb">range</span><span class="p">(</span><span 
class="mi">1000</span><span class="p">):</span>
+    <span class="nb">buffer</span><span class="p">[</span><span 
class="n">i</span><span class="p">]</span> <span class="o">=</span> <span 
class="mi">0</span>
+
+<span class="c"># Seal the object making it immutable and available to other 
clients.</span>
+<span class="n">client</span><span class="o">.</span><span 
class="n">seal</span><span class="p">(</span><span 
class="n">object_id</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p><strong>Getting an object:</strong> After an object has been sealed, any 
client who knows the
+object ID can get the object.</p>
+
+<div class="language-python highlighter-rouge"><pre 
class="highlight"><code><span class="c"># Get the object from the store. This 
blocks until the object has been sealed.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span 
class="n">pyarrow</span><span class="o">.</span><span 
class="n">plasma</span><span class="o">.</span><span 
class="n">ObjectID</span><span class="p">(</span><span class="mi">20</span> 
<span class="o">*</span> <span class="n">b</span><span 
class="s">'a'</span><span class="p">)</span>
+<span class="p">[</span><span class="n">buff</span><span class="p">]</span> 
<span class="o">=</span> <span class="n">client</span><span 
class="o">.</span><span class="n">get</span><span class="p">([</span><span 
class="n">object_id</span><span class="p">])</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span 
class="n">memoryview</span><span class="p">(</span><span 
class="n">buff</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p>If the object has not been sealed yet, then the call to <code 
class="highlighter-rouge">client.get</code> will block
+until the object has been sealed.</p>
+
+<h3 id="a-sorting-application">A sorting application</h3>
+
+<p>To illustrate the benefits of Plasma, we demonstrate an <strong>11x 
speedup</strong> (on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which 
sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.</p>
+
+<ul>
+  <li>We assume that the data is partitioned across K pandas DataFrames and 
that
+each one already lives in the Plasma store.</li>
+  <li>We subsample the data, sort the subsampled data, and use the result to 
define
+L non-overlapping buckets.</li>
+  <li>For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.</li>
+  <li>For each of the L buckets, we gather all of the K sorted subsets that 
fall in
+that bucket.</li>
+  <li>For each of the L buckets, we merge the corresponding K sorted 
subsets.</li>
+  <li>We turn each bucket into a pandas DataFrame and place it in the Plasma 
store.</li>
+</ul>
+
+<p>Using this scheme, we can sort the DataFrame (the data starts and ends in 
the
+Plasma store), in 44 seconds, giving an 11x speedup over the baseline.</p>
+
+<h3 id="design">Design</h3>
+
+<p>The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the <a 
href="https://redis.io/";>Redis</a> event loop library.
+The plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using <a 
href="https://google.github.io/flatbuffers/";>Google Flatbuffers</a>.</p>
+
+<h3 id="call-for-contributions">Call for contributions</h3>
+
+<p>Plasma is a work in progress, and the API is currently unstable. Today 
Plasma is
+primarily used in <a href="https://github.com/ray-project/ray";>Ray</a> as an 
in-memory cache for Arrow serialized objects.
+We are looking for a broader set of use cases to help refine Plasmaâs API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3cd84682/build/blog/2017/08/15/0.6.0-release/index.html
----------------------------------------------------------------------
diff --git a/build/blog/2017/08/15/0.6.0-release/index.html 
b/build/blog/2017/08/15/0.6.0-release/index.html
new file mode 100644
index 0000000..4276b8c
--- /dev/null
+++ b/build/blog/2017/08/15/0.6.0-release/index.html
@@ -0,0 +1,234 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Apache Arrow Homepage</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head 
content must come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <link rel="stylesheet" 
href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js";
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+    
+    <!-- Global Site Tag (gtag.js) - Google Analytics -->
+<script async 
src="https://www.googletagmanager.com/gtag/js?id=UA-107500873-1";></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments)};
+  gtag('js', new Date());
+
+  gtag('config', 'UA-107500873-1');
+</script>
+
+    
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" 
data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache 
Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW";>Issue 
Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow";>Source Code</a></li>
+            <li><a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>Mailing List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com";>Slack 
Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+            <li><a href="/powered_by/">Powered By</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/";>ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/";>License</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html";>Donate</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html";>Thanks</a></li>
+            <li><a href="http://www.apache.org/security/";>Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/";>
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Apache Arrow 0.6.0 Release
+      <a href="/blog/2017/08/15/0.6.0-release/" class="permalink" 
title="Permalink">â</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            15 Aug 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://wesmckinney.com";><i class="fa fa-user"></i> Wes 
McKinney (wesm)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p>The Apache Arrow team is pleased to announce the 0.6.0 release. It includes
+<a 
href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.6.0"><strong>90
 resolved JIRAs</strong></a> with the new Plasma shared memory object store, and
+improvements and bug fixes to the various language implementations. The Arrow
+memory format remains stable since the 0.3.x release.</p>
+
+<p>See the <a href="http://arrow.apache.org/install";>Install Page</a> to learn 
how to get the libraries for your
+platform. The <a href="http://arrow.apache.org/release/0.6.0.html";>complete 
changelog</a> is also available.</p>
+
+<h2 id="plasma-shared-memory-object-store">Plasma Shared Memory Object 
Store</h2>
+
+<p>This release includes the <a 
href="http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/";>Plasma
 Store</a>, which you can read more about in
+the linked blog post. This system was originally developed as part of the <a 
href="https://ray-project.github.io/ray/";>Ray
+Project</a> at the <a href="https://rise.cs.berkeley.edu/";>UC Berkeley 
RISELab</a>. We recognized that Plasma would be
+highly valuable to the Arrow community as a tool for shared memory management
+and zero-copy deserialization. Additionally, we believe we will be able to
+develop a stronger software stack through sharing of IO and buffer management
+code.</p>
+
+<p>The Plasma store is a server application which runs as a separate process. A
+reference C++ client, with Python bindings, is made available in this
+release. Clients can be developed in Java or other languages in the future to
+enable simple sharing of complex datasets through shared memory.</p>
+
+<h2 id="arrow-format-addition-map-type">Arrow Format Addition: Map type</h2>
+
+<p>We added a Map logical type to represent ordered and unordered maps
+in-memory. This corresponds to the <code class="highlighter-rouge">MAP</code> 
logical type annotation in the Parquet
+format (where maps are represented as repeated structs).</p>
+
+<p>Map is represented as a list of structs. It is the first example of a 
logical
+type whose physical representation is a nested type. We have not yet created
+implementations of Map containers in any of the implementations, but this can
+be done in a future release.</p>
+
+<p>As an example, the Python data:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>data = [{'a': 1, 
'bb': 2, 'cc': 3}, {'dddd': 4}]
+</code></pre>
+</div>
+
+<p>Could be represented in an Arrow <code 
class="highlighter-rouge">Map&lt;String, Int32&gt;</code> as:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>Map&lt;String, 
Int32&gt; = List&lt;Struct&lt;keys: String, values: Int32&gt;&gt;
+  is_valid: [true, true]
+  offsets: [0, 3, 4]
+  values: Struct&lt;keys: String, values: Int32&gt;
+    children:
+      - keys: String
+          is_valid: [true, true, true, true]
+          offsets: [0, 1, 3, 5, 9]
+          data: abbccdddd
+      - values: Int32
+          is_valid: [true, true, true, true]
+          data: [1, 2, 3, 4]
+</code></pre>
+</div>
+<h2 id="python-changes">Python Changes</h2>
+
+<p>Some highlights of Python development outside of bug fixes and general API
+improvements include:</p>
+
+<ul>
+  <li>New <code class="highlighter-rouge">strings_to_categorical=True</code> 
option when calling <code class="highlighter-rouge">Table.to_pandas</code> will
+yield pandas <code class="highlighter-rouge">Categorical</code> types from 
Arrow binary and string columns</li>
+  <li>Expanded Hadoop Filesystem (HDFS) functionality to improve compatibility 
with
+Dask and other HDFS-aware Python libraries.</li>
+  <li>s3fs and other Dask-oriented filesystems can now be used with
+<code class="highlighter-rouge">pyarrow.parquet.ParquetDataset</code></li>
+  <li>More graceful handling of pandasâs nanosecond timestamps when writing 
to
+Parquet format. You can now pass <code 
class="highlighter-rouge">coerce_timestamps='ms'</code> to cast to
+milliseconds, or <code class="highlighter-rouge">'us'</code> for 
microseconds.</li>
+</ul>
+
+<h2 id="toward-arrow-100-and-beyond">Toward Arrow 1.0.0 and Beyond</h2>
+
+<p>We are still discussing the roadmap to 1.0.0 release on the <a 
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/";>developer mailing
+list</a>. The focus of the 1.0.0 release will likely be memory format stability
+and hardening integration tests across the remaining data types implemented in
+Java and C++. Please join the discussion there.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache 
Arrow project logo are either registered trademarks or trademarks of The Apache 
Software Foundation in the United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

[44/51] [partial] arrow-site git commit: Update API documentation for version 0.9.0

Reply via email to