[1/3] arrow-site git commit: Add Plasma blog post

wesm Tue, 08 Aug 2017 07:26:19 -0700

Repository: arrow-site
Updated Branches:
  refs/heads/asf-site b286da84c -> 3b67853c5



http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/ipc.html
----------------------------------------------------------------------
diff --git a/docs/ipc.html b/docs/ipc.html
index ffbe491..69bfa36 100644
--- a/docs/ipc.html
+++ b/docs/ipc.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="interprocess-messaging--communication-ipc">Interprocess messaging / communication (IPC)</h1>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/memory_layout.html
----------------------------------------------------------------------
diff --git a/docs/memory_layout.html b/docs/memory_layout.html
index 7703a15..98cb556 100644
--- a/docs/memory_layout.html
+++ b/docs/memory_layout.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="arrow-physical-memory-layout">Arrow: Physical memory layout</h1>
@@ -167,7 +172,11 @@ proprietary systems that utilize the open source components.</li>
 linearly in the nesting level</li>
   <li>Capable of representing fully-materialized and decoded / decompressed <a href="https://parquet.apache.org/documentation/latest/">Parquet</a>
 data</li>
-  <li>All contiguous memory buffers are aligned at 64-byte boundaries and padded to a multiple of 64 bytes.</li>
+  <li>All contiguous memory buffers in an IPC payload must be aligned at
+8-byte boundaries. In other words, each buffer must start at an offset
+that is a multiple of 8 bytes.</li>
+  <li>The general recommendation is to align buffers at 64-byte boundaries, but
+this is not strictly necessary.</li>
   <li>Any relative type can have null slots</li>
   <li>Arrays are immutable once created. Implementations can provide APIs to mutate
 an array, but applying mutations will require a new array data structure to
@@ -218,9 +227,9 @@ via byte swapping.</p>
 
 <h2 id="alignment-and-padding">Alignment and Padding</h2>
 
-<p>As noted above, all buffers are intended to be aligned in memory at 64 byte
-boundaries and padded to a length that is a multiple of 64 bytes.  The alignment
-requirement follows best practices for optimized memory access:</p>
+<p>As noted above, all buffers must be aligned in memory at 8-byte boundaries and padded
+to a length that is a multiple of 8 bytes.  The alignment requirement follows best
+practices for optimized memory access:</p>
 
 <ul>
   <li>Elements in numeric arrays will be guaranteed to be retrieved via aligned access.</li>
@@ -229,12 +238,14 @@ requirement follows best practices for optimized memory access:</p>
 data-structures over 64 bytes (which will be a common case for Arrow Arrays).</li>
 </ul>
 
-<p>Requiring padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions
+<p>Recommending padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions
 consistently in loops without additional conditional checks.
-This should allow for simpler and more efficient code.
+This should allow for simpler, more efficient, and CPU cache-friendly code.
 The specific padding length was chosen because it matches the largest known
-SIMD instruction registers available as of April 2016 (Intel AVX-512).
-Guaranteed padding can also allow certain compilers
+SIMD instruction registers available as of April 2016 (Intel AVX-512). In other
+words, each 64-byte block of a buffer can be loaded into a single 512-bit SIMD
+register, giving data-level parallelism across all the columnar values packed
+into that block. Guaranteed padding can also allow certain compilers
 to generate more optimized code directly (e.g. one can safely use Intel’s
 <code class="highlighter-rouge">-qopt-assume-safe-padding</code>).</p>
 

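The revised rule (8-byte alignment required, 64-byte alignment recommended) amounts to rounding each buffer length up to a multiple of the alignment and laying the buffers out back to back, so every buffer starts at an aligned offset. Below is a minimal Python sketch under that assumption; `padded_length` and `buffer_offsets` are hypothetical helper names for illustration, not part of any Arrow library API.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


def buffer_offsets(sizes, alignment=8):
    """Place buffers back to back so that each one starts at an aligned offset.

    Returns (offsets, total_length); every offset and the total length are
    multiples of alignment because each buffer is padded before the next.
    """
    offsets = []
    position = 0
    for size in sizes:
        offsets.append(position)
        position += padded_length(size, alignment)
    return offsets, position


# Three hypothetical buffers of 3, 16 and 100 bytes.
print(buffer_offsets([3, 16, 100]))                # ([0, 8, 24], 128)
print(buffer_offsets([3, 16, 100], alignment=64))  # ([0, 64, 128], 256)
```

With 64-byte padding, each padded buffer is a whole number of 512-bit (64-byte) loads, for example 16 `int32` or 8 `float64` values per AVX-512 register, so a vectorized loop needs no scalar tail for the last partial chunk.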
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/metadata.html
----------------------------------------------------------------------
diff --git a/docs/metadata.html b/docs/metadata.html
index 76da9eb..7382193 100644
--- a/docs/metadata.html
+++ b/docs/metadata.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="metadata-logical-types-schemas-data-headers">Metadata: Logical types, schemas, data headers</h1>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index f01301e..453eee8 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,125 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-07-27T11:28:36-04:00</updated><id>/</id><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T12:00:00-04:00</published><updated>2017-07-26T12:00:00-04:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/">&lt;!--
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-08-08T10:25:08-04:00</updated><id>/</id><entry><title type="html">Plasma In-Memory Object Store</title><link href="/blog/2017/08/08/plasma-in-memory-object-store/" rel="alternate" type="text/html" title="Plasma In-Memory Object Store" /><published>2017-08-08T00:00:00-04:00</published><updated>2017-08-08T00:00:00-04:00</updated><id>/blog/2017/08/08/plasma-in-memory-object-store</id><content type="html" xml:base="/blog/2017/08/08/plasma-in-memory-object-store/">&lt;!--
+
+--&gt;
+
+&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://people.eecs.berkeley.edu/~pcmoritz/&quot;&gt;Philipp Moritz&lt;/a&gt; and &lt;a href=&quot;http://www.robertnishihara.com&quot;&gt;Robert Nishihara&lt;/a&gt; are graduate students at UC
+ Berkeley.&lt;/em&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;plasma-a-high-performance-shared-memory-object-store&quot;&gt;Plasma: A High-Performance Shared-Memory Object Store&lt;/h2&gt;
+
+&lt;h3 id=&quot;motivating-plasma&quot;&gt;Motivating Plasma&lt;/h3&gt;
+
+&lt;p&gt;This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. &lt;strong&gt;Plasma holds immutable objects in shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.&lt;/strong&gt; In light of the trend toward larger and larger multicore machines,
+Plasma enables critical performance optimizations in the big data regime.&lt;/p&gt;
+
+&lt;p&gt;Plasma was initially developed as part of &lt;a href=&quot;https://github.com/ray-project/ray&quot;&gt;Ray&lt;/a&gt;, and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.&lt;/p&gt;
+
+&lt;p&gt;One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Expensive serialization and deserialization as well as data copying are a
+common performance bottleneck in distributed computing.&lt;/strong&gt; For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python “worker” processes and then aggregate the results in a single
+“driver” process may choose to serialize data using the built-in &lt;code class=&quot;highlighter-rouge&quot;&gt;pickle&lt;/code&gt;
+library. Assuming one Python process per core, each worker process would have to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.&lt;/p&gt;
+
+&lt;p&gt;Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying or
+deserializing it (the workers would map the relevant region of memory into their
+own address spaces). The workers would then put the results of their computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.&lt;/p&gt;
+
+&lt;h3 id=&quot;the-plasma-api&quot;&gt;The Plasma API&lt;/h3&gt;
+
+&lt;p&gt;Below we illustrate a subset of the API. The C++ API is documented more fully
+&lt;a href=&quot;https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md&quot;&gt;here&lt;/a&gt;, and the Python API is documented &lt;a href=&quot;https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Object IDs:&lt;/strong&gt; Each object is identified by an object ID, which is a string of bytes.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Creating an object:&lt;/strong&gt; Objects are stored in Plasma in two stages. First, the
+object store &lt;em&gt;creates&lt;/em&gt; the object by allocating a buffer for it. At this point,
+the client can write to the buffer and construct the object within the allocated
+buffer. When the client is done, the client &lt;em&gt;seals&lt;/em&gt; the buffer, making the object
+immutable and making it available to other Plasma clients.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Create an object.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pyarrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plasma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ObjectID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'a'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;
+&lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;memoryview&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;object_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Write to the buffer.&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Seal the object making it immutable and available to other clients.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;&lt;strong&gt;Getting an object:&lt;/strong&gt; After an object has been sealed, any client who knows the
+object ID can get the object.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Get the object from the store. This blocks until the object has been sealed.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pyarrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plasma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ObjectID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'a'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
+&lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;memoryview&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;If the object has not been sealed yet, then the call to &lt;code class=&quot;highlighter-rouge&quot;&gt;client.get&lt;/code&gt; will block
+until the object has been sealed.&lt;/p&gt;
+
+&lt;h3 id=&quot;a-sorting-application&quot;&gt;A sorting application&lt;/h3&gt;
+
+&lt;p&gt;To illustrate the benefits of Plasma, we demonstrate an &lt;strong&gt;11x speedup&lt;/strong&gt; (on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;We assume that the data is partitioned across K pandas DataFrames and that
+each one already lives in the Plasma store.&lt;/li&gt;
+  &lt;li&gt;We subsample the data, sort the subsampled data, and use the result to define
+L non-overlapping buckets.&lt;/li&gt;
+  &lt;li&gt;For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.&lt;/li&gt;
+  &lt;li&gt;For each of the L buckets, we gather all of the K sorted subsets that fall in
+that bucket.&lt;/li&gt;
+  &lt;li&gt;For each of the L buckets, we merge the corresponding K sorted subsets.&lt;/li&gt;
+  &lt;li&gt;We turn each bucket into a pandas DataFrame and place it in the Plasma store.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Using this scheme, we can sort the DataFrame (the data starts and ends in the
+Plasma store) in 44 seconds, giving an 11x speedup over the baseline.&lt;/p&gt;
+
+&lt;h3 id=&quot;design&quot;&gt;Design&lt;/h3&gt;
+
+&lt;p&gt;The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the &lt;a href=&quot;https://redis.io/&quot;&gt;Redis&lt;/a&gt; event loop library.
+The Plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using &lt;a href=&quot;https://google.github.io/flatbuffers/&quot;&gt;Google Flatbuffers&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h3 id=&quot;call-for-contributions&quot;&gt;Call for contributions&lt;/h3&gt;
+
+&lt;p&gt;Plasma is a work in progress, and the API is currently unstable. Today Plasma is
+primarily used in &lt;a href=&quot;https://github.com/ray-project/ray&quot;&gt;Ray&lt;/a&gt; as an in-memory cache for Arrow serialized objects.
+We are looking for a broader set of use cases to help refine Plasma’s API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.&lt;/p&gt;</content><author><name>Philipp Moritz and Robert Nishihara</name></author></entry><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T12:00:00-04:00</published><updated>2017-07-26T12:00:00-04:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/">&lt;!--
 
 --&gt;

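The post's two-stage protocol (create, write, seal) and the blocking `client.get` can be modeled without a running store. The sketch below is a hypothetical, single-process Python model: `ToyObjectStore` is not part of `pyarrow.plasma`, it uses no shared memory and no separate store process, and it only illustrates the object lifecycle. It assumes Python 3.8+ for `memoryview.toreadonly()`.

```python
import threading


class ToyObjectStore:
    """Hypothetical in-process model of a create/seal/get object store.

    NOT pyarrow.plasma: no store process, no shared memory, no IPC.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = {}  # object_id -> bytearray
        self._sealed = {}   # object_id -> threading.Event, set once sealed

    def _event(self, object_id):
        # A reader may ask for an object before it is created, so the
        # "sealed" event is created lazily by whoever touches it first.
        with self._lock:
            return self._sealed.setdefault(object_id, threading.Event())

    def create(self, object_id: bytes, size: int) -> memoryview:
        """Stage 1: allocate a buffer and hand the writer a mutable view."""
        with self._lock:
            if object_id in self._buffers:
                raise KeyError("object already exists")
            self._buffers[object_id] = bytearray(size)
            return memoryview(self._buffers[object_id])

    def seal(self, object_id: bytes) -> None:
        """Stage 2: publish the object to readers."""
        with self._lock:
            if object_id not in self._buffers:
                raise KeyError("object was never created")
        self._event(object_id).set()

    def get(self, object_id: bytes, timeout=None) -> memoryview:
        """Block until the object is sealed, then return a read-only view."""
        if not self._event(object_id).wait(timeout):
            raise TimeoutError("object was not sealed in time")
        return memoryview(self._buffers[object_id]).toreadonly()


store = ToyObjectStore()
object_id = 20 * b"a"
results = []

# A reader that asks for the object before it exists blocks inside get().
reader = threading.Thread(target=lambda: results.append(store.get(object_id)))
reader.start()

buffer = store.create(object_id, 1000)
for i in range(1000):
    buffer[i] = i % 256
store.seal(object_id)

reader.join()
view = results[0]
print(view.readonly, view[:4].tolist())  # True [0, 1, 2, 3]
```

The toy does not revoke the writer's view after `seal`; it only hands readers a read-only `memoryview` over the same bytes, so a read involves no copy.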

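The distributed sorting scheme in the post follows the pattern commonly called sample sort. Below is a single-process Python sketch of the same steps on plain lists, assuming in-memory lists stand in for the K pandas DataFrames held in Plasma; `sample_sort` and its parameters are hypothetical, and the per-partition and per-bucket work that would run in parallel workers runs sequentially here.

```python
import bisect
import heapq
import random


def sample_sort(partitions, num_buckets, samples_per_partition=100, seed=0):
    """Sort data spread across K partitions into L ordered buckets."""
    rng = random.Random(seed)

    # Subsample every partition and sort the combined sample.
    sample = []
    for part in partitions:
        sample.extend(rng.sample(part, min(samples_per_partition, len(part))))
    sample.sort()

    # Pick L - 1 splitters from the sorted sample; they define L
    # non-overlapping buckets.
    splitters = ([sample[i * len(sample) // num_buckets]
                  for i in range(1, num_buckets)] if sample else [])

    # For each partition and each bucket, extract and sort the subset of
    # the partition that falls in that bucket.
    sorted_subsets = [[] for _ in range(num_buckets)]
    for part in partitions:
        subsets = [[] for _ in range(num_buckets)]
        for value in part:
            subsets[bisect.bisect_right(splitters, value)].append(value)
        for bucket, subset in enumerate(subsets):
            sorted_subsets[bucket].append(sorted(subset))

    # For each bucket, gather its K sorted subsets and merge them.
    return [list(heapq.merge(*subsets)) for subsets in sorted_subsets]


data_rng = random.Random(42)
partitions = [[data_rng.randint(0, 10**6) for _ in range(1000)] for _ in range(4)]
buckets = sample_sort(partitions, num_buckets=8)
flat = [value for bucket in buckets for value in bucket]
print(flat == sorted(value for part in partitions for value in part))  # True
```

Because the buckets are non-overlapping and ordered by the splitters, concatenating the merged buckets yields the fully sorted data, which is why each bucket can be merged independently.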